GitHub Actions has become a de facto CI/CD standard for data projects. In a Senior-level interview, the candidate is assessed on their ability to automate the full lifecycle of a data pipeline.
What is the difference between a job and a step in GitHub Actions? And between on: push and on: pull_request?
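As a rough illustration of both distinctions: each job gets its own runner, and jobs run in parallel unless linked with needs, while the steps of a job run in order on the same runner and share its filesystem. A pull_request trigger fires when a PR targeting the branch is opened or updated, before merge; a push trigger fires once commits land on the branch. The workflow name, job names, and echo commands in this sketch are hypothetical.

```yaml
name: Job vs step sketch   # hypothetical name
on:
  pull_request:            # runs on PRs targeting main, before merge
    branches: [main]
  push:                    # runs after commits land on main
    branches: [main]
jobs:
  lint:                    # job 1: its own fresh runner
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "step 1" > marker.txt   # steps share the job's filesystem
      - run: cat marker.txt               # so this step can read the file
  deploy:                  # job 2: separate runner, marker.txt does not exist here
    needs: lint            # without needs, the two jobs would run in parallel
    if: github.event_name == 'push'       # skipped on pull requests
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploying ${{ github.sha }}"
```

Gating the deploy job on the event name is one simple way to reuse a single workflow file for both validation on PRs and release on merge.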
# Structure of a GitHub Actions data workflow
name: Data Pipeline CI
on:
  pull_request:
    branches: [main]
    paths:
      - 'dbt/**'
      - 'tests/**'
  push:
    branches: [main]
jobs:
  dbt_test:                  # one job = one independent runner
    runs-on: ubuntu-latest
    steps:                   # steps = sequential stages within a job
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-bigquery great-expectations

What is dbt Slim CI? Why is it important?
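Slim CI builds and tests only the models changed in the pull request, plus their downstream dependents, and defers unchanged upstream models to production. That keeps CI runs short and warehouse costs low. The comparison relies on a manifest.json from a previous production run being present in the directory passed to --state. Below is a minimal sketch of a step that fetches it before dbt runs, assuming the production job publishes its manifest to a Cloud Storage bucket. The bucket name and path are hypothetical, and the runner is assumed to be already authenticated to Google Cloud.

```yaml
# Hypothetical step, to be placed before the dbt build step
- name: Fetch production manifest
  run: |
    mkdir -p ./prod_artifacts
    # gs://my-dbt-artifacts/prod/manifest.json is an assumed location
    gcloud storage cp gs://my-dbt-artifacts/prod/manifest.json ./prod_artifacts/manifest.json
```

Any storage that the production job can write to and CI can read from would serve the same purpose; the only requirement is that the manifest exists locally before dbt is invoked.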
jobs:
  dbt_ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-bigquery
      - name: dbt deps
        run: dbt deps
      # Slim CI: only build and test the modified models and their descendants
      # (./prod_artifacts must already contain the production manifest.json)
      - name: dbt build (slim)
        run: |
          dbt build \
            --select state:modified+ \
            --defer \
            --state ./prod_artifacts \
            --target ci
        env:
          DBT_PROFILES_DIR: .
          BIGQUERY_KEYFILE: ${{ secrets.GCP_SA_KEY }}
      # Upload the artifacts for the next run
      - uses: actions/upload-artifact@v4
        with:
          name: dbt-artifacts
          path: target/

How do you integrate Great Expectations into a CI pipeline?
- name: Great Expectations validation
  run: |
    great_expectations checkpoint run my_checkpoint
  env:
    GE_CLOUD_ACCESS_TOKEN: ${{ secrets.GE_TOKEN }}
- name: Comment on the PR with the results
  uses: actions/github-script@v7
  if: failure()
  with:
    script: |
      // owner and repo are required by the REST API call
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body: '❌ Data quality checks failed. See logs for details.'
      })

How do you build and publish a Docker image for an ML API in a CI pipeline?
build-push:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    # Buildx is needed for the GitHub Actions cache backend (type=gha)
    - uses: docker/setup-buildx-action@v3
    - uses: docker/login-action@v3
      with:
        registry: europe-west1-docker.pkg.dev
        username: _json_key
        password: ${{ secrets.GCP_SA_KEY }}
    - uses: docker/build-push-action@v5
      with:
        # Only push the image for builds on main
        push: ${{ github.ref == 'refs/heads/main' }}
        tags: europe-west1-docker.pkg.dev/mon-projet/api/scoring:${{ github.sha }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

How do you manage credentials in GitHub Actions without exposing them?
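One common answer is to store nothing long-lived at all. With OpenID Connect (OIDC), the workflow exchanges a short-lived token issued by GitHub for temporary cloud credentials, so no JSON key sits in the repository secrets. Below is a minimal sketch for Google Cloud, assuming Workload Identity Federation is already configured on the project. The project number, pool, provider, and service account names are hypothetical placeholders.

```yaml
jobs:
  dbt_ci:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write        # lets the job request an OIDC token from GitHub
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          # Hypothetical identifiers; replace with your own pool and account
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: ci-runner@mon-projet.iam.gserviceaccount.com
      # Later steps can rely on Application Default Credentials
      # instead of a key file stored in secrets
```

Where a static secret is unavoidable, it should at least be passed only through the secrets context and never echoed in a run step.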
jobs:
  test:
    strategy:
      matrix:                # 3 Python versions x 2 adapters = 6 parallel jobs
        python-version: ['3.10', '3.11', '3.12']
        dbt-adapter: ['dbt-bigquery', 'dbt-snowflake']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install ${{ matrix.dbt-adapter }}

| Level | Skills | GO signal | NO-GO signal |
|---|---|---|---|
| Mid-level | Basic workflows, automated tests, secrets | Has configured a dbt CI workflow with tests; manages secrets | Runs tests manually; does not know what a workflow is |
| Senior | dbt Slim CI, OIDC, matrix, Docker deployment | Has set up dbt Slim CI; uses OIDC instead of JSON keys | Stores GCP keys in plaintext; does not know Slim CI |