GitHub Actions has become the standard CI/CD tool for data projects. In a Senior-level interview, candidates are assessed on their ability to automate the full lifecycle of a data pipeline.

What is the difference between a job and a step in GitHub Actions? And between `on: push` and `on: pull_request`?
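
In short: a job is a group of steps that runs on its own runner, and jobs run in parallel unless `needs` orders them. Steps run sequentially inside one job and share its filesystem. `on: push` fires when commits land on a matching branch, while `on: pull_request` fires when a PR targeting that branch is opened or updated. The minimal sketch below illustrates this; the job names `lint` and `report` and the file `lint.log` are hypothetical.

```yaml
name: Jobs vs steps (illustrative)
on:
  push:
    branches: [main]        # runs after commits land on main
  pull_request:
    branches: [main]        # runs when a PR targeting main is opened or updated
jobs:
  lint:                     # job 1: its own runner
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "lint ok" > lint.log   # step 1 writes a file
      - run: cat lint.log                # step 2 sees it: same runner, same workspace
  report:                   # job 2: a fresh runner, lint.log does not exist here
    needs: lint             # without `needs`, both jobs would start in parallel
    runs-on: ubuntu-latest
    steps:
      - run: echo "event = ${{ github.event_name }}"
```

Passing files between jobs therefore requires an explicit mechanism such as artifacts or a cache.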

    # Structure of a GitHub Actions workflow for data
    name: Data Pipeline CI
    on:
      pull_request:
        branches: [main]
        paths:
          - 'dbt/**'
          - 'tests/**'
      push:
        branches: [main]
    jobs:
      dbt_test:                  # one job = one independent runner
        runs-on: ubuntu-latest
        steps:                   # steps = sequential stages within a job
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: '3.11'
          - run: pip install dbt-bigquery great-expectations

What is dbt slim CI? Why does it matter?
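
Slim CI compares the PR's project against the `manifest.json` produced by the last production run, so that `state:modified+` selects only the changed models and their downstream dependents, and `--defer` resolves references to unchanged models against production instead of rebuilding them. This keeps CI runs short and warehouse costs low. The manifest has to be present in the directory passed to `--state` before dbt runs. A minimal sketch of such a step, assuming the production job copies its manifest to a hypothetical bucket `gs://my-dbt-artifacts`, that the `gcloud` CLI is available on the runner, and that the runner is already authenticated to Google Cloud:

```yaml
      # Assumption: the production run uploads target/manifest.json to this
      # hypothetical bucket; adapt the path to wherever your artifacts live.
      - name: Fetch production manifest
        run: |
          mkdir -p prod_artifacts
          gcloud storage cp gs://my-dbt-artifacts/prod/manifest.json prod_artifacts/manifest.json
```

The step would run before any dbt command that uses `--state ./prod_artifacts`.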

    jobs:
      dbt_ci:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Install dbt
            run: pip install dbt-bigquery
          - name: dbt deps
            run: dbt deps
          # Slim CI: build and test only the modified models and their descendants
          - name: dbt build (slim)
            run: |
              dbt build \
                --select state:modified+ \
                --defer \
                --state ./prod_artifacts \
                --target ci
            env:
              DBT_PROFILES_DIR: .
              BIGQUERY_KEYFILE: ${{ secrets.GCP_SA_KEY }}
          # Upload the artifacts for the next run
          - uses: actions/upload-artifact@v4
            with:
              name: dbt-artifacts
              path: target/

How do you integrate Great Expectations into a CI pipeline?

          - name: Great Expectations validation
            # Legacy CLI, available in great_expectations releases before 1.0
            run: |
              great_expectations checkpoint run my_checkpoint
            env:
              GE_CLOUD_ACCESS_TOKEN: ${{ secrets.GE_TOKEN }}
          - name: PR comment with results
            uses: actions/github-script@v7
            # context.issue.number is only set on pull request events
            if: failure() && github.event_name == 'pull_request'
            with:
              script: |
                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: context.issue.number,
                  body: '❌ Data quality checks failed. See logs for details.'
                })

How do you build and publish a Docker image for an ML API in a CI pipeline?

      build-push:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # Buildx is required for the GitHub Actions cache backend (type=gha)
          - uses: docker/setup-buildx-action@v3
          - uses: docker/login-action@v3
            with:
              registry: europe-west1-docker.pkg.dev
              username: _json_key
              password: ${{ secrets.GCP_SA_KEY }}
          - uses: docker/build-push-action@v5
            with:
              push: ${{ github.ref == 'refs/heads/main' }}   # publish only from main
              tags: europe-west1-docker.pkg.dev/mon-projet/api/scoring:${{ github.sha }}
              cache-from: type=gha
              cache-to: type=gha,mode=max

How do you manage credentials in GitHub Actions without exposing them?
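
One common answer is to avoid long-lived keys altogether: keep what must stay secret in encrypted GitHub secrets, and where the cloud provider supports it, use OIDC so the workflow exchanges a short-lived GitHub token for cloud credentials. Here is a minimal sketch for Google Cloud with the `google-github-actions/auth` action. The project number, pool, provider and service account names are placeholders, and Workload Identity Federation is assumed to be configured already on the GCP side.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write      # lets the job request an OIDC token from GitHub
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          # Placeholders: replace with your own pool/provider and service account
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: ci-deployer@my-project.iam.gserviceaccount.com
      # Subsequent steps use the short-lived credentials exported by the auth step
```

With this setup no JSON key is stored in the repository or in its secrets, so there is nothing long-lived to leak or rotate.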

A matrix strategy runs the same job across several Python versions and dbt adapters:

    jobs:
      test:
        strategy:
          matrix:               # 3 Python versions x 2 adapters = 6 jobs
            python-version: ['3.10', '3.11', '3.12']
            dbt-adapter: ['dbt-bigquery', 'dbt-snowflake']
        runs-on: ubuntu-latest
        steps:
          - uses: actions/setup-python@v5
            with:
              python-version: ${{ matrix.python-version }}
          - run: pip install ${{ matrix.dbt-adapter }}

    # Complete workflow: lint + test + deploy
    name: Data Pipeline CI
    on: [push, pull_request]
    jobs:
      quality:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Python lint
            run: |
              pip install ruff
              ruff check src/
          - name: Unit tests
            run: |
              pip install pytest pytest-cov   # --cov requires the pytest-cov plugin
              pytest tests/ -v --cov=src
          - name: Integration test (staging)
            if: github.ref == 'refs/heads/main'
            run: python src/pipeline.py --env staging
            env:
              DB_PASSWORD: ${{ secrets.STAGING_DB_PASSWORD }}
          - name: Deploy prod
            if: github.ref == 'refs/heads/main' && success()
            run: python deploy.py --env production
            env:
              DB_PASSWORD: ${{ secrets.PROD_DB_PASSWORD }}

    # .github/workflows/data_ci.yml
    name: Data Pipeline CI/CD
    on:
      push: {branches: [main]}
      pull_request: {branches: [main]}
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with: {python-version: '3.11'}
          - uses: actions/cache@v4
            with:
              path: ~/.cache/pip
              key: pip-${{ hashFiles('requirements.txt') }}
          - run: pip install -r requirements.txt
          - run: ruff check src/
          - run: pytest tests/ --cov=src --cov-report=xml
      dbt-check:
        needs: test                # runs only if the test job succeeds
        runs-on: ubuntu-latest
        env:                       # job-level, so every dbt command can reach the warehouse
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        steps:
          - uses: actions/checkout@v4
          - run: pip install dbt-snowflake
          - run: dbt deps --project-dir dbt/
          - run: dbt compile --project-dir dbt/ --profiles-dir dbt/
          - run: dbt test --project-dir dbt/ --profiles-dir dbt/ --select state:modified+ --defer --state ./prod-artifacts
      deploy:
        needs: dbt-check
        if: github.ref == 'refs/heads/main'
        environment: production    # applies the protection rules of the "production" environment
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4   # deploy.py lives in the repository
          - run: python deploy.py --env production
            env:
              DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}

| Level | Skills | GO signal | NO-GO signal |
|---|---|---|---|
| Mid-level | Basic workflows, automated tests, secrets | Has configured a dbt CI workflow with tests; manages secrets | Runs tests manually; does not know what a workflow is |
| Senior | dbt slim CI, OIDC, matrix, Docker deployment | Has set up dbt slim CI; uses OIDC instead of JSON keys | Stores GCP keys in plain text; does not know slim CI |