Beyond Kubernetes basics, Data Engineers who deploy Airflow, Spark, or ML APIs in production need Helm, StatefulSets, and GitOps. Here is what we assess at the Senior level.
1. Helm: deploying complex applications
Discriminating question
What is Helm? How do you use it to deploy Airflow on Kubernetes?
# Helm: Kubernetes package manager
# Chart = reusable K8s package

# Add the Airflow repo
helm repo add apache-airflow https://airflow.apache.org
helm repo update

# Install Airflow with custom values
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace \
  --values airflow-values.yaml

# airflow-values.yaml
executor: KubernetesExecutor
env:
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'false'
persistence:
  enabled: true
dags:
  persistence:
    enabled: true
    storageClassName: standard
  gitSync:
    enabled: true
    repo: https://github.com/org/dags-repo
    branch: main
- Helm chart: a set of parameterizable K8s manifests, comparable to an npm package for Kubernetes
- values.yaml: overrides the chart's default values. Keep it versioned in Git
- helm upgrade: updates a release; helm rollback restores a previous revision if something goes wrong
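
The "override the defaults, version in Git" idea is often extended with a small per-environment file layered on top of the base values. The sketch below assumes a hypothetical `values-prod.yaml` and a hypothetical `release` branch; the keys are the ones already used in `airflow-values.yaml`. When several `--values` files are passed, later files win. Helm merges maps key by key but replaces lists wholesale, which is why the `env` list is restated in full.

```yaml
# values-prod.yaml (hypothetical file name): only what differs in production
dags:
  gitSync:
    branch: release          # assumption: prod tracks a dedicated branch
env:                         # lists are replaced, not merged, so repeat every entry
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'false'
```

It could be applied with `helm upgrade --install airflow apache-airflow/airflow --namespace airflow --values airflow-values.yaml --values values-prod.yaml`. Running `helm history airflow --namespace airflow` then lists the revisions that `helm rollback` can return to.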
2. StatefulSets: workloads with persistent state
Discriminating question
When do you use a StatefulSet instead of a Deployment in a data context?
# StatefulSet: for workloads that need
# stable identity and persistent storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka-headless   # headless Service that gives each pod a stable DNS name
  replicas: 3
  selector:                     # required in apps/v1; must match the template labels
    matchLabels:
      app: kafka                # illustrative label
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: confluentinc/cp-kafka:7.5
          volumeMounts:
            - name: data
              mountPath: /var/kafka-data
  volumeClaimTemplates:         # PVC automatically created per pod
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 50Gi
- StatefulSet vs Deployment: a Deployment runs interchangeable pods; a StatefulSet gives each pod a stable identity (kafka-0, kafka-1, kafka-2)
- Data use cases: Kafka, ZooKeeper, Elasticsearch, clustered databases
- PVC per pod: each pod gets its own PersistentVolumeClaim, so its data survives when the pod is deleted and recreated
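
The stable identities above depend on the headless Service named in `serviceName`, which the StatefulSet does not create for you. A minimal sketch of that Service follows; the `app: kafka` label and port 9092 are assumptions and must match whatever your pods actually use.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless        # must match spec.serviceName of the StatefulSet
spec:
  clusterIP: None             # headless: no virtual IP, DNS resolves to the pods directly
  selector:
    app: kafka                # assumption: the StatefulSet pods carry this label
  ports:
    - name: broker
      port: 9092              # assumption: default Kafka listener port
```

With this in place, and assuming the default cluster domain, each broker is reachable at a predictable name such as `kafka-0.kafka-headless.<namespace>.svc.cluster.local`.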
3. Secure secrets management
Discriminating question
How do you manage credentials (API keys, passwords) in a Kubernetes cluster?
- Native Kubernetes Secrets: base64-encoded, which is encoding, not encryption. Insufficient on their own for production unless encryption at rest and strict RBAC are in place
- External Secrets Operator: automatically syncs secrets from AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault into K8s Secrets
- Sealed Secrets (Bitnami): encrypts K8s Secrets so they can be versioned safely in Git
- Workload Identity: the pod authenticates directly to GCP/AWS without any stored credentials. The recommended practice as of 2025
- Never: put secrets in Docker images or in plaintext environment variables
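
To make the External Secrets Operator option concrete, here is a hedged sketch of an `ExternalSecret` that would produce a `db-credentials` Secret with a `password` key. The store name `cloud-secret-store` and the remote path `prod/db/password` are hypothetical, and the `apiVersion` depends on the operator release you run, so check which versions your installed CRDs serve.

```yaml
apiVersion: external-secrets.io/v1beta1   # assumption: adjust to the version your CRDs serve
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: data-platform
spec:
  refreshInterval: 1h                # how often the operator re-reads the external value
  secretStoreRef:
    name: cloud-secret-store         # hypothetical ClusterSecretStore, configured separately
    kind: ClusterSecretStore
  target:
    name: db-credentials             # K8s Secret the operator creates and keeps in sync
  data:
    - secretKey: password            # key inside the K8s Secret
      remoteRef:
        key: prod/db/password        # hypothetical path in the external secret manager
```

The manifest itself holds no sensitive value, so it can be committed to Git; only the reference to the external secret is versioned.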
4. GitOps with ArgoCD
Discriminating question
What is GitOps? How does ArgoCD synchronize a Kubernetes cluster?
# ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: airflow-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-infra
    targetRevision: main
    path: apps/airflow/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: airflow
  syncPolicy:
    automated:
      prune: true      # delete resources removed from the repo
      selfHeal: true   # resync if someone manually modifies K8s
- GitOps: Git is the source of truth for the cluster state. Every change goes through a PR
- ArgoCD: detects drift between Git and the cluster and synchronizes automatically
- Advantages: auditability (every change is a commit), simple rollback (revert the commit), consistency across environments
5. Airflow on Kubernetes: KubernetesExecutor
Discriminating question
What is the difference between CeleryExecutor and KubernetesExecutor for Airflow?
| Criterion | CeleryExecutor | KubernetesExecutor |
|---|---|---|
| Workers | Permanent worker pool | One Pod per task (created/destroyed) |
| Isolation | Low (shared workers) | Maximum (dedicated pod) |
| Scalability | Fixed or Celery autoscaling | Automatic K8s |
| Cost | Workers always running | Pay only during execution |
| Complexity | Requires Redis/RabbitMQ | K8s native, simpler |
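
"One Pod per task" means the executor needs a pod spec to start from. Airflow can read one from a file referenced by its `pod_template_file` setting, and it expects the container that runs the task to be named `base`. The sketch below is illustrative only: the image is hypothetical, and the service account name and resource figures are borrowed from the CronJob example in this guide.

```yaml
# pod_template.yaml: starting point for every task pod
apiVersion: v1
kind: Pod
metadata:
  name: placeholder-name                 # Airflow overrides the name for each task
spec:
  restartPolicy: Never                   # retries are handled by Airflow, not by the kubelet
  serviceAccountName: data-pipeline-sa   # assumption: same account as the CronJob example
  containers:
    - name: base                         # Airflow looks for the task container under this name
      image: registry.example.com/airflow-custom:1.0.0   # hypothetical image
      resources:
        requests: {cpu: "500m", memory: "1Gi"}
        limits: {cpu: "2", memory: "4Gi"}
```

Individual tasks can still deviate from this template, which is where the per-task isolation in the table comes from.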
6. Monitoring data workloads on K8s
Discriminating question
How do you monitor your data pipelines deployed on Kubernetes?
- Prometheus + Grafana: CPU/RAM metrics per pod, alerts on OOM kills or crash loops
- Loki: log aggregation across all pods (Airflow, Spark, ML APIs), completing the Grafana stack
- kube-state-metrics: metrics on the state of K8s resources (Deployments, StatefulSets, Jobs)
- Critical data alerts: pod in CrashLoopBackOff, Spark job OOMKilled, ML API latency above threshold, Kafka consumer lagging
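
Two of the critical alerts listed above can be sketched as a `PrometheusRule`, assuming the Prometheus Operator and kube-state-metrics are installed. The rule name, namespace filter, durations, and severities are illustrative choices, not recommendations from the source.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: data-platform-alerts          # hypothetical name
  namespace: data-platform
spec:
  groups:
    - name: data-workloads
      rules:
        - alert: PodCrashLooping
          # kube-state-metrics exposes 1 while a container waits with this reason
          expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="data-platform"} == 1
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "{{ $labels.pod }} is in CrashLoopBackOff"
        - alert: ContainerOOMKilled
          # last termination reason of the container was an out-of-memory kill
          expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled", namespace="data-platform"} == 1
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.container }} in {{ $labels.pod }} was OOMKilled"
```

Latency and Kafka consumer lag alerts follow the same pattern but rely on application or exporter metrics, whose names depend on your setup.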
# K8s CronJob for data pipeline
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-etl
  namespace: data-platform
spec:
  schedule: "0 6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: data-pipeline-sa
          containers:
            - name: etl
              image: europe-west1-docker.pkg.dev/projet/data/pipeline:1.2.3
              env:
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: password
              resources:
                requests: {cpu: "500m", memory: "1Gi"}
                limits: {cpu: "2", memory: "4Gi"}
          restartPolicy: OnFailure
      backoffLimit: 3
- KubernetesExecutor Airflow: each Airflow task runs in an ephemeral pod. Strong isolation, automatic scaling, no permanent workers. A common choice for production K8s deployments
- Mandatory resource limits: without limits, a job can consume all of the node's memory and get other pods killed or evicted. Always define requests AND limits
- Workload Identity: bind the K8s service account to a GCP/AWS identity. Zero credentials in pods; authentication relies on the pod's identity
- Horizontal Pod Autoscaling: for ML scoring APIs, scale on CPU/memory or on custom metrics (requests waiting in the queue)
- Spark on Kubernetes: spark-submit targets the K8s cluster and executors run as ephemeral pods. GKE Autopilot simplifies provisioning
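
Two of these takeaways translate directly into manifests. The sketch below assumes GKE for the Workload Identity annotation (on EKS the corresponding annotation is `eks.amazonaws.com/role-arn`) and a hypothetical `scoring-api` Deployment. The GCP service account address and the replica and utilization numbers are placeholders.

```yaml
# ServiceAccount bound to a cloud identity: no key file is mounted in the pod
apiVersion: v1
kind: ServiceAccount
metadata:
  name: data-pipeline-sa
  namespace: data-platform
  annotations:
    # hypothetical GCP service account; PROJECT_ID is a placeholder
    iam.gke.io/gcp-service-account: data-pipeline@PROJECT_ID.iam.gserviceaccount.com
---
# HPA for an ML scoring API, scaling on average CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scoring-api                 # hypothetical name
  namespace: data-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scoring-api               # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # percentage of the pods' CPU requests
```

Utilization is computed relative to the containers' CPU requests, which is one more reason to always set them.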
7. Level grid
| Level | Mastery | GO signal | NO-GO |
|---|---|---|---|
| Confirmed | Basic Helm, K8s deployments, secrets management | Has deployed Airflow with Helm, uses External Secrets or Workload Identity | Stores credentials in K8s manifests |
| Senior | StatefulSets, GitOps ArgoCD, KubernetesExecutor, monitoring | Has set up GitOps with ArgoCD, has deployed Airflow with KubernetesExecutor | Does not know what GitOps is, unfamiliar with ArgoCD |