Data recruitment guide

Advanced MLflow technical test: Projects, Recipes, Model Registry, serving

MLflow is much more than a tracking tool. A Senior ML Engineer interview assesses command of MLflow Projects for reproducibility, of the Model Registry for governance, and of model serving.

Data Builder · June 2025 · 6 min read · Data Scientist · ML Engineer

Contents

Advanced tracking
MLflow Projects
Advanced Model Registry
MLflow Serving
CI/CD with MLflow
MLflow on Databricks
Scoring grid

1. Advanced tracking: log everything

Discriminating question

What do you systematically log in MLflow beyond the basic metrics?

import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature

# Assumes `model` (a fitted scikit-learn-compatible estimator) and `X_train`
# were created earlier in the training script.
with mlflow.start_run(run_name='xgb_v3_features_engineered') as run:
    # Parameters
    mlflow.log_params({
        'n_estimators': 500,
        'max_depth': 6,
        'learning_rate': 0.05,
        'feature_set': 'v3_with_lags'  # which version of the features was used
    })

    # Metrics
    mlflow.log_metrics({
        'auc_test': 0.87,
        'auc_train': 0.93,
        'precision': 0.82,
        'recall': 0.79,
        'f1': 0.80
    })

    # Artifacts: everything needed to reproduce the run
    mlflow.log_artifact('feature_importance.png')
    mlflow.log_artifact('confusion_matrix.png')
    mlflow.log_artifact('data_stats.json')   # statistics of the training data

    # Signature: expected inputs/outputs.
    # The sklearn flavor's pyfunc wrapper calls predict() by default,
    # so the output schema is inferred from predict(), not predict_proba().
    signature = infer_signature(X_train, model.predict(X_train))
    mlflow.sklearn.log_model(model, 'model', signature=signature)
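The snippet above logs a `data_stats.json` artifact without showing where it comes from. As a minimal sketch, assuming the training data is available as a list of numeric dict rows, such a file could be produced with the standard library alone. The function names, column names, and sample rows below are hypothetical and only serve the illustration.

```python
import json
import statistics


def summarize_columns(rows):
    """Return count/mean/min/max per column for a list of numeric dict rows."""
    stats = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        stats[col] = {
            "count": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
    return stats


def write_data_stats(rows, path="data_stats.json"):
    """Write the summary to disk so the file can be passed to mlflow.log_artifact(path)."""
    stats = summarize_columns(rows)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(stats, fh, indent=2, sort_keys=True)
    return stats


# Hypothetical training rows, for illustration only
sample_rows = [
    {"tenure_months": 12, "monthly_charges": 70.0},
    {"tenure_months": 24, "monthly_charges": 50.0},
]
data_stats = summarize_columns(sample_rows)
```

Logging these statistics next to the model makes it possible to check later whether a drop in performance coincides with a shift in the training data.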

2. MLflow Projects: reproducibility

Discriminating question

How do you guarantee that your training code is reproducible with MLflow Projects?

# MLproject file: defines how to run the project
name: churn_prediction

conda_env: conda.yaml  # or python_env: python_env.yaml

entry_points:
  train:
    parameters:
      # Documented parameter types are string, float, path and uri.
      # Values are substituted into the command as text, so train.py
      # can still parse them as integers.
      n_estimators: {type: float, default: 100}
      max_depth: {type: float, default: 5}
      data_path: {type: string}
    command: 'python train.py --n-estimators {n_estimators} --max-depth {max_depth} --data {data_path}'

  evaluate:
    parameters:
      model_uri: {type: string}
    command: 'python evaluate.py --model {model_uri}'

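# Run from anywhere. The default entry point is "main", so `train` must be named explicitly.
# data_path has no default and must be supplied; data/train.csv is a placeholder path.
mlflow run . -e train -P n_estimators=500 -P max_depth=6 -P data_path=data/train.csv

# Or directly from Git
mlflow run https://github.com/org/churn-model -e train -P n_estimators=500 -P data_path=data/train.csv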
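The `train` entry point calls `train.py` with three command-line flags, but the script itself is not shown. A minimal sketch of its argument parsing, assuming `argparse` and leaving out the actual training logic, might look like this. Only the flag names come from the MLproject command; everything else is an assumption.

```python
import argparse


def build_parser():
    """Declare the flags used by the `train` entry point command."""
    parser = argparse.ArgumentParser(description="Train the churn model")
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=5)
    parser.add_argument("--data", required=True, help="path to the training data")
    return parser


def main(argv=None):
    """Parse the flags and return the run configuration as a dict."""
    args = build_parser().parse_args(argv)
    # The training code is not part of this sketch; only the parsed
    # configuration is returned.
    return {
        "n_estimators": args.n_estimators,
        "max_depth": args.max_depth,
        "data": args.data,
    }


# Hypothetical invocation, mirroring what `mlflow run` would pass
config = main(["--n-estimators", "500", "--max-depth", "6", "--data", "data/train.csv"])
```

Keeping type coercion in the script (`type=int`) means the same file works whether it is launched by `mlflow run` or by hand.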

3. Model Registry: full workflow

Discriminating question

Describe the full workflow for promoting a model from development to production.

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# 1. Register the model from a run
#    (`run_id` comes from the training run, e.g. run.info.run_id)
model_uri = f'runs:/{run_id}/model'
model_details = mlflow.register_model(model_uri, 'churn_predictor')

# 2. Add metadata
client.update_model_version(
    name='churn_predictor',
    version=model_details.version,
    description='XGBoost v3, AUC=0.87, trained on 2025-01 data'
)

# 3. Promote to Staging after validation
#    Note: stage transitions are deprecated since MLflow 2.9
#    in favor of model version aliases.
client.transition_model_version_stage(
    name='churn_predictor',
    version=model_details.version,
    stage='Staging',
    archive_existing_versions=False
)

# 4. Integration tests on Staging...

# 5. Promote to Production
client.transition_model_version_stage(
    name='churn_predictor',
    version=model_details.version,
    stage='Production',
    archive_existing_versions=True  # archives the previous model
)
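Recent MLflow versions deprecate stages in favor of model version aliases, so a candidate may be asked how the same promotion looks with an alias. The sketch below is one possible shape, assuming a client that exposes `set_registered_model_alias(name, alias, version)` as `MlflowClient` does. The alias name `champion` and the stand-in client class are hypothetical, used here so the example runs without a tracking server.

```python
def alias_uri(name, alias):
    """Build a models:/ URI that resolves through an alias instead of a stage."""
    return f"models:/{name}@{alias}"


def promote_with_alias(client, name, version, alias="champion"):
    """Point `alias` at `version`; `client` is expected to be an MlflowClient."""
    client.set_registered_model_alias(name, alias, version)
    return alias_uri(name, alias)


class RecordingClient:
    """Stand-in for MlflowClient that only records alias assignments."""

    def __init__(self):
        self.aliases = {}

    def set_registered_model_alias(self, name, alias, version):
        self.aliases[(name, alias)] = version


fake_client = RecordingClient()
uri = promote_with_alias(fake_client, "churn_predictor", "3")
```

With aliases, consumers load the model through the alias URI, and a promotion simply moves the alias to another version.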

4. MLflow Serving: exposing a model

Discriminating question

How do you serve an MLflow model through a REST API?

## Serving via the CLI
# Load from the registry
mlflow models serve \
  --model-uri models:/churn_predictor/Production \
  --port 5000 \
  --env-manager local  # replaces the deprecated --no-conda flag

## Custom model in Python (for custom handlers)
import pandas as pd
from mlflow.pyfunc import PythonModel

class ChurnPredictor(PythonModel):
    def load_context(self, context):
        # Called once when the model is loaded
        import joblib
        self.model = joblib.load(context.artifacts['model_path'])
        self.threshold = 0.6  # business threshold

    def predict(self, context, model_input):
        # Probability of the positive class, then thresholded decision
        probas = self.model.predict_proba(model_input)[:, 1]
        return pd.DataFrame({
            'probability': probas,
            'prediction': (probas > self.threshold).astype(int)
        })

## Request to the served API
import requests

# Assumes `X_test` is a pandas DataFrame
response = requests.post(
    'http://localhost:5000/invocations',
    json={'dataframe_records': X_test.to_dict('records')}
)
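The request above sends rows in the `dataframe_records` format but does not show how the answer is read. The following sketch, using only the standard library, builds the request body and extracts predictions from a response. It assumes the MLflow 2.x response shape `{"predictions": [...]}`; the helper names and the sample row are hypothetical.

```python
import json


def build_invocation_body(records):
    """Serialize rows in the `dataframe_records` format used by /invocations."""
    return json.dumps({"dataframe_records": records})


def extract_predictions(response_text):
    """Pull predictions out of a response body.

    Assumes the MLflow 2.x shape {"predictions": [...]}. MLflow 1.x servers
    returned a bare JSON list, which is passed through unchanged.
    """
    payload = json.loads(response_text)
    if isinstance(payload, dict) and "predictions" in payload:
        return payload["predictions"]
    return payload


# Hypothetical row and response, for illustration only
body = build_invocation_body([{"tenure_months": 12, "monthly_charges": 70.0}])
preds = extract_predictions('{"predictions": [{"probability": 0.72, "prediction": 1}]}')
```

Handling both response shapes keeps a client working while servers are being upgraded across MLflow major versions.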

5. CI/CD with MLflow: automating promotions

Discriminating question

How do you integrate MLflow into a CI/CD pipeline for ML?

CI on pull requests: run training on a sample and check that the metrics are above the baseline
Automatic comparison: compare the new model with the model in Production, and promote only if it is better
GitHub Actions: a workflow that triggers training, records the run in MLflow, and promotes if the metrics pass
Model approval gate: some critical models require human validation before promotion to Production
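The four points above boil down to a decision rule that a CI job can evaluate after training. A minimal sketch of such a gate follows, assuming metrics are available as plain dicts (for example read back from MLflow runs). The function name, the `critical` flag, and the three outcome labels are hypothetical choices for this illustration.

```python
def promotion_decision(candidate, production, baseline, metric="auc_test", critical=False):
    """Decide what to do with a candidate model.

    candidate / production: dicts of metrics (production may be None if
    nothing is deployed yet). baseline: minimum acceptable metric value.
    Returns "reject", "needs_approval" or "promote".
    """
    score = candidate[metric]
    # Gate 1: the candidate must reach the baseline
    if score < baseline:
        return "reject"
    # Gate 2: the candidate must beat the model currently in Production
    if production is not None and score <= production[metric]:
        return "reject"
    # Gate 3: critical models wait for a human sign-off
    return "needs_approval" if critical else "promote"


# Hypothetical metric values, for illustration only
decision = promotion_decision({"auc_test": 0.87}, {"auc_test": 0.85}, baseline=0.80)
```

In a GitHub Actions workflow, the job could fail on `reject` and pause on `needs_approval`, so that only `promote` reaches the registry update step.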

6. MLflow on Databricks: differences

Discriminating question

How does managed MLflow on Databricks differ from open-source MLflow?

Centralized tracking server: no configuration needed, integrated into the Databricks workspace
Unity Catalog integration: in Databricks 13+, the Model Registry lives in Unity Catalog, with lineage and RBAC access control
Autologging: mlflow.autolog() automatically captures sklearn, XGBoost, PyTorch, LightGBM
Feature Store: native integration between Databricks Feature Store features and MLflow models
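One practical consequence of the Unity Catalog integration is naming: registered models are addressed by a three-level name, `catalog.schema.model`, once the client has been pointed at Unity Catalog with `mlflow.set_registry_uri("databricks-uc")`. The sketch below only builds such names and version URIs; the helper functions and the `prod` / `ml` identifiers are hypothetical.

```python
def uc_model_name(catalog, schema, model):
    """Join the three levels into the full name used by Unity Catalog."""
    parts = (catalog, schema, model)
    if not all(parts) or any("." in part for part in parts):
        raise ValueError("catalog, schema and model must be non-empty and contain no dots")
    return ".".join(parts)


def uc_version_uri(catalog, schema, model, version):
    """Build a models:/ URI for a specific version of a Unity Catalog model."""
    return f"models:/{uc_model_name(catalog, schema, model)}/{version}"


# Hypothetical catalog and schema names, for illustration only
full_name = uc_model_name("prod", "ml", "churn_predictor")
version_uri = uc_version_uri("prod", "ml", "churn_predictor", 3)
```

The catalog and schema levels are what carry the access rules, which is why the same model name can exist separately in a development and a production catalog.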

7. Scoring grid by level

Level	Skills	GO signal	NO-GO signal
Experienced	Complete tracking, Model Registry, basic serving	Logs parameters, metrics and artifacts; has promoted a model through the Registry	Logs only metrics, without parameters or artifacts
Senior	MLflow Projects, production serving, ML CI/CD, custom PythonModel	Has written an MLproject file, serves a custom model in production, runs an ML CI/CD pipeline	Does not know what MLflow Projects is
