8 out of 10 models that pass the pilot phase do not survive a year in production. This hands-on experience report shows why, and how MLOps addresses these problems with Dataiku in a large-group context.
In a large group like LVMH, maintaining ML models in production across dozens of houses is as much an organizational challenge as a technical one. This guide starts from a concrete case.
1. The problem: 8 out of 10 models don't survive one year
Discriminating question
Why do the majority of deployed ML models not survive one year in production? What are the root causes?
- No owner — no one is designated to monitor and restart the pipeline when it fails
- Lack of understanding — new team members don't understand existing models, no documentation
- No delivery process — frequent errors in outputs, no automated validation
- Rigidity in the face of change — the model doesn't adapt when data or business context changes
2. The 4 phases of the MLOps lifecycle
| Phase | Activity | Output |
|---|---|---|
| 1. Identification | Clear definition of the business problem | Validated business case, defined KPIs |
| 2. Development | Creation and training of the ML model | Model validated on test dataset |
| 3. Pilot phase | Testing under real conditions, performance validation | Business validation, established performance thresholds |
| 4. Production | Integration, continuous monitoring, improvement | Monitored model, retrained if drift detected |
3. The ML CI/CD framework: reliability, scalability, traceability
Discriminating question
What do you mean by CI/CD for ML? How is it different from classic software CI/CD?
- Reliability — data monitoring (feature distribution), model performance monitoring, output monitoring, automatic retraining
- Scalability — reproducible orchestrated pipeline, technical and business documentation, fast onboarding for new members
- Traceability — store historical outputs, log runs, version models, enable auditing
- Difference from software CI/CD — in ML, testing model behavior is probabilistic (not binary). A model can be correctly deployed but produce bad results if data has drifted
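One way to picture the difference from software CI/CD is a release gate that treats model quality as a statistical estimate rather than a pass/fail assertion. The sketch below is illustrative only: the function names, the 0.75 threshold and the bootstrap settings are assumptions, not part of the setup described here. It bootstraps the validation AUC and lets the build pass only when the lower bound of the interval clears the threshold.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_interval(labels, scores, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical default settings)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    estimates = []
    while len(estimates) < n_resamples:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        estimates.append(auc(ys, [scores[i] for i in sample]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def quality_gate(labels, scores, min_auc=0.75):
    """Pass only if the lower bound of the AUC interval clears the threshold."""
    lower, upper = bootstrap_auc_interval(labels, scores)
    return {"passed": lower >= min_auc, "auc": auc(labels, scores), "interval": (lower, upper)}
```

A gate like this can run as a pipeline step after training. A classic unit test would compare the point estimate alone, which is exactly where small validation sets produce flaky builds.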
4. MLOps with Dataiku: what the platform brings
LVMH MLOps architecture with Dataiku:
[Data] -> [Dataiku Scenario]
|
Automatic orchestration:
- Trigger: time-based, dataset change, manual
- Reporter: Teams, email on success/failure
|
Metrics & checks (30% of time):
- Data quality (schema, distributions)
- Model performance in validation
- Model performance in production
- Output distribution (before/after post-processing)
|
Code refactoring (60% of time):
- Visual recipes -> SQL -> Python (performance)
- SQL Pipelines (native in-database processing)
- Storage optimization (zones)
|
Visualization (5%):
- Monitoring dashboard in Dataiku
- Experiment Tracking — run history with parameters, performance, volume on validation dataset
- Model Store — historical validated models with their metrics and thresholds used
- Scenario — equivalent of an Airflow DAG in Dataiku, with triggers, reporters and steps
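The trigger, steps and reporter structure described above can be sketched in plain Python. This is not the Dataiku API: `Scenario`, `build_dataset`, `check_row_count` and the message format are hypothetical names chosen for illustration. The sketch shows only the control flow, with steps running in order, the run stopping at the first failure, and every reporter receiving the outcome.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Scenario:
    """Ordered steps plus reporters (a plain-Python stand-in, not a Dataiku object)."""
    name: str
    steps: List[Callable[[], None]] = field(default_factory=list)
    reporters: List[Callable[[str], None]] = field(default_factory=list)

    def run(self, trigger: str) -> bool:
        """Run every step in order; stop at the first failure and notify reporters."""
        outcome = "SUCCESS"
        for step in self.steps:
            try:
                step()
            except Exception as exc:  # any failing step aborts the run
                outcome = f"FAILED at {step.__name__}: {exc}"
                break
        message = f"[{self.name}] trigger={trigger} outcome={outcome}"
        for report in self.reporters:
            report(message)
        return outcome == "SUCCESS"


def build_dataset():
    """Placeholder for a data preparation step."""


def check_row_count():
    """Placeholder check that fails when the input dataset is empty."""
    rows = 0  # stand-in for a real dataset metric
    if rows == 0:
        raise ValueError("input dataset is empty")


# Usage: the reporter here appends to a list; in practice it would post to Teams or email.
sent = []
scenario = Scenario("daily_scoring", steps=[build_dataset, check_row_count], reporters=[sent.append])
ok = scenario.run(trigger="time-based")
```

Reporting on both success and failure matters here: a silent scenario is indistinguishable from one that never ran.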
5. 60% of the time: refactoring is king
Discriminating question
In an MLOps project, how much time do you spend on modeling vs. pipeline and refactoring?
- 60% refactoring — code optimization for performance and maintainability: moving from Dataiku Visual Recipes to SQL, then to native SQL Pipelines
- 30% monitoring — setting up all checks: data quality, model performance, output distribution
- 5% visualization — monitoring dashboard for stakeholders
- 5% orchestration — configuring triggers and reporters
- Conclusion — an MLOps project that is 80% machine learning and 20% engineering is already a project in trouble
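To give an idea of what the monitoring share covers, here is a minimal sketch of two checks: a schema check on incoming rows and a Population Stability Index (PSI) on a numeric column such as the model output. PSI is one common choice for comparing distributions, not necessarily the metric used in this project, and the function names, bin count and `eps` floor are assumptions.

```python
import math


def check_schema(rows, expected):
    """Return a list of schema violations; `expected` maps column name to Python type."""
    problems = []
    for i, row in enumerate(rows):
        missing = set(expected) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected_type in expected.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(f"row {i}: {column} is not {expected_type.__name__}")
    return problems


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of a numeric value."""
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins or 1.0  # fall back to 1.0 for a constant reference

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm below stays defined
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples give a PSI of 0, and the value grows as the distributions diverge. The alert threshold is a project decision; rule-of-thumb values often quoted for PSI (around 0.1 and 0.25) should be calibrated on historical data before they drive a retraining.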
6. Types of models in production at LVMH
- Client Development — targeting clients based on their behavior (repeaters, one-timers)
- Product Recommendation — cross-product recommendation across the group's houses
- Sales Forecast — sales forecasting by house, zone, channel
- Efficiency Models — inventory optimization, reduction of unsold items
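import os
import mlflow
import mlflow.xgboost
import requests
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.report import Report  # Report API of older Evidently releases
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Airflow connection settings, read from the environment (variable names are placeholders)
AIRFLOW_URL = os.environ["AIRFLOW_URL"]
AIRFLOW_PASSWORD = os.environ["AIRFLOW_PASSWORD"]

class MLOpsPipeline:
    def train_and_log(self, X_train, y_train, X_val, y_val):
        """Train the model inside an MLflow run and return the run ID."""
        mlflow.xgboost.autolog()  # XGBoost-specific autologging for XGBClassifier
        with mlflow.start_run() as run:
            model = XGBClassifier(n_estimators=300, max_depth=6)
            model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
            mlflow.log_metric("auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
            return run.info.run_id

    def check_drift(self, reference, production):
        """Compare production data with the reference set; retrain on dataset drift."""
        report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
        report.run(reference_data=reference, current_data=production)
        result = report.as_dict()
        # The first metric of DataDriftPreset carries the dataset-level drift flag
        if result["metrics"][0]["result"]["dataset_drift"]:
            self.trigger_retraining()
        return result

    def trigger_retraining(self):
        """Start the retraining DAG through the Airflow REST API."""
        response = requests.post(
            f"{AIRFLOW_URL}/api/v1/dags/retrain_churn/dagRuns",
            json={"conf": {"triggered_by": "drift_detection"}},
            auth=("airflow", AIRFLOW_PASSWORD),
            timeout=30,
        )
        response.raise_for_status()  # surface HTTP errors instead of failing silently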
- Retraining triggers: drift detected (Evidently), metric degradation (for example, AUC down 5%), calendar schedule (monthly), or volume threshold (100k new examples)
- Champion/Challenger pattern: the challenger (the new model) receives 10% of traffic. If it outperforms the champion after N days, it is promoted automatically
- Feature Store: Feast, Databricks Feature Store, Vertex AI Feature Store. Centralizes feature definitions so training and serving read the same values, which removes a major source of training/serving skew
- Model cards: document performance by segment (age, region, product). Mandatory for models with regulatory impact
- Immediate rollback: the MLflow Model Registry lets you revert to the previous version with a single command. Test rollback regularly in staging
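The retraining triggers and the 10% challenger split from the list above can be sketched as two small functions. Everything named here is hypothetical (`ModelStatus`, `retraining_reasons`, `route` and the default thresholds), and reading "AUC -5%" as a relative drop from the baseline is an assumption. The routing hashes the client identifier so that a given client always reaches the same model.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ModelStatus:
    """Snapshot of the signals used to decide on retraining (illustrative fields)."""
    drift_detected: bool
    baseline_auc: float
    current_auc: float
    days_since_training: int
    new_examples: int


def retraining_reasons(status, max_auc_drop=0.05, max_age_days=30, min_new_examples=100_000):
    """Return every trigger that fires; an empty list means no retraining is needed."""
    reasons = []
    if status.drift_detected:
        reasons.append("drift")
    # Assumed to be a relative drop: 0.05 means the AUC lost 5% of its baseline value
    if status.current_auc < status.baseline_auc * (1 - max_auc_drop):
        reasons.append("metric_degradation")
    if status.days_since_training >= max_age_days:
        reasons.append("schedule")
    if status.new_examples >= min_new_examples:
        reasons.append("volume")
    return reasons


def route(client_id, challenger_share=0.10):
    """Deterministically send a fixed share of clients to the challenger."""
    digest = hashlib.sha256(str(client_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x1_0000_0000  # uniform value in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"
```

Returning the list of reasons rather than a boolean keeps the decision auditable: the reason can be passed along in the retraining request and logged with the run.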
7. Assessment grid by level
| Level | Mastery | GO signal | NO-GO |
|---|---|---|---|
| Mid-level | Has deployed a model, knows MLflow, does basic checks | Has deployed to production, monitors performance, recognizes causes of obsolescence | Thinks that deploying means putting the notebook in prod |
| Senior | Complete MLOps framework, drift monitoring, ML CI/CD, advanced use of Dataiku or MLflow | Has set up complete monitoring, has configured automatic retraining | Has no monitoring on their models in production |
| Lead | Organizational MLOps architecture, model governance, platform choices | Has defined the MLOps framework for their organization, has chosen and deployed the platform | Cannot explain why their models die in production |