Data hiring guide
Dataiku technical interview: what we assess in a Data Scientist interview
Dataiku is the most widely adopted collaborative data platform in large French enterprises. Here is how to assess real proficiency.
Data Builder · June 2025 · 7 min read · Data Scientist · Analytics Engineer
Assessing a Dataiku profile means assessing the candidate's ability to work in a structured collaborative environment.
1. Projects and flows
Key question
What is a Dataiku project? What is the difference between a dataset, a recipe, and a flow?
A project contains datasets (the data), recipes (the transformations), and a flow (the visual DAG that connects them).
- Datasets: files, SQL tables, S3, APIs; everything is a dataset
- Recipes: visual, SQL, Python, R, Spark
- Flow: visual DAG shared with the team
- Scenarios: automatic orchestration of the flow
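To make the dataset/recipe/flow distinction concrete, here is a minimal sketch in plain Python that models a flow as a DAG and derives a valid build order. It does not use the Dataiku API; the dataset and recipe names are invented for illustration, and the ordering logic is only an assumption about what an orchestrator has to do, not Dataiku's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each recipe maps its input datasets to one output dataset.
# All dataset and recipe names are invented for illustration.
RECIPES = {
    "prepare_orders": {"inputs": ["orders_raw"], "output": "orders_clean"},
    "join_customers": {"inputs": ["orders_clean", "customers"], "output": "orders_enriched"},
    "score_churn": {"inputs": ["orders_enriched"], "output": "churn_scores"},
}


def build_order(recipes):
    """Return recipe names in an order where every input is built first."""
    produced_by = {r["output"]: name for name, r in recipes.items()}
    graph = {}
    for name, r in recipes.items():
        # A recipe depends on the recipes that produce its input datasets.
        # Source datasets (no producing recipe) add no dependency.
        graph[name] = {produced_by[d] for d in r["inputs"] if d in produced_by}
    return list(TopologicalSorter(graph).static_order())


order = build_order(RECIPES)
print(order)
```

A candidate who understands the flow should be able to explain why `join_customers` cannot run before `prepare_orders`, and why `customers` and `orders_raw` have no upstream recipe.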
2. Data preparation
Key question
When do you use the visual Prepare recipe rather than a Python or SQL recipe?
- Prepare: 100+ visual processors, accessible to non-developers
- SQL: transformations pushed down to the database
- Python/R: complex logic, external libraries
- Spark: large-scale processing
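As an illustration of the "complex logic" case, the sketch below shows branching validation that is usually easier to maintain as code than as a chain of visual processors. The function, column names, and sample rows are hypothetical. Inside Dataiku the rows would typically be read and written with the `dataiku` package (for example `dataiku.Dataset("...").get_dataframe()`); that part is left out so the example runs anywhere.

```python
import re


def normalize_fr_phone(raw):
    """Return a French number as +33XXXXXXXXX, or None if it does not parse."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if digits.startswith("0033"):
        digits = digits[4:]
    elif digits.startswith("33") and len(digits) == 11:
        digits = digits[2:]
    elif digits.startswith("0") and len(digits) == 10:
        digits = digits[1:]
    # A valid national number has 9 digits and does not start with 0.
    if len(digits) != 9 or digits[0] == "0":
        return None
    return "+33" + digits


# Hypothetical input rows standing in for a dataset.
rows = [
    {"customer_id": 1, "phone": "06 12 34 56 78"},
    {"customer_id": 2, "phone": "+33 6 12 34 56 78"},
    {"customer_id": 3, "phone": "12345"},
]
cleaned = [dict(row, phone=normalize_fr_phone(row["phone"])) for row in rows]
```

A simple trim, rename, or date parse would not justify this: a strong candidate reaches for code only when the logic outgrows the visual processors.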
3. Machine Learning
Key question
Describe the workflow for creating an ML model in Dataiku, through to deployment.
- Visual ML: no-code model creation
- Lab: model experimentation and comparison
- Saved Model: versioned model deployed in the flow
- SHAP values: prediction explainability
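To check that a candidate understands what SHAP values represent rather than just where to click, it helps to have a tiny worked case. The sketch below computes exact Shapley values by brute force for a toy three-feature model. The model, the input, and the all-zero baseline are assumptions made for illustration; this is neither Dataiku's implementation nor the approximation used by SHAP libraries, which avoid enumerating every feature subset.

```python
from itertools import combinations
from math import factorial


def model(x):
    """Toy scoring function with an interaction term (hypothetical)."""
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value, the others the baseline.
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(point)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The contributions sum to the gap between the prediction and the baseline prediction, and the interaction term `x[0] * x[2]` is split evenly between the two features involved. A candidate who can explain both points understands the explainability output.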
4. MLOps and deployment
Key question
How do you manage automatic retraining of a model in production?
- Model drift detection: monitoring performance degradation
- Scenarios and triggers: automatic retraining when a condition is met
- API deployment: services built on the Design Node and served from a dedicated API Node
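The "retrain on condition" idea can be sketched with a simple drift metric. The example below uses the Population Stability Index (PSI) on a single feature and a threshold of 0.2, a commonly cited rule of thumb. Both are assumptions chosen for illustration, not Dataiku's built-in drift analysis; in practice a check like this could be wired into a scenario as the condition that launches retraining.

```python
from math import log


def psi(reference, current, edges):
    """Population Stability Index between two samples over shared bin edges."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The number of edges v has reached is its bin index.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * log(c / r) for r, c in zip(ref, cur))


def should_retrain(reference, current, edges, threshold=0.2):
    """Hypothetical trigger condition: retrain when PSI exceeds the threshold."""
    return psi(reference, current, edges) > threshold


# Invented feature values: training-time sample vs. a shifted production sample.
edges = [25, 50, 75]
reference = [10, 20, 30, 40, 50, 60, 70, 80]
shifted = [60, 65, 70, 75, 80, 85, 90, 95]
print(should_retrain(reference, reference, edges))
print(should_retrain(reference, shifted, edges))
```

A good answer also covers what happens after the trigger fires: the retrained model is evaluated against the current version before it is activated, rather than being promoted blindly.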
5. Governance and collaboration
Key question
How do you organize collaboration between data scientists and data analysts on the same project?
- Shared connections: centralizing access to data sources
- Bundles: packaging and deploying projects between instances
- Fleet Manager: multi-instance governance
6. Assessment grid by level
| Level | Expected proficiency | GO signal | NO-GO signal |
|---|---|---|---|
| Junior | Project navigation, Prepare recipe, basic Visual ML | Understands the difference between dataset, recipe, and flow | Cannot say when to choose Python over the Prepare recipe |
| Mid-level | Python/SQL recipes, ML Lab, Saved Models, Scenarios | Has deployed a model and automated its retraining | Has never configured drift detection |
| Senior | MLOps, API deployment, Fleet Manager, explainability | Has managed the full lifecycle of a model in production | Does not know bundles or Fleet Manager |
| Lead | Multi-project architecture, governance, team standards | Has defined the Dataiku standards for the organization | Cannot explain Design Node vs API Node |