Data recruitment

Technical data interview guides

Guides for rigorously assessing technical data skills: SQL, dbt, Spark, Power BI, Tableau, Cloud, MLOps and more.

Data Analyst Analytics Engineer

SQL technical test

Joins, window functions, optimization: what SQL really means in a data interview.

9 min read
Data Analyst

Power BI technical test

Power Query, DAX and filter context, star schema modeling, production deployment.

9 min read
Data Engineer Analytics Engineer

Git technical test

Branches, conflicts, rebase and CI/CD workflow in a data context.

7 min read
Analytics Engineer

dbt technical test

Project structure, generic and custom tests, Jinja macros, slim CI/CD.

7 min read
Data Engineer

Spark technical test

Lazy evaluation, partitioning, data skew and cluster optimization.

8 min read
Data Engineer

Airflow technical test

DAGs, operators, TaskFlow API, SLAs and production monitoring.

7 min read
Data Engineer Data Scientist

Python for data technical test

Pandas/DuckDB, OOP, unit tests, FastAPI: production-oriented Python.

8 min read
Data Analyst

Tableau technical test

LOD expressions, table calculations, Performance Recorder and governance.

7 min read
Data Analyst Analytics Engineer

Looker / LookML technical test

LookML, PDT, RBAC and enterprise semantic layer.

7 min read
Data Engineer Analytics Engineer

BigQuery technical test

Partitioning, clustering, QUALIFY and BigQuery cost management.

7 min read
Data Engineer Analytics Engineer

Snowflake technical test

Time Travel, Streams/Tasks, Dynamic Tables and cost optimization.

7 min read
Data Engineer

Docker technical test

Optimized Dockerfile, volumes, Docker Compose and containerized CI/CD.

7 min read
Data Engineer

Linux / Bash technical test

Navigation, bash scripts, processes and production monitoring.

7 min read
Data Engineer

AWS for data technical test

S3, IAM, Glue/Athena/Redshift, MWAA and AWS cost management.

8 min read
Data Analyst

Power Apps technical test

Canvas apps, Power Fx, delegation and ALM governance.

6 min read
Data Analyst

Power Automate technical test

Automated flows, custom connectors, error handling and DLP.

6 min read
Data Scientist Analytics Engineer

Dataiku technical test

Projects/flows, Visual ML, MLOps and multi-instance governance.

7 min read
Data Analyst

Advanced DAX for Power BI

CALCULATE, VAR/RETURN, time intelligence, SUMX, DAX Studio: the formulas tested in interviews.

10 min read
Data Analyst

Advanced Tableau: LOD, table calculations

LOD FIXED/INCLUDE/EXCLUDE, table calculations, Tableau Prep, dynamic dashboards, Performance Recorder.

9 min read
Data Engineer

Advanced Airflow: DAGs, XCom, TaskFlow

TaskFlow API, XCom, trigger rules, executors, Secrets Backend and production architecture.

8 min read
Data Engineer Analytics Engineer

Advanced Git: rebase, hooks, gitflow

Interactive rebase, pre-commit hooks, cherry-pick, handling emergencies and team standards.

8 min read
Data Analyst Analytics Engineer

Power BI vs Tableau vs Looker

Connectivity, modeling, DAX vs LookML, performance, costs and governance: which tool for which context?

9 min read
Data Analyst Analytics Engineer

Power Apps for data

Canvas apps, Power Fx, Dataverse vs SharePoint, Power Automate and ALM governance across DEV/TEST/PROD.

8 min read
Data Engineer

Kubernetes for data: GKE, pods, nodes

Pods, nodes, cluster, GKE Autopilot vs Standard, Airflow and Spark on K8s, Workload Identity.

8 min read
Data Engineer Analytics Engineer

Data Mesh: domains, data products, governance

The 4 principles, data products, self-serve infrastructure and federated governance, compared with the Data Lakehouse.

8 min read
Data Analyst

UX design for dashboards

User discovery, Double Diamond, wireframes, choice of visuals and information hierarchy.

8 min read
Data Scientist Data Engineer

MLOps: CI/CD for ML, MLflow, deployment

MLOps levels 0/1/2, MLflow Registry, CI/CD for ML, model deployment and drift monitoring.

9 min read
Analytics Engineer Data Engineer

Data governance: catalog, lineage, quality

Data catalog, data lineage, data quality (dbt + Great Expectations), MDM and GDPR.

8 min read
Data Engineer

Web scraping in Python: BeautifulSoup, Selenium, Scrapy

HTTP, static vs dynamic pages, Scrapy for production, anti-bot measures and legality.

8 min read
Data Analyst

Power BI best practices: report and model

Report structure, performance optimization, bookmarks, Performance Analyzer and deployment.

8 min read
Data Engineer Data Scientist

Advanced Python for data: pandas, DuckDB, Polars

Vectorization, memory optimization, DuckDB for large volumes and the Polars lazy API.

9 min read
Data Analyst Analytics Engineer

Self-service BI: Metabase, Streamlit, Redash

Which self-service tool to choose depending on context? Governance, limits and positioning vs Power BI.

7 min read
Data Engineer Data Scientist

RAG: architecture, embeddings, vector databases

Retrieval-Augmented Generation, chunks, vector databases, hybrid search, re-ranking.

7 min read
Data Engineer Data Scientist

MCP: Model Context Protocol for AI agents

MCP architecture, STDIO vs HTTP transports, building Python servers, ecosystem.

6 min read
Data Scientist

Responsible AI: bias, fairness, SHAP, AI Act

Algorithmic bias, SHAP, algorithmic fairness, generative AI, carbon footprint.

7 min read
Data Analyst Analytics Engineer

Tableau Server Client: Python automation

TSC Python, STAGE-PROD deployment, user management, refresh API, security audit.

6 min read
Data Engineer

Advanced scraping and automation in Python

Playwright, Scrapy, anti-detection, crawling vs scraping, legislation.

6 min read
Data Analyst Analytics Engineer

Web Analytics: GA4, GTM, server-side tracking

GA4, Google Tag Manager, SGTM, CRO, web data quality.

6 min read
Data Analyst Analytics Engineer

CI/CD for Power BI: deployment pipelines

Deployment Pipelines, XMLA, REST API, PBIP format, Git integration.

6 min read
Data Scientist Data Engineer

MLOps in practice: CI/CD for ML and model monitoring

ML lifecycle, CI/CD framework, Dataiku, reliability, scalability, traceability.

7 min read
Analytics Engineer

Advanced dbt: macros, packages, snapshots

Advanced Jinja macros, dbt-utils, SCD2 snapshots, hooks, exposures.

7 min read
Data Analyst Analytics Engineer

Advanced SQL: window functions, recursive CTEs

Advanced window functions, recursive CTEs, EXPLAIN, Materialized Views, JSON.

7 min read
Data Engineer

Advanced Spark: Delta Lake, Streaming, AQE

Structured Streaming, Delta Lake, AQE, Spark on K8s, Unity Catalog.

7 min read
Data Engineer Analytics Engineer

Advanced Snowflake: Streams, Tasks, Dynamic Tables

Streams (CDC), Tasks, Dynamic Tables, Zero-Copy Cloning, Snowpark Python.

7 min read
Data Engineer Analytics Engineer

Advanced BigQuery: partitioning, costs, ML

Partitioning, clustering, INFORMATION_SCHEMA, BigQuery ML, Omni.

7 min read
Data Engineer

Apache Kafka: topics, consumers, Schema Registry

Topics, partitions, consumer groups, Kafka Streams, outbox pattern.

7 min read
Data Scientist Data Engineer

FastAPI for data: ML scoring APIs

FastAPI for ML, Pydantic, async, security, tests, Docker, Gunicorn.

6 min read
Data Engineer

Delta Lake vs Iceberg vs Hudi: choosing a table format

ACID, time travel, schema evolution, partition evolution, choosing according to the ecosystem.

7 min read
Data Engineer Data Scientist

Databricks: Unity Catalog, DLT, MLflow, Jobs

Unity Catalog, Delta Live Tables, integrated MLflow, Photon, Serverless.

6 min read
Data Engineer

Terraform for data: IaC, modules, state

GCP/AWS modules, remote backend, Terraform CI/CD, resource import.

6 min read
Data Engineer Analytics Engineer

GitHub Actions: CI/CD for data projects

Slim CI for dbt, automated tests, Docker push, OIDC secrets, matrix strategy.

6 min read
Data Engineer Data Scientist

Advanced Polars: lazy API, expressions, joins

Lazy API, scan_parquet, Polars expressions, semi/anti join, zero-copy Arrow.

6 min read
Data Engineer Analytics Engineer

Great Expectations: testing data quality

Expectation suites, checkpoints, Slack alerts, Airflow and dbt integration.

6 min read
Data Scientist

Feature engineering for ML: building good features

Encoding, missing values, time-based features, scaling, selection, leakage.

7 min read
Data Engineer Data Scientist

LangChain: agents, chains, LangGraph, LangSmith

LCEL, agents with tools, memory, LangGraph workflows, LangSmith observability.

6 min read
Data Engineer

Docker and Docker Compose for data

Optimized Dockerfile, local data stack, networking, multi-stage builds, CI registry.

6 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Data salaries in France 2025: pay scales and negotiation

Pay scales for Data Engineer, Analyst, Scientist and Analytics Engineer, plus negotiation tips.

5 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Red flags in data interviews: warning signs

Technical red flags, attitude, soft skills, revealing questions, GO/NO-GO.

5 min read
Analytics Engineer

Complete Analytics Engineer technical test

SQL, dbt, dimensional modeling, Python, business context questions.

8 min read
Data Scientist

Complete Data Scientist technical test

Statistics, supervised ML, rigorous evaluation, business case, deployment.

8 min read
Data Engineer Analytics Engineer

Modern data stack 2025: which architecture to choose

Modern Data Stack, Lakehouse, Data Mesh: comparison and choice depending on context.

7 min read
Analytics Engineer Data Engineer

Data Contracts: implementing with dbt and YAML

Schema versioning, SLA, dbt-contracts implementation, breaking changes.

6 min read
Data Engineer Analytics Engineer

Airbyte vs Fivetran: ELT connectors and ingestion

ELT vs ETL, Fivetran HVR, Airbyte CDK, custom connectors, log-based CDC.

6 min read
Data Engineer

Prefect: modern orchestration vs Airflow

Flows, tasks, deployments, work pools, Airflow vs Prefect vs Dagster.

6 min read
Analytics Engineer

Advanced Looker: LookML, PDTs, Git deployment

LookML semantic model, explores, PDTs, aggregate awareness, Git workflow.

7 min read
Data Scientist

NLP and embeddings: tokenization, BERT, fine-tuning

BPE tokenization, sentence transformers, BERT, LoRA fine-tuning, ONNX.

7 min read
Data Scientist

XGBoost and Random Forest: hyperparameters, SHAP

Bagging vs boosting, XGBoost tuning, SHAP feature importance, LightGBM.

7 min read
Data Engineer

Azure Data Factory: pipelines, triggers, DataFlow

ADF architecture, linked services, triggers, Mapping Data Flow, Synapse vs Fabric.

6 min read
Data Scientist

Logistic regression: statistical foundations for data scientists

Sigmoid, odds ratios, L1/L2 regularization, assumptions, multiclass.

6 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Structuring a technical data interview: recruiter's guide

4 phases, preparation, technical assessment without trick questions, behavioral assessment, GO/NO-GO grid.

6 min read
Data Engineer Analytics Engineer

Data Engineer onboarding: the first 90 days

30-60-90 day plan, mapping the architecture, documentation, common mistakes.

5 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Evaluating a data GitHub portfolio: what to look at

README, structure, code, tests, commits, the project types that reveal the most.

5 min read
Analytics Engineer

dbt + Snowflake stack: complete production architecture

Model organization, incremental models, slim CI, costs, governance, documentation.

7 min read
Data Analyst Data Scientist

pandas profiling and data quality diagnostics

YData Profiling, MCAR/MAR/MNAR, outliers, correlations, actionable report.

6 min read
Analytics Engineer Data Engineer

Data lineage: OpenLineage, DataHub, column-level

OpenLineage standard, DataHub, dbt lineage, column-level lineage, impact analysis.

6 min read
Data Engineer

Advanced Docker and Kubernetes: Helm, GitOps, ArgoCD

Helm charts, StatefulSets, secrets management, GitOps with ArgoCD, KubernetesExecutor.

7 min read
Data Engineer Analytics Engineer

Testing data pipelines with pytest

Fixtures, mocks, SQL tests with DuckDB, dbt singular tests, integration tests.

6 min read
Data Engineer

Azure Synapse: SQL pools, Spark, Data Lake

Dedicated vs Serverless SQL Pool, Spark Pool, ADLS, Synapse vs Databricks.

6 min read
Data Engineer

GCP data stack: BigQuery, Dataflow, Pub/Sub, Vertex AI

GCP architecture, Dataflow with Apache Beam, Pub/Sub, Cloud Composer, Vertex AI.

7 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Complete data recruitment guide 2025

Attractive job offer, sourcing, CV screening, process, decision, negotiation.

8 min read
Data Engineer Analytics Engineer

Monitoring data quality in production

Freshness alerts, anomaly detection, schema changes, Elementary vs Monte Carlo.

6 min read
Analytics Engineer

dbt incremental models: strategies and optimization

Append, merge, delete+insert, late-arriving data, full refresh, Snowflake.

7 min read
Data Engineer

Spark Structured Streaming: watermarks, triggers, sinks

Event time, watermarks, triggers, output modes, Kafka + Spark + Delta.

7 min read
Data Engineer Analytics Engineer

Lakehouse architecture: medallion, compaction, catalog

Bronze/Silver/Gold, choice of table format, small files, vacuum, per-zone security.

7 min read
Data Engineer

PySpark optimization: partitions, broadcast, Spark UI

Shuffle, broadcast joins, caching, Pandas UDFs, diagnostics with Spark UI.

7 min read
Data Analyst Analytics Engineer

Expert analytical SQL: sessions, funnel, RFM, attribution

Sessionization, funnel analysis, RFM scoring, intervals, multi-touch attribution.

7 min read
Data Engineer Analytics Engineer

Data FinOps: reducing cloud costs

Optimizing Snowflake, BigQuery, Databricks, S3, monitoring, FinOps culture.

6 min read
Data Engineer Data Scientist

REST APIs for data: designing and consuming

REST principles, auth, cursor pagination, versioning, retry with backoff.

6 min read
Data Engineer Analytics Engineer

Data team governance: standards, review, culture

Coding standards, code review, metric definitions, documentation, technical debt.

6 min read
Data Engineer

Async Python for data: asyncio, httpx, ingestion

asyncio, coroutines, async httpx, Semaphore, TaskGroup, ingestion pipeline.

6 min read
Data Engineer

Data security: encryption, secrets, RBAC

Encryption at rest and in transit, Secret Manager, RBAC, anonymization, audit logs.

6 min read
Data Scientist Data Analyst

Rigorous A/B testing: statistics for decision-making

Power, type I/II errors, t/z/chi-squared tests, peeking problem, interpretation.

7 min read
Analytics Engineer

Advanced dbt testing: singular tests, test macros

Advanced generic tests, singular tests, custom macros, coverage, optimized CI.

6 min read
Data Engineer Analytics Engineer

Snowflake performance: Query Profile, advanced clustering

Query Profile, clustering efficiency, Materialized Views, SOS, QAS, multi-cluster.

6 min read
Data Scientist Data Engineer

ML in production: packaging, serving, monitoring

sklearn Pipeline, serving with BentoML/FastAPI, drift monitoring, canary, retraining.

7 min read
Analytics Engineer Data Engineer

dbt snapshots and SCD: tracking change history

SCD 1/2/3, dbt snapshots, timestamp vs check strategies, deletions.

6 min read
Data Engineer Data Scientist

Graph databases: Neo4j, Cypher, recommendations

Graph modeling, Cypher, traversals, recommendations, fraud detection.

6 min read
Data Scientist Data Analyst

Time series: ARIMA, Prophet, anomaly detection

Decomposition, stationarity, ARIMA, Prophet, anomaly detection, ML features.

7 min read
Data Engineer

Apache Flink: advanced stream processing

Event time, watermarks, stateful processing, checkpointing, Flink SQL.

6 min read
Data Engineer

Testing data pipelines end to end

Test pyramid, integration tests, Airflow DAGs, contract testing, chaos.

6 min read
Data Engineer

Hadoop and Hive: mastering the data legacy

HDFS, MapReduce vs Spark, HiveQL, partitioning, migration to the cloud.

6 min read
Data Scientist Data Engineer

Advanced MLflow: Projects, Registry, serving

MLflow Projects, advanced tracking, Model Registry workflow, serving, CI/CD for ML.

6 min read
Analytics Engineer Data Engineer

Data catalog: DataHub, OpenMetadata, adoption

Choosing a catalog, DataHub architecture, automated ingestion, lineage, adoption.

6 min read
Analytics Engineer

Advanced dbt macros and Jinja: dynamic SQL

run_query(), generate_schema_name, dispatch, internal packages, graph metadata.

7 min read
Data Analyst Data Engineer Data Scientist Analytics Engineer

Preparing for a technical data interview: candidate's guide

What is assessed, what to review, how to practice, attitude, questions to ask.

6 min read
Data Engineer

Python decorators for data: retry, cache, logging

Parameterized decorators, retry with backoff, TTL cache, automatic logging, composition.

6 min read
Analytics Engineer Data Engineer

Data warehouse design: advanced modeling

Factless facts, bridge tables, junk dimensions, role-playing, mini-dimensions.

7 min read
Data Engineer Data Scientist

DuckDB: in-process analytical SQL for data

PIVOT/UNPIVOT, ASOF JOIN, direct Parquet queries, pandas integration, S3.

6 min read
Data Engineer

Advanced Apache Iceberg: REST catalog, CoW vs MoR

REST catalog, partition evolution, advanced time travel, compaction, multi-engine.

7 min read
Data Engineer

Rust for data: PyO3, Polars, performance

Why Rust for data, PyO3 Python extensions, Polars internals, alternatives.

6 min read
Data Engineer Analytics Engineer

Cloud-native data architecture: managed vs self-hosted

Managed vs self-hosted, serverless data, vendor lock-in, multi-cloud, costs.

6 min read
Data Engineer

Databricks Workflows: Asset Bundles, CI/CD

Complex jobs, Git integration, Asset Bundles, CI/CD with GitHub Actions, costs.

6 min read
Data Engineer

Trino: federated SQL across multiple sources

Coordinator/worker architecture, connectors, federation, EXPLAIN, vs Spark.

6 min read
Data Engineer Analytics Engineer

Advanced Great Expectations: profiling, blocking CI/CD

Automatic profiling, statistical expectations, custom expectations, blocking CI.

6 min read
Data Engineer

AWS data stack: S3, Glue, Athena, Redshift, Lake Formation

AWS architecture, advanced S3, Glue ETL, serverless Athena, Lake Formation RBAC.

7 min read
Data Engineer

Advanced Python testing: Hypothesis, mutation testing

Property-based testing with Hypothesis, mutmut, parametrize, benchmarks, coverage.

6 min read
Data Engineer Analytics Engineer Data Scientist Data Analyst

Data product management: roadmap, OKR, impact

Product thinking for data, data products, roadmap, OKRs, measuring impact.

6 min read
Data Engineer Data Scientist

Synthetic data: Faker, SDV, Gretel

Test data generation, statistical distributions, privacy, quality evaluation.

5 min read