Guides for rigorously assessing technical data skills: SQL, dbt, Spark, Power BI, Tableau, Cloud, MLOps, and more.
Joins, window functions, optimization: what SQL really means in a data interview.
Power Query, DAX and filter context, star schema modeling, production deployment.
Branches, conflicts, rebase, and CI/CD workflow in a data context.
Project structure, generic and custom tests, Jinja macros, slim CI/CD.
Lazy execution, partitioning, data skew, and cluster optimization.
DAGs, operators, TaskFlow API, SLAs, and production monitoring.
Pandas/DuckDB, OOP, unit tests, FastAPI: production-oriented Python.
LOD expressions, table calculations, Performance Recorder, and governance.
LookML, PDT, RBAC, and enterprise semantic layer.
Partitioning, clustering, QUALIFY, and BigQuery cost management.
Time Travel, Streams/Tasks, Dynamic Tables, and cost optimization.
Optimized Dockerfile, volumes, Docker Compose, and containerized CI/CD.
Navigation, bash scripts, processes, and production monitoring.
S3, IAM, Glue/Athena/Redshift, MWAA, and AWS cost management.
Canvas apps, Power Fx, delegation, and ALM governance.
Automated flows, custom connectors, error handling, and DLP.
Projects/flows, Visual ML, MLOps, and multi-instance governance.
CALCULATE, VAR/RETURN, time intelligence, SUMX, DAX Studio: the formulas tested in interviews.
LOD FIXED/INCLUDE/EXCLUDE, table calculations, Tableau Prep, dynamic dashboards, Performance Recorder.
TaskFlow API, XCom, trigger rules, executors, Secrets Backend, and production architecture.
Interactive rebase, pre-commit hooks, cherry-pick, handling emergencies, and team standards.
Connectivity, modeling, DAX vs LookML, performance, costs, and governance: which tool for which context?
Canvas apps, Power Fx, Dataverse vs SharePoint, Power Automate, and DEV/TEST/PROD ALM governance.
Pods, nodes, cluster, GKE Autopilot vs Standard, Airflow and Spark on K8s, Workload Identity.
The 4 principles, data products, self-serve infrastructure, and federated governance, compared with the Data Lakehouse.
User discovery, Double Diamond, wireframes, choice of visuals, and information hierarchy.
MLOps 0/1/2, MLflow Registry, CI/CD for ML, model deployment, and drift monitoring.
Data catalog, data lineage, data quality (dbt + Great Expectations), MDM, and GDPR.
HTTP, static vs dynamic pages, Scrapy for production, anti-bot measures, and legality.
Report structure, performance optimization, bookmarks, Performance Analyzer, and deployment.
Vectorization, memory optimization, DuckDB for large volumes, and the Polars lazy API.
Which self-service tool to choose depending on context? Governance, limits, and positioning vs Power BI.
Retrieval-Augmented Generation, chunks, vector databases, hybrid search, re-ranking.
MCP architecture, STDIO vs HTTP transports, building Python servers, ecosystem.
Algorithmic bias, SHAP, algorithmic fairness, generative AI, carbon footprint.
TSC Python, STAGE-PROD deployment, user management, refresh API, security audit.
Playwright, Scrapy, anti-detection, crawling vs scraping, legislation.
GA4, Google Tag Manager, SGTM, CRO, web data quality.
Deployment Pipelines, XMLA, REST API, PBIP format, Git integration.
ML lifecycle, CI/CD framework, Dataiku, reliability, scalability, traceability.
Advanced Jinja macros, dbt-utils, SCD2 snapshots, hooks, exposures.
Advanced window functions, recursive CTEs, EXPLAIN, Materialized Views, JSON.
Structured Streaming, Delta Lake, AQE, Spark on K8s, Unity Catalog.
Streams CDC, Tasks, Dynamic Tables, Zero-Copy Cloning, Snowpark Python.
Partitioning, clustering, INFORMATION_SCHEMA, BigQuery ML, Omni.
Topics, partitions, consumer groups, Kafka Streams, outbox pattern.
FastAPI ML, Pydantic, async, security, tests, Docker Gunicorn.
ACID, time travel, schema evolution, partition evolution, choosing based on the ecosystem.
Unity Catalog, Delta Live Tables, integrated MLflow, Photon, Serverless.
GCP/AWS modules, remote backend, Terraform CI/CD, resource import.
dbt Slim CI, automated tests, Docker push, OIDC secrets, matrix strategy.
Lazy API, scan_parquet, Polars expressions, semi/anti join, zero-copy Arrow.
Expectation suites, checkpoints, Slack alerts, Airflow and dbt integration.
Encoding, missing values, temporal features, scaling, selection, leakage.
LCEL, agents with tools, memory, LangGraph workflows, LangSmith observability.
Optimized Dockerfile, local data stack, networking, multi-stage, CI registry.
Grids for Data Engineer, Analyst, Scientist, Analytics Engineer, negotiation advice.
Technical red flags, posture, soft skills, revealing questions, GO/NO-GO.
SQL, dbt, dimensional modeling, Python, business context questions.
Statistics, supervised ML, rigorous evaluation, business cases, deployment.
Modern Data Stack, Lakehouse, Data Mesh: comparison and choice depending on context.
Schema versioning, SLA, dbt-contracts implementation, breaking changes.
ELT vs ETL, Fivetran HVR, Airbyte CDK, custom connectors, log-based CDC.
Flows, tasks, deployments, work pools, Airflow vs Prefect vs Dagster.
LookML semantic model, explores, PDTs, aggregate awareness, Git workflow.
BPE tokenization, sentence transformers, BERT, LoRA fine-tuning, ONNX.
Bagging vs boosting, XGBoost tuning, SHAP feature importance, LightGBM.
ADF architecture, linked services, triggers, Mapping Data Flow, Synapse vs Fabric.
Sigmoid, odds ratios, L1/L2 regularization, assumptions, multiclass.
4 phases, preparation, technical questions without traps, behavioral questions, GO/NO-GO grid.
30-60-90 day plan, mapping the architecture, documentation, common mistakes.
README, structure, code, tests, commits, revealing project types.
Model organization, incremental, slim CI, costs, governance, documentation.
YData Profiling, MCAR/MAR/MNAR, outliers, correlations, actionable report.
OpenLineage standard, DataHub, dbt lineage, column-level lineage, impact analysis.
Helm charts, StatefulSets, secrets management, GitOps ArgoCD, KubernetesExecutor.
Fixtures, mocks, SQL tests with DuckDB, dbt singular tests, integration tests.
Dedicated vs Serverless SQL Pool, Spark Pool, ADLS, Synapse vs Databricks.
GCP architecture, Dataflow Apache Beam, Pub/Sub, Cloud Composer, Vertex AI.
Attractive offer, sourcing, CV review, process, decision, negotiation.
Freshness alerts, anomaly detection, schema changes, Elementary vs Monte Carlo.
Append, merge, delete+insert, late-arriving data, full refresh, Snowflake.
Event time, watermarks, triggers, output modes, Kafka + Spark + Delta.
Bronze/Silver/Gold, table format choice, small files, vacuum, per-zone security.
Shuffle, broadcast joins, caching, Pandas UDFs, Spark UI diagnostics.
Sessionization, funnel analysis, RFM scoring, intervals, multi-touch attribution.
Optimizing Snowflake, BigQuery, Databricks, S3, monitoring, FinOps culture.
REST principles, auth, cursor pagination, versioning, retry with backoff.
Coding standards, code review, metric definitions, documentation, technical debt.
asyncio, coroutines, async httpx, Semaphore, TaskGroup, ingestion pipeline.
Encryption at rest/in transit, Secret Manager, RBAC, anonymization, audit logs.
Power, type I/II errors, t/z/chi2 tests, peeking problem, interpretation.
Advanced generic tests, singular tests, custom macros, coverage, optimized CI.
Query Profile, clustering efficiency, Materialized Views, SOS, QAS, multi-cluster.
sklearn pipeline, BentoML/FastAPI serving, drift monitoring, canary, retraining.
SCD 1/2/3, dbt snapshots, timestamp vs check strategies, deletions.
Graph modeling, Cypher, traversals, recommendations, fraud detection.
Decomposition, stationarity, ARIMA, Prophet, anomaly detection, ML features.
Event time, watermarks, stateful processing, checkpointing, Flink SQL.
Test pyramid, integration tests, Airflow DAGs, contract testing, chaos.
HDFS, MapReduce vs Spark, HiveQL, partitioning, cloud migration.
MLflow Projects, advanced tracking, Model Registry workflow, serving, ML CI/CD.
Catalog selection, DataHub architecture, automated ingestion, lineage, adoption.
run_query(), generate_schema_name, dispatch, internal packages, graph metadata.
What is assessed, what to review, practicing, posture, questions to ask.
Parameterized decorators, retry with backoff, TTL cache, automatic logging, composition.
Factless facts, bridge tables, junk dimensions, role-playing, mini-dimensions.
PIVOT/UNPIVOT, ASOF JOIN, direct Parquet, pandas integration, S3.
REST Catalog, partition evolution, advanced time travel, compaction, multi-engine.
Why Rust for data, PyO3 Python extensions, Polars internals, alternatives.
Managed vs self-hosted, serverless data, vendor lock-in, multi-cloud, costs.
Complex jobs, Git integration, Asset Bundles, GitHub Actions CI/CD, costs.
Coordinator/worker architecture, connectors, federation, EXPLAIN, vs Spark.
Automated profiling, statistical expectations, custom expectations, blocking CI.
AWS architecture, advanced S3, Glue ETL, serverless Athena, Lake Formation RBAC.
Property-based testing Hypothesis, mutmut, parametrize, benchmarks, coverage.
Data product thinking, data products, roadmap, OKRs, measuring impact.
Test data generation, statistical distributions, privacy, quality evaluation.