Guide recrutement data
Test technique BigQuery : ce qu'on evalue en entretien
BigQuery est puissant — et couteux si mal utilise. Un SELECT * sur 10 TB peut couter 50 euros en un clic.
Data Builder·Juin 2025·7 min de lecture·Data Engineer · Analytics Engineer
BigQuery est puissant — et couteux si mal utilise. Un SELECT * sur une table de 10 TB peut couter 50 euros en un clic.
1SQL BigQuery : specificites
Question discriminante
Differences principales entre le SQL BigQuery et un SQL standard ?
-- QUALIFY : alternative elegante aux sous-requetes
SELECT user_id, event_date
FROM events
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_date DESC) = 1
- QUALIFY — filtrer sur window functions sans sous-requete
- STRUCT et ARRAY — types imbriques tres courants dans BQ
- UNNEST — aplatir les arrays en lignes
2Partitionnement et clustering
Question discriminante
Difference entre partitionnement et clustering ? Quand utiliser l'un ou l'autre ?
Le partitionnement divise la table en segments (souvent par date). Le clustering trie les donnees dans chaque partition. Les deux sont complementaires.
- Partition pruning — le filtre doit porter sur la colonne de partition
- Partition expiration — supprimer automatiquement les donnees anciennes
3Gestion des couts
Question discriminante
Comment optimisez-vous les couts sur BigQuery ? Trois pratiques concretes.
- Eviter SELECT * — le cout est proportionnel aux bytes scannes
- Materialized views — pre-calculer les requetes frequentes
- Cost controls — quotas par projet ou utilisateur
Signal d'alerte : SELECT * sur une grande table non partitionnee est eliminatoire pour un profil Senior.
4Performance
Question discriminante Senior
Une requete scanne 10 TB et prend 5 minutes. Comment l'analysez-vous ?
- Query plan et execution details dans la console BQ
- Bytes processed — indicateur principal
- Broadcast des petites tables
5Architecture et ecosystem
- BigQuery ML — modeles ML directement en SQL
- BigQuery Omni — requeter des donnees sur AWS ou Azure
- Tables externes sur Google Cloud Storage
6Grille par niveau
| Niveau | Maitrise attendue | Signal GO | NO-GO |
|---|
| Junior | SQL BQ de base, lecture/ecriture de tables | Connait QUALIFY, evite SELECT * | Ne sait pas ce qu'est le partitionnement |
| Confirme | Partitionnement, clustering, STRUCT/ARRAY, couts | A partitionne une table, connait les bytes scannes | Ne sait pas pourquoi SELECT * est couteux |
| Senior | Materialized views, BI Engine, query plan | A optimise une requete de 10x moins de bytes | N'a jamais lu un query plan BQ |
| Lead | Architecture data lakehouse, gouvernance | A defini la strategie de partitionnement organisation | Ne connait pas BigQuery Omni |
Data hiring guide
BigQuery technical interview: what we really assess
BigQuery is powerful — and expensive if misused. A SELECT * on 10 TB can cost 50 euros in one click.
Data Builder·June 2025·7 min read·Data Engineer · Analytics Engineer
BigQuery is powerful — and expensive if misused. A SELECT * on a 10 TB table can cost 50 euros in one click.
1SQL BigQuery: specific features
Key question
What are the main differences between BigQuery SQL and standard SQL?
-- QUALIFY : alternative elegante aux sous-requetes
SELECT user_id, event_date
FROM events
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_date DESC) = 1
- QUALIFY — filter on window functions without a subquery
- STRUCT and ARRAY — nested types very common in BQ
- UNNEST — flatten arrays into rows
2Partitioning and clustering
Key question
What is the difference between partitioning and clustering? When should you use one or the other?
Partitioning splits the table into segments (often by date). Clustering sorts the data within each partition. Both are complementary.
- Partition pruning — the filter must apply to the partition column
- Partition expiration — automatically delete old data
3Cost management
Key question
How do you optimize costs on BigQuery? Give three concrete practices.
- Avoid SELECT * — cost is proportional to bytes scanned
- Materialized views — pre-compute frequent queries
- Cost controls — quotas per project or user
Warning signal: SELECT * on a large unpartitioned table is an eliminating factor for a Senior profile.
4Performance
Senior key question
A query scans 10 TB and takes 5 minutes. How do you analyze it?
- Query plan and execution details in the BQ console
- Bytes processed — primary indicator
- Broadcasting small tables
5Architecture and ecosystem
- BigQuery ML — ML models directly in SQL
- BigQuery Omni — query data on AWS or Azure
- External tables on Google Cloud Storage
6Level grid
| Level | Expected proficiency | GO signal | NO-GO |
|---|
| Junior | Basic BQ SQL, reading/writing tables | Knows QUALIFY, avoids SELECT * | Does not know what partitioning is |
| Mid-level | Partitioning, clustering, STRUCT/ARRAY, costs | Has partitioned a table, knows about bytes scanned | Does not know why SELECT * is expensive |
| Senior | Materialized views, BI Engine, query plan | Has optimized a query to scan 10x fewer bytes | Has never read a BQ query plan |
| Lead |