Accueil›Blog›Test technique SQL avance : window functions, CTEs recursives, optimisation

Guide recrutement data

Test technique SQL avance : window functions, CTEs recursives, optimisation

Tout le monde sait faire un SELECT avec JOIN. Ce qu on evalue en entretien Senior, c est la maitrise des window functions, l ecriture de CTEs recursives et la capacite a optimiser une requete lente.

Data Builder·Juin 2025·7 min de lecture·Data Analyst · Analytics Engineer

Sommaire

Window functions avancees
PIVOT et agrégations conditionnelles
CTEs recursives
Optimisation et EXPLAIN
Materialized Views
Manipulation JSON
Grille

En entretien SQL Senior, on teste rarement le SELECT simple. On teste la capacite a resoudre des problemes analytiques complexes : cohortes, hierarchies, sessions, palmarès glissants. Voici les patterns qu on evalúe vraiment.

1Window functions : les patterns avancés

Question discriminante

Quelle est la difference entre ROW_NUMBER(), RANK() et DENSE_RANK() ? Donnez un cas ou chacune est la bonne choix.

-- Analyse de cohortes : retention par semaine
WITH first_purchase AS (
    SELECT user_id,
           DATE_TRUNC('week', MIN(order_date)) AS cohort_week
    FROM orders
    GROUP BY user_id
),
weekly_activity AS (
    SELECT fp.cohort_week,
           DATE_TRUNC('week', o.order_date) AS activity_week,
           COUNT(DISTINCT o.user_id) AS users
    FROM first_purchase fp
    JOIN orders o ON fp.user_id = o.user_id
    GROUP BY 1, 2
)
SELECT
    cohort_week,
    activity_week,
    DATEDIFF('week', cohort_week, activity_week) AS week_number,
    users,
    FIRST_VALUE(users) OVER (PARTITION BY cohort_week ORDER BY activity_week) AS cohort_size,
    users * 100.0 / FIRST_VALUE(users) OVER (PARTITION BY cohort_week ORDER BY activity_week) AS retention_pct
FROM weekly_activity
ORDER BY cohort_week, week_number;

FRAME CLAUSE — ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW pour les cumuls, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW pour les moyennes glissantes sur 7 jours
LAG/LEAD — acceder aux lignes precedentes/suivantes dans une partition. Indispensable pour les calculs de croissance
FIRST_VALUE/LAST_VALUE — premiere/derniere valeur d une partition. Utile pour les cohortes

2PIVOT et agregations conditionnelles

-- PIVOT dynamique avec agregation conditionnelle
-- Ventes par trimestre en colonnes
SELECT
    product_category,
    SUM(CASE WHEN QUARTER(order_date) = 1 THEN amount ELSE 0 END) AS Q1,
    SUM(CASE WHEN QUARTER(order_date) = 2 THEN amount ELSE 0 END) AS Q2,
    SUM(CASE WHEN QUARTER(order_date) = 3 THEN amount ELSE 0 END) AS Q3,
    SUM(CASE WHEN QUARTER(order_date) = 4 THEN amount ELSE 0 END) AS Q4,
    SUM(amount) AS total
FROM orders
GROUP BY product_category
ORDER BY total DESC;

-- QUALIFY (BigQuery, Snowflake) : filtrer les window functions
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

3CTEs recursives : hierarchies et graphes

Question discriminante

Comment requetez-vous une hierarchie organisationnelle avec SQL ? Ecrivez la requete pour retrouver tous les subordonnes d un manager.

-- Traverser une hierarchie avec une CTE recursive
WITH RECURSIVE org_hierarchy AS (
    -- Cas de base : le manager racine
    SELECT employee_id, name, manager_id, 0 AS depth,
           CAST(name AS VARCHAR(500)) AS path
    FROM employees
    WHERE manager_id IS NULL  -- PDG
    
    UNION ALL
    
    -- Recursion : les subordonnes
    SELECT e.employee_id, e.name, e.manager_id,
           oh.depth + 1,
           oh.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    WHERE oh.depth < 10  -- eviter les cycles infinis
)
SELECT * FROM org_hierarchy
WHERE depth <= 3  -- 3 niveaux sous le PDG
ORDER BY path;

4Optimisation : EXPLAIN ANALYZE et strategies

Question discriminante

Votre requete prend 45 secondes. Comment diagnostiquez-vous le probleme et par ou commencez-vous ?

-- PostgreSQL : plan d execution
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.id), SUM(o.amount)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
  AND o.created_at > '2024-01-01'
GROUP BY c.name;

-- Ce qu on cherche dans le plan :
-- Seq Scan sur grande table -> index manquant
-- Hash Join avec grosse table de hash -> trop de memoire
-- Sort avec beaucoup de lignes -> index sur ORDER BY

Index sur WHERE et JOIN — premier reflexe : est-ce que les colonnes de filtre sont indexees ?
Statistics — ANALYZE pour mettre a jour les statistiques du planificateur
Partitionnement — sur les grandes tables avec des filtres temporels frequents
Materialized View — pour les requetes complexes executees frequemment sur des donnees qui changent peu souvent

5Materialized Views et incremental refresh

-- Materialized View pour les agregations couteuses
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    DATE_TRUNC('day', order_date) AS day,
    product_category,
    SUM(amount) AS revenue,
    COUNT(*) AS nb_orders
FROM orders
GROUP BY 1, 2;

CREATE INDEX ON mv_daily_revenue (day);

-- Refresh periodique (ex : chaque nuit via cron)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;

6Manipulation JSON : events et schemas flexibles

-- PostgreSQL / BigQuery : extraire des champs JSON
SELECT
    event_type,
    event_properties->>'product_id' AS product_id,
    (event_properties->>'price')::FLOAT AS price,
    event_properties->'tags' AS tags_array
FROM events
WHERE event_properties->>'event_name' = 'purchase';

-- Exploser un tableau JSON en lignes
SELECT user_id, item
FROM events,
JSONB_ARRAY_ELEMENTS_TEXT(event_properties->'items') AS item
WHERE event_type = 'cart_add';

-- Analyse de cohortes de retention mensuelle
WITH cohorts AS (
    SELECT customer_id, DATE_TRUNC('month', first_order_date) AS cohort_month
    FROM dim_customers
),
retention AS (
    SELECT c.cohort_month,
        DATEDIFF('month', c.cohort_month, DATE_TRUNC('month', o.order_date)) AS month_n,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM cohorts c JOIN fct_orders o ON c.customer_id = o.customer_id
    GROUP BY 1, 2
),
sizes AS (SELECT cohort_month, COUNT(*) AS cohort_size FROM cohorts GROUP BY 1)
SELECT r.cohort_month, r.month_n,
    ROUND(r.active_customers * 100.0 / s.cohort_size, 1) AS retention_pct
FROM retention r JOIN sizes s USING (cohort_month)
ORDER BY cohort_month, month_n;

-- Mediane et percentiles
SELECT region,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS mediane,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount) AS p90
FROM fct_orders GROUP BY region;

Analyse de cohortes - technique fondamentale pour mesurer la retention par periode d acquisition. Double CTE : cohorts (premier achat) + retention (activite par mois)
QUALIFY - filtre sur window functions (Snowflake, BigQuery, DuckDB). Evite un CTE supplementaire apres RANK() ou ROW_NUMBER()
LATERAL JOIN - joindre avec une fonction table-valued. Exemple : exploser un tableau JSON colonne par colonne
Recursive CTEs - parcourir des structures hierarchiques (organigramme, categories imbriquees) en SQL pur

7Grille par niveau

Niveau	Maitrise	Signal GO	NO-GO
Junior	JOINs, GROUP BY, sous-requetes, CTEs simples	Ecrit des CTEs lisibles, sait quand utiliser HAVING vs WHERE	Confond WHERE et HAVING, ne sait pas ce qu est une CTE
Confirme	Window functions, QUALIFY, pivots, agregations conditionnelles	Ecrit une analyse de cohorte avec ROW_NUMBER, utilise LAG/LEAD	Ne connait pas les window functions, ne sait pas faire un PIVOT
Senior	CTEs recursives, EXPLAIN ANALYZE, Materialized Views, JSON	A debug une requete lente avec EXPLAIN, a ecrit une CTE recursive	Ne sait pas lire un plan EXPLAIN, n a jamais utilise de Materialized View

In Senior SQL interviews, simple SELECT is rarely tested. What's tested is the ability to solve complex analytical problems: cohorts, hierarchies, sessions, rolling rankings. Here are the patterns that are really evaluated.

1Window functions: advanced patterns

Discriminating question

What is the difference between ROW_NUMBER(), RANK() and DENSE_RANK()? Give a case where each one is the right choice.

-- Cohort analysis: retention by week
WITH first_purchase AS (
    SELECT user_id,
           DATE_TRUNC('week', MIN(order_date)) AS cohort_week
    FROM orders
    GROUP BY user_id
),
weekly_activity AS (
    SELECT fp.cohort_week,
           DATE_TRUNC('week', o.order_date) AS activity_week,
           COUNT(DISTINCT o.user_id) AS users
    FROM first_purchase fp
    JOIN orders o ON fp.user_id = o.user_id
    GROUP BY 1, 2
)
SELECT
    cohort_week,
    activity_week,
    DATEDIFF('week', cohort_week, activity_week) AS week_number,
    users,
    FIRST_VALUE(users) OVER (PARTITION BY cohort_week ORDER BY activity_week) AS cohort_size,
    users * 100.0 / FIRST_VALUE(users) OVER (PARTITION BY cohort_week ORDER BY activity_week) AS retention_pct
FROM weekly_activity
ORDER BY cohort_week, week_number;

FRAME CLAUSE — ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for cumulative totals, ROWS BETWEEN 6 PRECEDING AND CURRENT ROW for 7-day rolling averages
LAG/LEAD — access previous/next rows within a partition. Essential for growth calculations
FIRST_VALUE/LAST_VALUE — first/last value of a partition. Useful for cohorts

2PIVOT and conditional aggregations

-- Dynamic PIVOT with conditional aggregation
-- Sales by quarter as columns
SELECT
    product_category,
    SUM(CASE WHEN QUARTER(order_date) = 1 THEN amount ELSE 0 END) AS Q1,
    SUM(CASE WHEN QUARTER(order_date) = 2 THEN amount ELSE 0 END) AS Q2,
    SUM(CASE WHEN QUARTER(order_date) = 3 THEN amount ELSE 0 END) AS Q3,
    SUM(CASE WHEN QUARTER(order_date) = 4 THEN amount ELSE 0 END) AS Q4,
    SUM(amount) AS total
FROM orders
GROUP BY product_category
ORDER BY total DESC;

-- QUALIFY (BigQuery, Snowflake) : filter window functions
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

3Recursive CTEs: hierarchies and graphs

Discriminating question

How do you query an organizational hierarchy with SQL? Write the query to retrieve all subordinates of a manager.

-- Traverse a hierarchy with a recursive CTE
WITH RECURSIVE org_hierarchy AS (
    -- Base case: the root manager
    SELECT employee_id, name, manager_id, 0 AS depth,
           CAST(name AS VARCHAR(500)) AS path
    FROM employees
    WHERE manager_id IS NULL  -- CEO
    
    UNION ALL
    
    -- Recursion: the subordinates
    SELECT e.employee_id, e.name, e.manager_id,
           oh.depth + 1,
           oh.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    WHERE oh.depth < 10  -- avoid infinite loops
)
SELECT * FROM org_hierarchy
WHERE depth <= 3  -- 3 levels below the CEO
ORDER BY path;

4Optimization: EXPLAIN ANALYZE and strategies

Discriminating question

Your query takes 45 seconds. How do you diagnose the problem and where do you start?

-- PostgreSQL: execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.id), SUM(o.amount)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
  AND o.created_at > '2024-01-01'
GROUP BY c.name;

-- What to look for in the plan:
-- Seq Scan on large table -> missing index
-- Hash Join with large hash table -> too much memory
-- Sort with many rows -> index on ORDER BY

Index on WHERE and JOIN — first reflex: are the filter columns indexed?
Statistics — ANALYZE to update planner statistics
Partitioning — on large tables with frequent time-based filters
Materialized View — for complex queries run frequently on data that changes infrequently

5Materialized Views and incremental refresh

-- Materialized View for costly aggregations
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    DATE_TRUNC('day', order_date) AS day,
    product_category,
    SUM(amount) AS revenue,
    COUNT(*) AS nb_orders
FROM orders
GROUP BY 1, 2;

CREATE INDEX ON mv_daily_revenue (day);

-- Periodic refresh (e.g.: every night via cron)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;

6JSON manipulation: events and flexible schemas

-- PostgreSQL / BigQuery: extract JSON fields
SELECT
    event_type,
    event_properties->>'product_id' AS product_id,
    (event_properties->>'price')::FLOAT AS price,
    event_properties->'tags' AS tags_array
FROM events
WHERE event_properties->>'event_name' = 'purchase';

-- Explode a JSON array into rows
SELECT user_id, item
FROM events,
JSONB_ARRAY_ELEMENTS_TEXT(event_properties->'items') AS item
WHERE event_type = 'cart_add';

-- Monthly retention cohort analysis
WITH cohorts AS (
    SELECT customer_id, DATE_TRUNC('month', first_order_date) AS cohort_month
    FROM dim_customers
),
retention AS (
    SELECT c.cohort_month,
        DATEDIFF('month', c.cohort_month, DATE_TRUNC('month', o.order_date)) AS month_n,
        COUNT(DISTINCT o.customer_id) AS active_customers
    FROM cohorts c JOIN fct_orders o ON c.customer_id = o.customer_id
    GROUP BY 1, 2
),
sizes AS (SELECT cohort_month, COUNT(*) AS cohort_size FROM cohorts GROUP BY 1)
SELECT r.cohort_month, r.month_n,
    ROUND(r.active_customers * 100.0 / s.cohort_size, 1) AS retention_pct
FROM retention r JOIN sizes s USING (cohort_month)
ORDER BY cohort_month, month_n;

-- Median and percentiles
SELECT region,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) AS median,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount) AS p90
FROM fct_orders GROUP BY region;

Cohort analysis - fundamental technique for measuring retention by acquisition period. Double CTE: cohorts (first purchase) + retention (activity by month)
QUALIFY - filter on window functions (Snowflake, BigQuery, DuckDB). Avoids an extra CTE after RANK() or ROW_NUMBER()
LATERAL JOIN - join with a table-valued function. Example: explode a JSON array column by column
Recursive CTEs - traverse hierarchical structures (org charts, nested categories) in pure SQL

7Level grid

Level	Mastery	GO Signal	NO-GO
Junior	JOINs, GROUP BY, subqueries, simple CTEs	Writes readable CTEs, knows when to use HAVING vs WHERE	Confuses WHERE and HAVING, doesn't know what a CTE is
Mid-level	Window functions, QUALIFY, pivots, conditional aggregations	Writes a cohort analysis with ROW_NUMBER, uses LAG/LEAD	Doesn't know window functions, can't do a PIVOT
Senior	Recursive CTEs, EXPLAIN ANALYZE, Materialized Views, JSON	Has debugged a slow query with EXPLAIN, has written a recursive CTE	Can't read an EXPLAIN plan, has never used a Materialized View

Vous recrutez un Data Analyst ou Analytics Engineer ?

Premier entretien gratuit. Rapport GO/NO-GO sous 48h.

Tester gratuitement Reserver un appel