Metabase, Streamlit, and Redash meet a growing need: dashboards that are quick to deploy, without going through a Data Analyst for every query. Knowing how to position these tools, and how to govern them, is a real differentiator in interviews.
Self-service BI tools let business teams explore data without asking a Data Analyst to write every query. Interviews assess two things: the ability to choose the right tool for the context, and the ability to put governance in place that prevents chaos.
1. Metabase
Discriminating question
What is the difference between visual questions and native SQL questions in Metabase? Why not give native SQL access to all users?
- Question Builder (visual): a no-SQL interface where the user picks a table, filters, and groupings. Suited to non-technical users
- Native SQL questions: raw SQL written directly in Metabase. Powerful, but access should be restricted because unoptimized queries can overload the database
- Models: SQL queries pre-written by the data team and exposed to business users as virtual tables
- Supported connections: PostgreSQL, MySQL, BigQuery, Snowflake, Redshift, MongoDB, DuckDB, SQLite, and more
- Automatic exploration: Metabase can generate an exploratory view of any table (distributions, trends, null values)
Best practice: do not give native SQL access to everyone. Create Models to expose pre-optimized views, and point Metabase at a read replica, never at the transactional production database.
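The idea of a Model as a curated "virtual table" can be illustrated with a plain SQL view. The sketch below is only an analogy built on Python's standard `sqlite3` module, not how Metabase implements Models; the `orders` table, its columns, and the `revenue_by_region` view are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        ('FR', 100.0, 'paid'),
        ('FR', 50.0, 'refunded'),
        ('DE', 80.0, 'paid');

    -- Curated definition owned by the data team:
    -- revenue counts paid orders only
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(amount) AS revenue
        FROM orders
        WHERE status = 'paid'
        GROUP BY region;
""")

# Business users query the curated view, never the raw table
rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

Because the filter on `status` lives in one place, every question built on the view inherits the same definition of revenue, which is the property the best practice is after.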
2. Streamlit: interactive Python dashboards
Discriminating question
When would you choose Streamlit over Metabase? What does st.cache_data do, and why is it critical?
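```python
from datetime import date, timedelta

import pandas as pd
import plotly.express as px
import streamlit as st
from sqlalchemy import create_engine, text

# Critical cache: avoids re-running the query on every interaction
@st.cache_data(ttl=3600)
def load_data(start_date, end_date):
    engine = create_engine("postgresql://...")
    # Bound parameters instead of an f-string: no SQL injection
    query = text("""
        SELECT region, month, SUM(revenue) AS revenue
        FROM sales
        WHERE date BETWEEN :start_date AND :end_date
        GROUP BY region, month
    """)
    params = {"start_date": start_date, "end_date": end_date}
    return pd.read_sql(query, engine, params=params)

st.title("Sales Dashboard")

# Default period (assumption: the last 90 days)
end = date.today()
start = end - timedelta(days=90)

# Interactive filters in the sidebar
with st.sidebar:
    regions = st.multiselect("Regions", ["FR", "DE", "UK", "ES"])
    date_range = st.date_input("Period", value=(start, end))

# date_input returns an incomplete tuple until both dates are picked
if len(date_range) != 2:
    st.stop()

df = load_data(*date_range)
if regions:
    df = df[df["region"].isin(regions)]
if df.empty:
    st.warning("No data for this selection.")
    st.stop()

# KPIs in columns
c1, c2, c3 = st.columns(3)
# The delta is a hardcoded placeholder; compute it from the previous period in practice
c1.metric("Total Revenue", f"{df.revenue.sum():,.0f} EUR", delta="+8.3%")
c2.metric("Active regions", df.region.nunique())
c3.metric("Best month", str(df.groupby("month").revenue.sum().idxmax()))

# Interactive chart
fig = px.bar(df.groupby("region").revenue.sum().reset_index(),
             x="region", y="revenue")
st.plotly_chart(fig, use_container_width=True)
```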
- st.cache_data: caches the function's return value, keyed on its arguments. Streamlit reruns the whole script on every user interaction, so without the cache the SQL query would run again each time. With ttl=3600, cached entries expire after one hour
- Advantages: pure Python, custom logic (ML, complex calculations), full control over the UX
- Ideal for: ML dashboards (real-time predictions), internal tools with specific business logic, rapid prototypes
- Deployment: Streamlit Community Cloud, Docker on GKE/ECS, or a VM behind nginx
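To make the caching behaviour concrete, here is a minimal pure-Python analogue of a TTL cache keyed on function arguments. It is a sketch of the general technique, not Streamlit's implementation (which, among other differences, hashes arguments and hands back a copy of the stored value). The `ttl_cache` decorator and the `calls` list are hypothetical names, and `load_data` returns a string instead of querying a database.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds):
    """Cache results per argument tuple and expire them after ttl_seconds."""
    def decorator(func):
        store = {}  # maps args -> (timestamp, result)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # still fresh: skip the expensive call
            result = func(*args)
            store[args] = (now, result)
            return result

        return wrapper
    return decorator


calls = []  # records every time the "database" is actually hit


@ttl_cache(ttl_seconds=3600)
def load_data(start_date, end_date):
    calls.append((start_date, end_date))
    return f"rows from {start_date} to {end_date}"


# Three "reruns" with the same filters, then one with a new date range
for _ in range(3):
    load_data("2024-01-01", "2024-03-31")
load_data("2024-04-01", "2024-06-30")

print(len(calls))  # 2: one query per distinct argument tuple
```

Three reruns with identical filters cost a single query; only the change of date range triggers a second one. That is the saving a cached loader brings to a script that reruns on every widget interaction.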
3. Redash: collaborative SQL to dashboard
Discriminating question
What does Redash offer over manually exporting SQL results to Excel?
- Shared queries: queries written by the data team are saved in one place, so anyone in the organization can find and reuse them instead of rewriting them
- Scheduling: queries refresh automatically at a set frequency (every hour, every night), so dashboards stay current without manual exports
- Alerts: automatic notification when a query result crosses a defined threshold
- Parameters: users can filter dashboards through parameters without touching the SQL
- Ideal for: technical teams who want to explore with SQL without deploying a heavyweight tool
4. Tool comparison
| Criterion | Metabase | Streamlit | Redash | Power BI / Tableau |
|---|---|---|---|---|
| Target audience | Business users | Data Engineers/Scientists | SQL tech teams | Data Analysts / BI |
| Learning curve | Very low | Medium (Python) | Low (SQL) | Medium to high |
| Customization | Limited | Maximum | Moderate | High |
| Cost | Free (OSS) | Free (OSS) | Free (OSS) | Per-user licenses |
| Self-service | Excellent | No (code required) | Good (SQL required) | Good |
| Built-in ML | No | Yes (native Python) | No | Partial |
5. Which tool to choose based on context?
- Non-technical business teams who want to explore data → Metabase. Visual interface, no SQL required
- ML dashboard with real-time predictions or custom logic → Streamlit. Native Python, complex business logic possible
- Technical team that wants to share SQL queries → Redash. Collaborative SQL, simple scheduling
- Corporate reporting with governance, row-level security (RLS), and deployment pipelines → Power BI or Tableau
- Rapid prototype or POC → Metabase or Streamlit depending on whether the team uses SQL or Python
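The rules of thumb above can be written down as a small decision function, which is a handy way to rehearse the reasoning before an interview. This is only a sketch: `choose_tool` and its parameters are hypothetical, and the order in which the rules are checked (corporate reporting first, then ML, then prototypes) is an assumption, since the list itself does not rank them.

```python
def choose_tool(audience, needs_ml=False, corporate_reporting=False, prototype=False):
    """Map a context to a tool.

    audience: "business" (no SQL), "sql" (SQL-first team) or "python".
    """
    if corporate_reporting:
        return "Power BI or Tableau"
    if needs_ml:
        return "Streamlit"
    if prototype:
        # Prototype rule: Streamlit for Python teams, Metabase otherwise
        return "Streamlit" if audience == "python" else "Metabase"
    if audience == "business":
        return "Metabase"
    if audience == "sql":
        return "Redash"
    if audience == "python":
        return "Streamlit"
    raise ValueError(f"unknown audience: {audience!r}")


print(choose_tool("business"))
print(choose_tool("sql"))
print(choose_tool("python", needs_ml=True))
print(choose_tool("sql", prototype=True))
print(choose_tool("business", corporate_reporting=True))
```

In a real discussion the inputs are rarely this clean, so the value lies in being able to state which criterion dominates and why.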
6. Governance and pitfalls to avoid
Discriminating question
What are the risks of a Metabase deployment without governance? How do you avoid them?
- Unoptimized queries on production: users can run SQL that overloads the production database. Always point Metabase at a read replica or a data warehouse
- Proliferation of obsolete dashboards: without naming conventions and archiving, Metabase quickly becomes a graveyard. Define a deprecation process
- Misaligned metrics: if two dashboards calculate revenue differently, trust in the data collapses. Centralize metric definitions in Models
- Silent breakages: if source tables change, Metabase dashboards can break without any alert. Add data quality tests upstream (for example dbt tests) so schema changes are caught before users see them
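The silent-breakage risk can be made tangible with a tiny schema check: compare the columns a dashboard relies on with the columns the table actually has. The sketch below uses Python's standard `sqlite3` module as a stand-in for the warehouse. It illustrates the kind of check a data quality test performs and is not dbt itself; the `sales` table, the `EXPECTED_COLUMNS` contract, and `missing_columns` are hypothetical.

```python
import sqlite3

# Columns the dashboards rely on (hypothetical contract for the sales table)
EXPECTED_COLUMNS = {"region", "month", "revenue"}


def missing_columns(conn, table, expected):
    """Return the expected columns that no longer exist in the table."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is trusted input here, not user-supplied.
    actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return expected - actual


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
before = missing_columns(conn, "sales", EXPECTED_COLUMNS)

# An upstream change rebuilds the table with a renamed column
conn.execute("DROP TABLE sales")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, net_revenue REAL)")
after = missing_columns(conn, "sales", EXPECTED_COLUMNS)

print(before, after)
```

Run on a schedule or in CI, a check like this turns a dashboard that quietly shows nothing into an explicit failure the data team sees first.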
7. Level grid
| Level | Expected proficiency | GO signal | NO-GO |
|---|---|---|---|
| Junior | Using Metabase or Streamlit, connecting to a database | Has created a Metabase dashboard, knows how to connect a data source | Does not know any self-service tool, does not know when to use Metabase vs Power BI |
| Mid-level | Metabase governance, Streamlit with cache, tool positioning | Has deployed Metabase in production with read replica and models, uses st.cache_data | Has deployed Metabase without read replica, gives native SQL access to everyone |
| Senior | Complete self-service architecture, justified tool choice, monitoring | Has architected a self-service stack (Metabase + dbt + warehouse), justifies choices | Cannot explain when to use Metabase rather than Power BI |