Terraform has become essential for Data Engineers who provision cloud resources. In interviews, candidates are assessed on their ability to write reusable modules and to manage state in a team.
1. Terraform Core Concepts
Discriminating question
What is the difference between terraform plan and terraform apply? And between a resource and a data source?
    # Structure of a Terraform data project
    .
    ├── main.tf          # main resources
    ├── variables.tf     # inputs
    ├── outputs.tf       # outputs
    ├── versions.tf      # provider versions
    └── modules/
        ├── bigquery/
        │   ├── main.tf
        │   └── variables.tf
        └── gcs/
            └── main.tf

    # versions.tf
    terraform {
      required_version = ">= 1.5"
      required_providers {
        google = {
          source  = "hashicorp/google"
          version = "~> 5.0"
        }
      }
    }
- terraform plan: shows what will change without modifying anything. Always run it before apply
- terraform apply: applies the changes. Prompts for confirmation unless it is given a saved plan file or -auto-approve
- resource: creates or manages a cloud resource
- data source: reads an existing resource without creating or modifying it
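The resource versus data source distinction can be sketched as follows. This is a minimal illustration, assuming the google provider is already configured with a default project; the resource names and the bucket name suffix are hypothetical.

```hcl
# Data source: reads the project the provider is configured for.
# Terraform never creates, changes, or destroys it.
data "google_project" "current" {}

# Resource: Terraform creates this bucket and manages its lifecycle.
resource "google_storage_bucket" "landing" {
  # Hypothetical name; bucket names must be globally unique.
  name     = "${data.google_project.current.project_id}-landing"
  location = "EU"

  uniform_bucket_level_access = true
}
```

Running terraform plan on this configuration would propose creating the bucket only; the data source is merely read.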
2. Modules: encapsulating data infrastructure
Discriminating question
How do you create a reusable Terraform module for a BigQuery dataset?
    # modules/bigquery/main.tf
    resource "google_bigquery_dataset" "this" {
      dataset_id  = var.dataset_id
      project     = var.project_id
      location    = var.location
      description = var.description
      labels      = var.labels

      # One access block per entry in var.access_roles
      dynamic "access" {
        for_each = var.access_roles
        content {
          role          = access.value.role
          user_by_email = access.value.email
        }
      }

      delete_contents_on_destroy = var.delete_on_destroy
    }

    # Module call
    module "dataset_analytics" {
      source     = "./modules/bigquery"
      dataset_id = "analytics_prod"
      project_id = var.project_id
      location   = "EU"
      access_roles = [
        { role = "READER", email = "analysts@company.com" }
      ]
    }
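The module above reads several input variables that have to be declared in modules/bigquery/variables.tf. A possible declaration is sketched here; the variable names come from the module code, while the types, descriptions, and defaults are assumptions chosen so that the module call shown above works without passing description, labels, or delete_on_destroy.

```hcl
# modules/bigquery/variables.tf (sketch; types and defaults are assumptions)

variable "dataset_id" {
  type        = string
  description = "ID of the BigQuery dataset to create"
}

variable "project_id" {
  type        = string
  description = "GCP project that hosts the dataset"
}

variable "location" {
  type        = string
  description = "Dataset location, for example EU"
}

variable "description" {
  type    = string
  default = null # optional: the dataset gets no description
}

variable "labels" {
  type    = map(string)
  default = {}
}

variable "access_roles" {
  # Each entry becomes one access block in the dataset
  type = list(object({
    role  = string
    email = string
  }))
  default = []
}

variable "delete_on_destroy" {
  type    = bool
  default = false # safer default: destroy fails if the dataset still holds tables
}
```

Typed variables with defaults keep the module call short and let terraform validate catch malformed inputs before any plan runs.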
3. State management in a team
Discriminating question
Why should you never store Terraform state locally in a team? How do you manage it?
    # backend.tf - remote state in GCS
    terraform {
      backend "gcs" {
        bucket = "mon-projet-tfstate"
        prefix = "terraform/data-platform"
      }
    }
    # The GCS backend locks the state automatically,
    # which prevents two simultaneous applies

    # Workspaces: separate environments (shell commands)
    terraform workspace new staging
    terraform workspace select production   # assumes this workspace already exists
- Remote backend required: GCS, S3, Azure Blob, Terraform Cloud. Never keep .tfstate local when working in a team
- State locking: prevents two simultaneous applies that would corrupt the state
- terraform.tfvars: holds variable values and is loaded automatically. For per-environment values, pass a separate file with -var-file. Never commit secrets
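To make the points about workspaces and secrets concrete, here is a small sketch for the root module. The variable name db_password and the dataset naming scheme are hypothetical; terraform.workspace and the sensitive argument are standard Terraform features.

```hcl
# variables.tf (root module)
variable "db_password" {
  type        = string
  description = "Hypothetical secret, supplied at run time and never committed"
  sensitive   = true # redacted in plan/apply output, but still stored in the state
}

locals {
  # terraform.workspace holds the name of the currently selected workspace,
  # so the same code yields analytics_staging, analytics_production, and so on.
  dataset_id = "analytics_${terraform.workspace}"
}
```

The secret can then be supplied through the TF_VAR_db_password environment variable, and non-secret per-environment values through a file passed with -var-file. Because sensitive values still end up in the state, the remote backend itself has to be access-controlled.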
4. CI/CD Terraform with GitHub Actions
Discriminating question
How do you integrate Terraform into a CI/CD pipeline?
    # .github/workflows/terraform.yml
    name: Terraform
    on:
      pull_request:
        paths: ['infra/**']
      push:
        branches: [main]
    jobs:
      terraform:
        runs-on: ubuntu-latest
        defaults:
          run:
            working-directory: infra/   # every run step executes in infra/
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
          - name: Terraform Init
            run: terraform init
          - name: Terraform Plan
            run: terraform plan -out=tfplan
          - name: Terraform Apply   # on main only
            if: github.ref == 'refs/heads/main'
            run: terraform apply tfplan
5. Common data resources in Terraform
- GCS buckets: google_storage_bucket with lifecycle rules, versioning, CMEK
- BigQuery: google_bigquery_dataset, google_bigquery_table with JSON schema
- Airflow (Cloud Composer): google_composer_environment with Airflow configuration overrides and environment variables
- IAM: google_project_iam_member, google_storage_bucket_iam_binding
- Secret Manager: google_secret_manager_secret for credentials
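Two of the resources listed above might look like the following sketch. The bucket name, table name, columns, and the 90-day threshold are hypothetical; the project and dataset IDs reuse the example values from this guide.

```hcl
resource "google_storage_bucket" "raw" {
  name     = "mon-projet-raw" # hypothetical bucket name
  location = "EU"

  uniform_bucket_level_access = true

  versioning {
    enabled = true
  }

  # Move objects to a colder storage class after 90 days
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
}

resource "google_bigquery_table" "events" {
  project    = "mon-projet"
  dataset_id = "analytics_prod"
  table_id   = "events" # hypothetical table

  deletion_protection = true

  # The schema argument expects a JSON string; jsonencode builds it from HCL
  schema = jsonencode([
    { name = "event_id", type = "STRING", mode = "REQUIRED" },
    { name = "event_ts", type = "TIMESTAMP", mode = "REQUIRED" },
    { name = "payload", type = "STRING", mode = "NULLABLE" }
  ])
}
```

Keeping the schema in code means column changes go through the same plan and review cycle as any other infrastructure change.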
6. Importing existing resources
Discriminating question
How do you manage existing infrastructure that was not created with Terraform?
    # terraform import: take control of an existing resource (shell command)
    terraform import google_bigquery_dataset.analytics \
      projects/mon-projet/datasets/analytics

    # Terraform 1.5+: import block in code
    import {
      to = google_bigquery_dataset.analytics
      id = "projects/mon-projet/datasets/analytics"
    }

    # terraform plan -generate-config-out=generated.tf
    # Generates Terraform code for imported resources that have no configuration yet
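Whichever import path is used, the target address needs a matching resource block in the configuration, either written by hand or generated. A hand-written version could look like this sketch; the location value is an assumption and has to match the real dataset.

```hcl
# Target of the import: google_bigquery_dataset.analytics
resource "google_bigquery_dataset" "analytics" {
  project    = "mon-projet"
  dataset_id = "analytics"
  location   = "EU" # assumption: must match the existing dataset's location
}
```

After the import, terraform plan shows any remaining differences between this block and the real dataset; the configuration is usually adjusted until the plan reports no changes.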
- Remote backend required: GCS, S3, or Terraform Cloud. Never keep .tfstate local in a team, because of the risk of conflicts and state loss
- State locking: the backend locks the state during apply, which prevents two simultaneous applies from corrupting it. GCS locks natively; S3 needs locking enabled in the backend configuration (for example with a DynamoDB table)
- Resource import: use terraform import to take control of manually created resources, or declare an import block and generate the code with terraform plan -generate-config-out
- Workspaces: manage multiple environments (dev/staging/prod) with the same code. Different variables per workspace
- CI/CD Terraform: automatic plan on PR (comment the plan in GitHub), apply only on merge to main. AWS/GCP secrets live in GitHub Actions, never in code
7. Level grid
| Level | Mastery | GO signal | NO-GO |
|---|---|---|---|
| Junior | HCL syntax, plan/apply, basic resources | Creates a GCS bucket and a BigQuery dataset, knows how to run a plan | Does not know what Terraform state is |
| Mid-level | Modules, remote backend, variables and outputs | Has created a reusable module and configured a GCS or S3 remote backend | Stores state locally, does not know how to create a module |
| Senior | CI/CD, import, workspaces, secrets management | Has integrated Terraform into GitHub Actions, has run a terraform import | Does not know what state locking is |