Infrastructure
Let's lock this down as a complete, real-world, enterprise-grade Data Engineering Platform with DEV / PPD / PROD environments, built with Terraform and GitOps, and covering every service needed for ingestion, transformation, orchestration, and loading.
Below is a clean, scalable blueprint you can implement directly and explain confidently in interviews.
1️⃣ Target Architecture (Big Picture)
Environments
DEV – feature development & experimentation
PPD (Pre-Prod) – integration, load testing, validation
PROD – production workloads
Each environment has:
- Separate AWS account
- Separate Terraform state
- Separate PKI / Secrets
- Same Terraform modules (DRY)
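Separate state per environment is typically enforced with a distinct backend configuration in each `environments/<env>` stack; a minimal sketch (bucket, table, and region names are illustrative):

```hcl
# environments/dev/backend.tf -- each environment points at its own state bucket
terraform {
  backend "s3" {
    bucket         = "tfstate-dataeng-dev"   # illustrative name; one bucket per account
    key            = "platform/terraform.tfstate"
    region         = "eu-west-1"             # pick your region
    dynamodb_table = "tfstate-lock-dev"      # state locking
    encrypt        = true
  }
}
```

Because the backends live in different accounts, a `terraform apply` in one environment can never touch another environment's state.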
2️⃣ Services Required (Data Engineering)
| Layer | Service | Purpose |
|---|---|---|
| Ingestion | Apache NiFi | Streaming + batch ingestion |
| Orchestration | Apache Airflow | Workflow orchestration |
| Processing | Spark (EMR / Databricks-ready) | Transformations |
| Storage | S3 (Bronze / Silver / Gold) | Data lake |
| Metadata | Glue Data Catalog | Table metadata |
| Security | ACM Private CA | mTLS & user auth |
| Secrets | AWS Secrets Manager | Keystores, DB creds |
| Networking | VPC, ALB, ASG | HA & scalability |
| Observability | CloudWatch + ALB logs | Monitoring |
| CI/CD | GitHub Actions | GitOps infra + certs |
3️⃣ Final Repo Structure (Full)
```
terraform-aws-infra/
├── modules/                      # Reusable building blocks
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   └── README.md
│   ├── s3-datalake/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── lifecycle.tf
│   │   └── README.md
│   ├── pki-acm-ca/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── policy.tf
│   │   └── README.md
│   ├── nifi/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── asg.tf
│   │   ├── alb.tf
│   │   ├── user-data.tpl
│   │   └── README.md
│   ├── airflow/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── rds.tf
│   │   ├── dags_sync.tf
│   │   ├── ecs_or_ec2.tf
│   │   └── README.md
│   ├── iam/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── policies.tf
│   │   └── README.md
│   └── monitoring/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       ├── alarms.tf
│       ├── dashboards.tf
│       └── README.md
├── environments/                 # Per-account/per-env stacks
│   ├── dev/
│   │   ├── provider.tf
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   ├── versions.tf
│   │   └── outputs.tf
│   ├── ppd/
│   │   ├── provider.tf
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   ├── versions.tf
│   │   └── outputs.tf
│   └── prod/
│       ├── provider.tf
│       ├── backend.tf
│       ├── main.tf
│       ├── variables.tf
│       ├── terraform.tfvars
│       ├── versions.tf
│       └── outputs.tf
├── cert-requests/                # GitOps user onboarding
│   ├── dev/
│   │   ├── alice.yaml
│   │   └── bob.yaml
│   ├── ppd/
│   │   └── ci-user.yaml
│   └── prod/
│       └── ops-user.yaml
└── .github/workflows/
    ├── terraform.yml
    ├── cert-issuer.yml
    ├── promote.yml
    └── security-scan.yml
```
4️⃣ Environment Flow (DEV → PPD → PROD)
Git Strategy
| Action | Result |
|---|---|
| `feature/*` branch | Deploys to DEV |
| `release/*` branch | Deploys to PPD |
| `vX.Y.Z` tag | Deploys to PROD |
Terraform Deployment Workflow
```yaml
on:
  push:
    branches:
      - feature/*
      - release/*
    tags:
      - v*

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Select environment
        run: |
          if [[ "${GITHUB_REF}" == refs/heads/feature/* ]]; then
            echo "ENV=dev" >> "$GITHUB_ENV"
          elif [[ "${GITHUB_REF}" == refs/heads/release/* ]]; then
            echo "ENV=ppd" >> "$GITHUB_ENV"
          else
            echo "ENV=prod" >> $GITHUB_ENV
          fi

      - name: Terraform Apply
        run: |
          cd "environments/$ENV"
          terraform init
          # in practice, gate prod applies behind a manual approval (GitHub environment protection)
          terraform apply -auto-approve
```
5️⃣ Networking Module (Shared)
modules/networking
- VPC
- Public + Private subnets
- NAT Gateway
- ALB
- Security Groups
Used identically across dev / ppd / prod.
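Each environment's `main.tf` consumes the same module with env-specific inputs only; a hedged sketch (variable names and CIDRs are illustrative):

```hcl
# environments/dev/main.tf -- same networking module everywhere, only inputs differ
module "networking" {
  source             = "../../modules/networking"
  env                = "dev"
  vpc_cidr           = "10.10.0.0/16"               # illustrative CIDR per environment
  azs                = ["eu-west-1a", "eu-west-1b"] # two AZs for HA
  enable_nat_gateway = true
}
```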
6️⃣ Data Lake (S3)
modules/s3-datalake
```hcl
resource "aws_s3_bucket" "bronze" {
  bucket = "dl-${var.env}-bronze"   # bucket names are global; add an account/org prefix in practice
}

resource "aws_s3_bucket" "silver" {
  bucket = "dl-${var.env}-silver"
}

resource "aws_s3_bucket" "gold" {
  bucket = "dl-${var.env}-gold"
}
```
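The module's `lifecycle.tf` would typically hold tiering rules; a minimal sketch (transition windows are assumptions, tune them to your retention policy):

```hcl
# modules/s3-datalake/lifecycle.tf -- illustrative tiering for the bronze layer
resource "aws_s3_bucket_lifecycle_configuration" "bronze" {
  bucket = aws_s3_bucket.bronze.id

  rule {
    id     = "tier-raw-data"
    status = "Enabled"
    filter {}                        # apply to all objects in the bucket

    transition {
      days          = 30             # assumption: raw data cools after a month
      storage_class = "STANDARD_IA"
    }
    transition {
      days          = 90
      storage_class = "GLACIER"
    }
  }
}
```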
7️⃣ NiFi (EC2 + ASG + ALB)
Architecture
- NiFi nodes in private subnet
- ALB in public subnet
- mTLS enforced
- Auto Scaling enabled
Certificate handling
- Truststore → S3
- Keystore → Secrets Manager
NiFi user data (simplified)
```bash
# fetch the shared truststore from S3
aws s3 cp "s3://nifi-${ENV}-certs/truststore.jks" /opt/nifi/conf/

# fetch this node's keystore (assumes the binary keystore was stored base64-encoded)
aws secretsmanager get-secret-value \
  --secret-id "nifi-keystore-${ENV}" \
  --query SecretString \
  --output text | base64 -d > /opt/nifi/conf/keystore.p12
```
8️⃣ Airflow (Latest Version)
Architecture
- EC2 or ECS (start with EC2 for learning)
- Metadata DB → RDS Postgres
- DAGs → S3
- IAM role for NiFi trigger
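The module's `rds.tf` would provision the metadata database along these lines (instance sizes and names are assumptions; networking inputs omitted for brevity):

```hcl
# modules/airflow/rds.tf -- Postgres metadata DB for Airflow (illustrative)
resource "aws_db_instance" "airflow_metadata" {
  identifier                  = "airflow-${var.env}"
  engine                      = "postgres"
  engine_version              = "15"
  instance_class              = var.env == "prod" ? "db.m6g.large" : "db.t4g.medium"
  allocated_storage           = 50
  db_name                     = "airflow"
  username                    = "airflow"
  manage_master_user_password = true               # password managed in Secrets Manager
  multi_az                    = var.env == "prod"  # HA only where it matters
  skip_final_snapshot         = var.env != "prod"
}
```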
9️⃣ Certificate Management (Critical)
Central CA (per environment)
| Environment | CA |
|---|---|
| dev | dev-acm-pca |
| ppd | ppd-acm-pca |
| prod | prod-acm-pca |
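The `pki-acm-ca` module would create one private CA per account; a sketch (subject values are assumptions):

```hcl
# modules/pki-acm-ca/main.tf -- one private root CA per environment account
resource "aws_acmpca_certificate_authority" "this" {
  type = "ROOT"

  certificate_authority_configuration {
    key_algorithm     = "RSA_2048"
    signing_algorithm = "SHA256WITHRSA"

    subject {
      common_name  = "${var.env}-acm-pca"
      organization = "Platform"          # assumed org name
    }
  }
}
```

Note that activating a root CA additionally requires issuing and installing its self-signed certificate (the `aws_acmpca_certificate` and `aws_acmpca_certificate_authority_certificate` resources).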
User cert via Git
cert-requests/dev/alice.yaml
```yaml
user: alice
env: dev
subject:
  common_name: alice
  organizational_unit: DataEng
  organization: Platform
expiry_days: 180
```
Flow
- PR approved
- Cert issued
- Stored in Secrets Manager
- User imports the certificate into their browser
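The `cert-issuer.yml` workflow could drive issuance roughly as follows; this is a sketch, not a full implementation (`CA_ARN`, the subject values, and the secret name are illustrative, and parsing the approved YAML is omitted):

```yaml
# .github/workflows/cert-issuer.yml -- sketch of the issuance job
jobs:
  issue:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Issue certificate from ACM PCA
        run: |
          # generate a key pair and CSR matching the approved request
          openssl req -new -newkey rsa:2048 -nodes \
            -keyout alice.key -out alice.csr \
            -subj "/CN=alice/OU=DataEng/O=Platform"

          CERT_ARN=$(aws acm-pca issue-certificate \
            --certificate-authority-arn "$CA_ARN" \
            --csr fileb://alice.csr \
            --signing-algorithm SHA256WITHRSA \
            --validity Value=180,Type=DAYS \
            --query CertificateArn --output text)

          aws acm-pca get-certificate \
            --certificate-authority-arn "$CA_ARN" \
            --certificate-arn "$CERT_ARN" \
            --query Certificate --output text > alice.pem

          # store the signed cert for the user to fetch
          aws secretsmanager create-secret \
            --name "dev/certs/alice" \
            --secret-string "$(cat alice.pem)"
```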
🔐 How Users Login (NiFi)
- User opens NiFi URL
- Browser presents client certificate
- NiFi validates against truststore
- DN mapped to role
- Access granted (no passwords, no IAM users)
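NiFi derives the clean identity from the certificate DN via its identity-mapping properties; a typical `nifi.properties` fragment (the pattern is an assumption matching the cert subjects issued above):

```properties
# nifi.properties -- normalize "CN=alice, OU=DataEng, O=Platform" to "alice"
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=NONE
```

The mapped identity ("alice") is then matched against authorizer policies to assign a role.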
🔄 DEV → PPD → PROD Promotion
Promotion rules
- Infra identical
- Only instance size, scaling, retention, approvals differ
Promote using tag-based deployment with manual approval for PROD.
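In practice only `terraform.tfvars` differs between environments; for example (all values are illustrative):

```hcl
# environments/dev/terraform.tfvars
nifi_instance_type = "t3.large"
nifi_min_nodes     = 1
log_retention_days = 14

# environments/prod/terraform.tfvars
nifi_instance_type = "m6i.2xlarge"
nifi_min_nodes     = 3
log_retention_days = 365
```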
🔍 Monitoring & Ops
- CloudWatch logs
- ALB access logs
- Auto Scaling alarms
- NiFi provenance tuning
- Airflow SLA alerts
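An Auto Scaling alarm from the `monitoring` module might look like this (the ASG-name and SNS-topic variables are assumed module inputs):

```hcl
# modules/monitoring/alarms.tf -- illustrative CPU alarm on the NiFi ASG
resource "aws_cloudwatch_metric_alarm" "nifi_cpu_high" {
  alarm_name          = "nifi-${var.env}-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 80
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    AutoScalingGroupName = var.nifi_asg_name   # assumed input: NiFi ASG name
  }

  alarm_actions = [var.alerts_sns_topic_arn]   # assumed input: alerting SNS topic
}
```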
🎯 Why This Is Real-World & Interview-Ready
- Multi-account AWS
- GitOps infra + certs
- Secure mTLS auth
- Scalable ingestion
- Production-grade orchestration
- Clean promotion strategy
🧭 Recommended Implementation Order
- Networking + S3
- ACM PCA module
- NiFi single node (DEV)
- Cert GitOps flow
- Airflow integration
- Scale to ASG
- Add PPD & PROD