CDK Patterns for Multi-Environment Deployments
Managing six environments from a single CDK application: one VPC, the STAGE env var pattern, shared vs isolated resource decisions, and why we chose simplicity over CDK Pipelines.
DATE: JAN.08.2025
READ: 16 MIN
Most CDK tutorials show you how to deploy one stack. In production, you need to deploy the same application across QA, Dev, Demo, POC, Mono, and Prod — each with different configurations, some sharing infrastructure, others fully isolated. This post documents the patterns we use to manage six environments from a single CDK application, all within one AWS account and one VPC.
The key design decisions: which resources are shared across stages, which are isolated, how the STAGE environment variable controls synthesis, and how VPC lookup replaces VPC duplication.
Architecture overview
All environments run in one AWS account and one region. A single VPC hosts all stages. Each stage gets its own ECS cluster, its own ECS services, and its own listener rules on the shared ALB. Aurora Serverless is shared across stages with separate database names.
Why one VPC? Running six VPCs in a single account wastes IP address space and money. NAT gateways cost $0.045/hour each (~$32/month per AZ per VPC). With one VPC and properly configured security groups, all stages communicate with shared resources — Aurora, Redis, ALB — without peering.
The trade-off is blast radius. A misconfigured security group rule could expose QA services to Prod traffic. We mitigate with strict per-stage security groups and separate ECS clusters, but for organizations with strict compliance requirements, separate accounts may be necessary.
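The NAT gateway arithmetic behind the single-VPC decision can be sketched in a few lines of Python. This is a rough estimate under stated assumptions: 730 hours per month, one NAT gateway per VPC (as in the QA VPC definition), and hourly charges only, ignoring per-GB data processing fees. The function name is illustrative.

```python
HOURLY_RATE_USD = 0.045   # NAT gateway hourly price quoted in this post
HOURS_PER_MONTH = 730     # assumption: average month length in hours


def nat_monthly_cost(vpcs: int, nat_gateways_per_vpc: int = 1) -> float:
    """Return the fixed monthly NAT gateway cost in USD (hourly charge only)."""
    return round(vpcs * nat_gateways_per_vpc * HOURLY_RATE_USD * HOURS_PER_MONTH, 2)


if __name__ == "__main__":
    one_vpc = nat_monthly_cost(vpcs=1)
    six_vpcs = nat_monthly_cost(vpcs=6)
    print(f"one shared VPC: ${one_vpc:.2f}/month")
    print(f"six VPCs:       ${six_vpcs:.2f}/month")
    print(f"difference:     ${six_vpcs - one_vpc:.2f}/month")
```

Under these assumptions the fixed cost grows linearly with the number of VPCs, which is the saving the shared VPC captures.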
The app.py structure
The CDK entry point uses the STAGE environment variable to control which stacks are synthesized:
#!/usr/bin/env python3
import os

from aws_cdk import App, Environment

app = App()
env = Environment(account="ACCOUNT_ID", region="us-east-1")
stage = os.getenv("STAGE", "").lower()

# Shared infrastructure: deployed once
if stage in ["shared", "all"]:
    from stacks.shared_alb_stack import SharedAlbStack
    from stacks.shared_db_stack import SharedDbStack
    SharedAlbStack(app, "SharedAlbStack", env=env)
    SharedDbStack(app, "SharedDbStack", env=env)

# QA: creates the VPC that all other stacks use
if stage in ["qa", "all"]:
    from stacks.backend_stack_qa import BackendStackQA
    from stacks.frontend_stack_qa import FrontendStackQA
    BackendStackQA(app, "BackendStackQA", env=env)
    FrontendStackQA(app, "FrontendStackQA", env=env)

# Dev
if stage in ["dev", "all"]:
    from stacks.backend_stack_dev import BackendStackDev
    BackendStackDev(app, "BackendStackDev", env=env)

# Demo
if stage in ["demo", "all"]:
    from stacks.backend_stack_demo import BackendStackDemo
    BackendStackDemo(app, "BackendStackDemo", env=env)

# POC
if stage in ["poc", "all"]:
    from stacks.backend_stack_poc import BackendStackPoc
    BackendStackPoc(app, "BackendStackPoc", env=env)

# Mono
if stage in ["mono", "all"]:
    from stacks.backend_stack_mono import BackendStackMono
    BackendStackMono(app, "BackendStackMono", env=env)

# Prod
if stage in ["prod", "all"]:
    from stacks.backend_stack_prod import BackendStackProd
    from stacks.frontend_stack_prod import FrontendStackProd
    BackendStackProd(app, "BackendStackProd", env=env)
    FrontendStackProd(app, "FrontendStackProd", env=env)

app.synth()

Why not CDK Stage constructs or CDK Pipelines?
CDK has a Stage construct designed for multi-environment deployments. We chose the env var pattern for several reasons:
Synthesis speed. Synthesizing only the QA stacks takes 3 seconds. Synthesizing all stacks takes 20+ seconds because of VPC lookups and other context calls. In a development loop, this matters.
Simplicity. The env var pattern is instantly understandable by anyone who reads the code. No CDK-specific abstractions to learn.
CI/CD compatibility. Our GitHub Actions workflows set STAGE as an environment variable. Each stage has its own workflow file, running independently.
Incremental adoption. We started with a single QA stack and added stages one at a time, with zero refactoring.
CDK Pipelines is the right choice for organizations with separate AWS accounts per environment and strict approval workflows. For single-account setups, the env var pattern is simpler and faster.
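For readers who want the same env var pattern with less repetition, the if-chain in app.py could be driven by a lookup table. This is a minimal sketch, not the app.py shown in this post: it models only the selection logic, with stack names as plain strings so it runs without CDK installed. STACKS_BY_STAGE and stacks_for_stage are hypothetical names.

```python
# Stage -> stack names, mirroring the if-blocks in app.py.
STACKS_BY_STAGE = {
    "shared": ["SharedAlbStack", "SharedDbStack"],
    "qa": ["BackendStackQA", "FrontendStackQA"],
    "dev": ["BackendStackDev"],
    "demo": ["BackendStackDemo"],
    "poc": ["BackendStackPoc"],
    "mono": ["BackendStackMono"],
    "prod": ["BackendStackProd", "FrontendStackProd"],
}


def stacks_for_stage(stage: str) -> list[str]:
    """Return the stack names that would be synthesized for a STAGE value."""
    stage = stage.strip().lower()
    if stage == "all":
        return [name for names in STACKS_BY_STAGE.values() for name in names]
    if stage not in STACKS_BY_STAGE:
        raise ValueError(
            f"unknown STAGE {stage!r}; expected one of {sorted(STACKS_BY_STAGE)} or 'all'"
        )
    return list(STACKS_BY_STAGE[stage])


if __name__ == "__main__":
    print(stacks_for_stage("qa"))
```

One design difference worth noting: the if-chain synthesizes nothing when STAGE is unset or misspelled, while this sketch raises an error so the mistake surfaces immediately.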
VPC sharing: create once, look up everywhere
The QA stack creates the VPC. Every other stack looks it up by CloudFormation stack name tag:
QA stack: VPC creation
from aws_cdk import Stack
from aws_cdk import aws_ec2 as ec2


class BackendStackQA(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # QA creates the VPC; this is the source of truth
        self.vpc = ec2.Vpc(
            self, "VpcQA",
            max_azs=2,
            nat_gateways=1,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="Public",
                    subnet_type=ec2.SubnetType.PUBLIC,
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    name="Private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    name="Isolated",
                    subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
                    cidr_mask=24,
                ),
            ],
        )

Other stacks: VPC lookup
from aws_cdk import Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs


class BackendStackDev(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Resolved at synthesis time via the tag CloudFormation adds to the QA stack's VPC
        vpc = ec2.Vpc.from_lookup(
            self, "VpcDev",
            tags={"aws:cloudformation:stack-name": "BackendStackQA"},
        )
        cluster = ecs.Cluster(self, "ClusterDev", vpc=vpc)

Vpc.from_lookup is a synthesis-time operation. CDK makes an API call to describe VPCs matching the filter criteria and caches the result in cdk.context.json. Commit this file to version control: CI/CD builds then produce the same templates even if the AWS environment changes, and synthesis works without AWS credentials as long as every lookup is already cached.
If the VPC changes (new subnets, for example), clear the context cache with cdk context --clear.
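To see which lookups are cached before deciding what to clear, the context file can be inspected directly. The sketch below assumes cdk.context.json is a flat JSON object and that keys for VPC lookups begin with `vpc-provider:`; treat that prefix as an assumption and check it against your own file. The function name is hypothetical.

```python
import json
from pathlib import Path


def cached_lookup_keys(context_path: str, prefix: str = "vpc-provider:") -> list[str]:
    """List cached context keys that start with the given provider prefix."""
    path = Path(context_path)
    if not path.exists():
        return []  # nothing has been cached yet
    context = json.loads(path.read_text())
    return sorted(key for key in context if key.startswith(prefix))


if __name__ == "__main__":
    for key in cached_lookup_keys("cdk.context.json"):
        print(key)
```

A single stale entry can then be removed with cdk context --reset followed by the key, instead of clearing the whole cache.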
Shared vs isolated resources
+-----------------------+-----------+---------------------------------------------------------+
| Resource              | Strategy  | Reasoning                                               |
+-----------------------+-----------+---------------------------------------------------------+
| VPC                   | Shared    | Avoids NAT gateway duplication (~$32/mo per VPC per AZ) |
| ECS Cluster           | Per-stage | Isolates scheduling, capacity, CloudWatch metrics       |
| Internal ALB          | Shared    | One ALB with path-based routing for all stages          |
| Aurora Cluster        | Shared    | ACU scaling at cluster level; per-stage databases       |
| Aurora Database       | Per-stage | Isolated data with shared compute                       |
| ElastiCache Redis     | Shared    | One cluster, per-stage key prefixes (qa:, dev:)         |
| ECR Repositories      | Shared    | Per-stage image tags (backend:qa, backend:prod)         |
| Secrets Manager       | Per-stage | Each stage has its own secrets                          |
| Security Groups       | Per-stage | Network isolation between stages                        |
| CloudWatch Dashboards | Per-stage | Each team monitors their own environment                |
+-----------------------+-----------+---------------------------------------------------------+
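The shared rows in the table all rely on a per-stage naming convention inside a shared resource. A minimal sketch of those conventions follows, using the examples from the table (qa: key prefixes, backend:qa image tags, app_qa database names). The helper names are hypothetical and only illustrate the naming scheme.

```python
def redis_key(stage: str, key: str) -> str:
    """Prefix a cache key with its stage, e.g. 'qa:session:42'."""
    return f"{stage.lower()}:{key}"


def image_ref(repository: str, stage: str) -> str:
    """Per-stage image tag in a shared ECR repository, e.g. 'backend:prod'."""
    return f"{repository}:{stage.lower()}"


def database_name(stage: str, app: str = "app") -> str:
    """Per-stage database on the shared Aurora cluster, e.g. 'app_qa'."""
    return f"{app}_{stage.lower()}"


if __name__ == "__main__":
    for stage in ("qa", "prod"):
        print(redis_key(stage, "session:42"), image_ref("backend", stage), database_name(stage))
```

Keeping these conventions in one place makes it harder for a stage to read another stage's keys or pull another stage's image by accident.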
Per-stage ECS clusters
Each stage gets its own ECS cluster, providing:
- Metric isolation — QA’s CPU spike doesn’t pollute Prod’s dashboard
- Capacity isolation — QA autoscaling can’t consume capacity Prod needs
- IAM scoping — developers can be restricted to only deploy to their cluster
cluster = ecs.Cluster(
    self, f"Cluster{stage_name.title()}",
    vpc=vpc,
    container_insights_v2=ecs.ContainerInsights.ENABLED,
)

Shared Aurora with per-stage databases
Instead of running six Aurora clusters, which is expensive, we run one cluster with per-stage database names:
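from aws_cdk import Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_rds as rds


class SharedDbStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Assumption: the shared VPC is resolved by lookup, as in the other non-QA stacks
        vpc = ec2.Vpc.from_lookup(
            self, "VpcShared",
            tags={"aws:cloudformation:stack-name": "BackendStackQA"},
        )

        cluster = rds.DatabaseCluster(
            self, "AuroraCluster",
            engine=rds.DatabaseClusterEngine.aurora_postgres(
                version=rds.AuroraPostgresEngineVersion.VER_15_4,
            ),
            serverless_v2_min_capacity=0.5,
            serverless_v2_max_capacity=8,
            writer=rds.ClusterInstance.serverless_v2("writer"),
            vpc=vpc,
        )

Each stage’s backend connects to the same cluster endpoint with a different database name: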
# BackendStackQA
environment={"DATABASE_NAME": "app_qa", "DATABASE_HOST": db_endpoint}

# BackendStackProd
environment={"DATABASE_NAME": "app_prod", "DATABASE_HOST": db_endpoint}

Trade-off: a cluster-level issue (storage full, writer failover) affects all stages. For Prod isolation, a dedicated cluster is worth the cost. For non-production stages, sharing is cost-effective.
Per-stage configuration via class constants
Each stack class defines its own constants rather than relying on environment variables for CDK-level configuration:
class BackendStackQA(Stack):
    STAGE_NAME = "qa"
    DESIRED_COUNT = 1
    CPU = 256
    MEMORY = 512
    MIN_CAPACITY = 1
    MAX_CAPACITY = 2
    DATABASE_NAME = "app_qa"
    ECR_TAG = "qa"
    ENABLE_DEBUG_LOGGING = True
    ENABLE_AUTOSCALING = False
    ENABLE_WAF = False


class BackendStackProd(Stack):
    STAGE_NAME = "prod"
    DESIRED_COUNT = 2
    CPU = 1024
    MEMORY = 2048
    MIN_CAPACITY = 2
    MAX_CAPACITY = 10
    DATABASE_NAME = "app_prod"
    ECR_TAG = "prod"
    ENABLE_DEBUG_LOGGING = False
    ENABLE_AUTOSCALING = True
    ENABLE_WAF = True

A shared setup method uses these constants:
# Method on the stack class; requires: from aws_cdk import Duration, aws_ecs as ecs
def create_fargate_service(self):
    service = ecs.FargateService(
        self, f"BackendService{self.STAGE_NAME.title()}",
        cluster=self.cluster,
        task_definition=self.task_def,
        desired_count=self.DESIRED_COUNT,
        min_healthy_percent=100,
        max_healthy_percent=200,
        # Roll back automatically if the new deployment fails to stabilize
        circuit_breaker=ecs.DeploymentCircuitBreaker(rollback=True),
    )

    if self.ENABLE_AUTOSCALING:
        scaling = service.auto_scale_task_count(
            min_capacity=self.MIN_CAPACITY,
            max_capacity=self.MAX_CAPACITY,
        )
        scaling.scale_on_cpu_utilization(
            "CpuScaling",
            target_utilization_percent=70,
            scale_in_cooldown=Duration.seconds(60),
            scale_out_cooldown=Duration.seconds(60),
        )

    return service

CI/CD deployment workflow
Each stage has its own GitHub Actions workflow. A push to the qa branch deploys QA; a push to main deploys Prod:
# .github/workflows/deploy-qa.yml
name: Deploy QA
on:
  push:
    branches: [qa]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt
      - run: npm install -g aws-cdk
      - run: STAGE=qa cdk deploy --all --require-approval never
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-1

Deployment order: Shared must deploy first (creates ALB, VPC link). QA must deploy second (creates VPC). After that, all other stages can deploy in parallel.
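The deployment order above can be expressed as sequential waves, where stages inside a wave may run in parallel. The sketch below is one possible encoding, not part of our workflows: it follows the stated order (Shared, then QA, then the rest) and additionally places Prod in its own final wave, following the "Deploy Prod last" practice from this post. STAGES and deployment_waves are hypothetical names.

```python
STAGES = ["shared", "qa", "dev", "demo", "poc", "mono", "prod"]


def deployment_waves(stages: list[str]) -> list[list[str]]:
    """Group the requested stages into sequential waves.

    Stages within one wave have no ordering constraint between them.
    """
    wanted = set(stages)
    requested = [s for s in STAGES if s in wanted]  # keep canonical order

    waves = []
    for first in ("shared", "qa"):          # each must finish before the next wave
        if first in requested:
            waves.append([first])
    middle = [s for s in requested if s not in ("shared", "qa", "prod")]
    if middle:
        waves.append(middle)                # these can deploy in parallel
    if "prod" in requested:
        waves.append(["prod"])              # Prod goes last
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(STAGES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

In a CI system, each wave would map to a job or workflow that waits on the previous one.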
Useful commands:
# Deploy just one stack
STAGE=qa cdk deploy BackendStackQA

# Deploy all QA stacks
STAGE=qa cdk deploy --all

# Preview changes to Prod without deploying
STAGE=prod cdk diff

# Synthesize all stacks for validation
STAGE=all cdk synth

Best practices
Always set env on stacks. Stacks without an explicit Environment can’t use context providers like Vpc.from_lookup or HostedZone.from_lookup.
Commit cdk.context.json. The context cache stores VPC lookups and availability zone queries. Committing it ensures CI/CD builds produce the same templates without AWS credentials and that VPC lookup results are stable across developers.
Use stack name prefixes. Every stack name should include the project and stage: AppBackendStackQA, AppBackendStackProd. CloudFormation stack names must be unique per account per region.
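A small helper can keep the naming scheme consistent across stacks. This is a sketch of the convention described above, with the stage casing taken from the class names in this post; stack_name and STAGE_SUFFIX are hypothetical names.

```python
# Display casing for each stage, matching the class names used in this post.
STAGE_SUFFIX = {
    "qa": "QA",
    "dev": "Dev",
    "demo": "Demo",
    "poc": "Poc",
    "mono": "Mono",
    "prod": "Prod",
}


def stack_name(project: str, component: str, stage: str) -> str:
    """Build a stack name such as 'AppBackendStackQA' from its parts."""
    suffix = STAGE_SUFFIX[stage.lower()]  # KeyError on an unknown stage
    return f"{project}{component}Stack{suffix}"


if __name__ == "__main__":
    print(stack_name("App", "Backend", "qa"))
    print(stack_name("App", "Backend", "prod"))
```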
Add a Prod destruction safety net:
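# Requires `import os` and `import sys` at the top of app.py.
# The app runs on every cdk command, so this guard also applies to deploy and diff.
if stage == "prod" and os.getenv("CONFIRM_PROD") != "yes-destroy-prod":
    print("Set CONFIRM_PROD=yes-destroy-prod to destroy production stacks")
    sys.exit(1)

Tag everything by stage for cost allocation: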
Tags.of(self).add("Stage", self.STAGE_NAME)
Tags.of(self).add("ManagedBy", "cdk")

Run cdk diff before every deployment and review the diff before committing to it. Look for unexpected resource replacements ([-] then [+]), especially for stateful resources.
Deploy Prod last, always. If a CDK code change breaks synthesis or creates invalid resources, you discover it in QA first.