CDK Patterns for Multi-Environment Deployments

Managing six environments from a single CDK application: one VPC, the STAGE env var pattern, shared vs isolated resource decisions, and why we chose simplicity over CDK Pipelines.

DATE:
JAN.08.2025
READ:
16 MIN

Most CDK tutorials show you how to deploy one stack. In production, you need to deploy the same application across QA, Dev, Demo, POC, Mono, and Prod — each with different configurations, some sharing infrastructure, others fully isolated. This post documents the patterns we use to manage six environments from a single CDK application, all within one AWS account and one VPC.

The key design decisions: which resources are shared across stages, which are isolated, how the STAGE environment variable controls synthesis, and how VPC lookup replaces VPC duplication.


Architecture overview

All environments run in one AWS account and one region. A single VPC hosts all stages. Each stage gets its own ECS cluster, its own ECS services, and its own listener rules on the shared ALB. Aurora Serverless is shared across stages with separate database names.

Why one VPC? Running six VPCs in a single account wastes IP address space and money. NAT gateways cost $0.045/hour each (~$32/month per AZ per VPC). With one VPC and properly configured security groups, all stages communicate with shared resources — Aurora, Redis, ALB — without peering.
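To make the cost argument concrete, here is the arithmetic as a small sketch. It assumes one NAT gateway per VPC, roughly 730 hours in a month, and it ignores per-GB data processing charges. The function and constant names are illustrative.

```python
HOURLY_NAT_PRICE_USD = 0.045   # hourly price quoted above
HOURS_PER_MONTH = 730          # assumption: average month length in hours


def monthly_nat_cost(vpcs: int, nat_gateways_per_vpc: int = 1) -> float:
    """Fixed monthly NAT gateway charge, excluding data processing fees."""
    total = vpcs * nat_gateways_per_vpc * HOURLY_NAT_PRICE_USD * HOURS_PER_MONTH
    return round(total, 2)


one_vpc = monthly_nat_cost(vpcs=1)
six_vpcs = monthly_nat_cost(vpcs=6)
print(f"one VPC:  ${one_vpc:.2f}/month")
print(f"six VPCs: ${six_vpcs:.2f}/month")
print(f"saved:    ${six_vpcs - one_vpc:.2f}/month")
```

With one NAT gateway per AZ instead of one per VPC, pass nat_gateways_per_vpc=2 and the gap doubles.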

The trade-off is blast radius. A misconfigured security group rule could expose QA services to Prod traffic. We mitigate with strict per-stage security groups and separate ECS clusters, but for organizations with strict compliance requirements, separate accounts may be necessary.


The app.py structure

The CDK entry point uses the STAGE environment variable to control which stacks are synthesized:

++
#!/usr/bin/env python3
import os
from aws_cdk import App, Environment

app = App()

env = Environment(account="ACCOUNT_ID", region="us-east-1")
stage = os.getenv("STAGE", "").lower()

# Shared infrastructure — deployed once
if stage in ["shared", "all"]:
    from stacks.shared_alb_stack import SharedAlbStack
    from stacks.shared_db_stack import SharedDbStack
    SharedAlbStack(app, "SharedAlbStack", env=env)
    SharedDbStack(app, "SharedDbStack", env=env)

# QA — creates the VPC that all other stacks use
if stage in ["qa", "all"]:
    from stacks.backend_stack_qa import BackendStackQA
    from stacks.frontend_stack_qa import FrontendStackQA
    BackendStackQA(app, "BackendStackQA", env=env)
    FrontendStackQA(app, "FrontendStackQA", env=env)

# Dev
if stage in ["dev", "all"]:
    from stacks.backend_stack_dev import BackendStackDev
    BackendStackDev(app, "BackendStackDev", env=env)

# Demo
if stage in ["demo", "all"]:
    from stacks.backend_stack_demo import BackendStackDemo
    BackendStackDemo(app, "BackendStackDemo", env=env)

# POC
if stage in ["poc", "all"]:
    from stacks.backend_stack_poc import BackendStackPoc
    BackendStackPoc(app, "BackendStackPoc", env=env)

# Mono
if stage in ["mono", "all"]:
    from stacks.backend_stack_mono import BackendStackMono
    BackendStackMono(app, "BackendStackMono", env=env)

# Prod
if stage in ["prod", "all"]:
    from stacks.backend_stack_prod import BackendStackProd
    from stacks.frontend_stack_prod import FrontendStackProd
    BackendStackProd(app, "BackendStackProd", env=env)
    FrontendStackProd(app, "FrontendStackProd", env=env)

app.synth()
++
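One gap in the pattern above: a typo such as STAGE=prd matches no branch, so nothing is synthesized and the cause is not obvious from the output. A minimal sketch of a guard, assuming you are willing to keep a table of stage names alongside the if-blocks. STACKS_BY_STAGE and resolve_stacks are hypothetical names, not part of the app.py shown here.

```python
import os

# Stack IDs per stage, mirroring the if-blocks in app.py.
STACKS_BY_STAGE = {
    "shared": ["SharedAlbStack", "SharedDbStack"],
    "qa": ["BackendStackQA", "FrontendStackQA"],
    "dev": ["BackendStackDev"],
    "demo": ["BackendStackDemo"],
    "poc": ["BackendStackPoc"],
    "mono": ["BackendStackMono"],
    "prod": ["BackendStackProd", "FrontendStackProd"],
}


def resolve_stacks(stage: str) -> list[str]:
    """Return the stack IDs a STAGE value selects, or raise on an unknown value."""
    stage = stage.strip().lower()
    if stage == "all":
        return [name for names in STACKS_BY_STAGE.values() for name in names]
    if stage not in STACKS_BY_STAGE:
        valid = ", ".join([*STACKS_BY_STAGE, "all"])
        raise ValueError(f"Unknown STAGE {stage!r}; expected one of: {valid}")
    return list(STACKS_BY_STAGE[stage])


if __name__ == "__main__":
    print(resolve_stacks(os.getenv("STAGE", "all")))
```

Calling resolve_stacks at the top of app.py would turn a misspelled stage into an immediate, readable error before any stack is constructed.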

Why not CDK Stage constructs or CDK Pipelines?

CDK has a Stage construct designed for multi-environment deployments. We chose the env var pattern for several reasons:

  1. Synthesis speed. Synthesizing only the QA stacks takes 3 seconds. Synthesizing all stacks takes 20+ seconds because of VPC lookups and other context calls. In a development loop, this matters.

  2. Simplicity. The env var pattern is instantly understandable by anyone who reads the code. No CDK-specific abstractions to learn.

  3. CI/CD compatibility. Our GitHub Actions workflows set STAGE as an environment variable. Each stage has its own workflow file, running independently.

  4. Incremental adoption. We started with a single QA stack and added stages one at a time, with zero refactoring.

CDK Pipelines is the right choice for organizations with separate AWS accounts per environment and strict approval workflows. For single-account setups, the env var pattern is simpler and faster.


VPC sharing: create once, look up everywhere

The QA stack creates the VPC. Every other stack looks it up by the aws:cloudformation:stack-name tag that CloudFormation applies to the VPC automatically:

QA stack: VPC creation

++
from aws_cdk import Stack, aws_ec2 as ec2


class BackendStackQA(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # QA creates the VPC — this is the source of truth
        self.vpc = ec2.Vpc(
            self, "VpcQA",
            max_azs=2,
            nat_gateways=1,
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    name="Public",
                    subnet_type=ec2.SubnetType.PUBLIC,
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    name="Private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    name="Isolated",
                    subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
                    cidr_mask=24,
                ),
            ],
        )
++

Other stacks: VPC lookup

++
from aws_cdk import Stack, aws_ec2 as ec2, aws_ecs as ecs


class BackendStackDev(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc.from_lookup(
            self, "VpcDev",
            tags={"aws:cloudformation:stack-name": "BackendStackQA"},
        )
        cluster = ecs.Cluster(self, "ClusterDev", vpc=vpc)
++

Vpc.from_lookup is a synthesis-time operation. CDK makes an API call to describe VPCs matching the filter criteria and caches the result in cdk.context.json. Commit this file to version control: with the cached result present, CI/CD builds produce the same templates even if the AWS environment changes, and synthesis does not need AWS credentials to resolve the lookup.

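If the VPC changes (new subnets, for example), clear the context cache with cdk context --clear, re-run synthesis with AWS credentials so the lookup is refreshed, and commit the updated cdk.context.json.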
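Before clearing anything, it can help to see what is actually cached. The sketch below lists the VPC lookup entries in cdk.context.json. It assumes the file is a flat JSON object, that VPC lookups are stored under keys starting with vpc-provider:, and that each cached value carries a vpcId field. These are assumptions about the CLI's cache format, so verify them against your own file. The function name is illustrative.

```python
import json
from pathlib import Path


def cached_vpc_lookups(context_path: str = "cdk.context.json") -> dict[str, str]:
    """Map each cached VPC lookup key to the VPC ID stored for it."""
    path = Path(context_path)
    if not path.exists():
        return {}
    context = json.loads(path.read_text())
    return {
        key: value.get("vpcId", "<unknown>")
        for key, value in context.items()
        # Assumption: VPC lookups are stored under keys with this prefix.
        if key.startswith("vpc-provider:") and isinstance(value, dict)
    }


if __name__ == "__main__":
    for key, vpc_id in cached_vpc_lookups().items():
        print(f"{vpc_id}  <-  {key}")
```

A single stale entry can then be dropped with cdk context --reset followed by its key, which leaves the other cached lookups untouched.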


Shared vs isolated resources

+--------------------+-----------+--------------------+
| Resource           | Strategy  | Reasoning          |
+--------------------+-----------+--------------------+
| VPC                | Shared    | Avoids NAT gateway |
|                    |           | duplication        |
|                    |           | (~$32/mo per VPC   |
|                    |           | per AZ)            |
+--------------------+-----------+--------------------+
| ECS Cluster        | Per-stage | Isolates           |
|                    |           | scheduling,        |
|                    |           | capacity,          |
|                    |           | CloudWatch metrics |
+--------------------+-----------+--------------------+
| Internal ALB       | Shared    | One ALB with       |
|                    |           | path-based routing |
|                    |           | for all stages     |
+--------------------+-----------+--------------------+
| Aurora Cluster     | Shared    | ACU scaling at     |
|                    |           | cluster level;     |
|                    |           | per-stage          |
|                    |           | databases          |
+--------------------+-----------+--------------------+
| Aurora Database    | Per-stage | Isolated data with |
|                    |           | shared compute     |
+--------------------+-----------+--------------------+
| ElastiCache Redis  | Shared    | One cluster,       |
|                    |           | per-stage key      |
|                    |           | prefixes (qa:,     |
|                    |           | dev:)              |
+--------------------+-----------+--------------------+
| ECR Repositories   | Shared    | Per-stage image    |
|                    |           | tags (backend:qa,  |
|                    |           | backend:prod)      |
+--------------------+-----------+--------------------+
| Secrets Manager    | Per-stage | Each stage has its |
|                    |           | own secrets        |
+--------------------+-----------+--------------------+
| Security Groups    | Per-stage | Network isolation  |
|                    |           | between stages     |
+--------------------+-----------+--------------------+
| CloudWatch         | Per-stage | Each team monitors |
| Dashboards         |           | their own          |
|                    |           | environment        |
+--------------------+-----------+--------------------+
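For the shared Redis row, the isolation lives entirely in application code, so it is worth centralizing. A minimal sketch of a key-prefix helper follows. stage_key and VALID_STAGES are illustrative names, and the helper only builds key strings, so it is independent of whichever Redis client you use.

```python
VALID_STAGES = {"qa", "dev", "demo", "poc", "mono", "prod"}


def stage_key(stage: str, key: str) -> str:
    """Prefix a Redis key with its stage, e.g. 'qa:session:42'."""
    stage = stage.lower()
    if stage not in VALID_STAGES:
        raise ValueError(f"Unknown stage: {stage!r}")
    if not key:
        raise ValueError("Key must not be empty")
    return f"{stage}:{key}"


print(stage_key("qa", "session:42"))    # qa:session:42
print(stage_key("Prod", "session:42"))  # prod:session:42
```

Prefixes are a naming convention, not access control: any stage that can reach the cluster can still read another stage's keys.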

Per-stage ECS clusters

Each stage gets its own ECS cluster, providing:

  • Metric isolation: QA’s CPU spike doesn’t pollute Prod’s dashboard
  • Capacity isolation: QA autoscaling can’t consume capacity Prod needs
  • IAM scoping: developers can be restricted to only deploy to their cluster

++
cluster = ecs.Cluster(
    self, f"Cluster{stage_name.title()}",
    vpc=vpc,
    container_insights_v2=ecs.ContainerInsights.ENABLED,
)
++
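The IAM scoping point can be sketched as a policy document whose resource ARN is limited to services in one cluster. This is an illustration under assumptions: the account ID and cluster name below are placeholders, the cluster is assumed to have a known name (CDK generates one unless you set it), and the action list is deliberately minimal. A real deploy role typically needs more, such as permission to register task definitions and pass the task roles.

```python
import json


def ecs_deploy_policy(account_id: str, region: str, cluster_name: str) -> dict:
    """IAM policy document allowing service updates in a single ECS cluster."""
    service_arn = f"arn:aws:ecs:{region}:{account_id}:service/{cluster_name}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DeployToOwnCluster",
                "Effect": "Allow",
                "Action": ["ecs:UpdateService", "ecs:DescribeServices"],
                "Resource": service_arn,
            }
        ],
    }


# Hypothetical values for illustration only.
policy = ecs_deploy_policy("123456789012", "us-east-1", "cluster-qa")
print(json.dumps(policy, indent=2))
```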

Shared Aurora with per-stage databases

Instead of running six Aurora clusters, which would be expensive, we run one cluster with per-stage database names:

++
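from aws_cdk import Stack, aws_ec2 as ec2, aws_rds as rds


class SharedDbStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Look up the VPC created by the QA stack, as the other stacks do
        vpc = ec2.Vpc.from_lookup(
            self, "VpcShared",
            tags={"aws:cloudformation:stack-name": "BackendStackQA"},
        )

        # One Serverless v2 cluster; each stage gets its own database inside it
        cluster = rds.DatabaseCluster(
            self, "AuroraCluster",
            engine=rds.DatabaseClusterEngine.aurora_postgres(
                version=rds.AuroraPostgresEngineVersion.VER_15_4,
            ),
            serverless_v2_min_capacity=0.5,
            serverless_v2_max_capacity=8,
            writer=rds.ClusterInstance.serverless_v2("writer"),
            vpc=vpc,
        )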
++

Each stage’s backend connects to the same cluster endpoint with a different database name:

++
# BackendStackQA
environment={"DATABASE_NAME": "app_qa", "DATABASE_HOST": db_endpoint}

# BackendStackProd
environment={"DATABASE_NAME": "app_prod", "DATABASE_HOST": db_endpoint}
++

Trade-off: a cluster-level issue (storage full, writer failover) affects all stages. For Prod isolation, a dedicated cluster is worth the cost. For non-production stages, sharing is cost-effective.


Per-stage configuration via class constants

Each stack class defines its own constants rather than relying on environment variables for CDK-level configuration:

++
class BackendStackQA(Stack):
    STAGE_NAME = "qa"
    DESIRED_COUNT = 1
    CPU = 256
    MEMORY = 512
    MIN_CAPACITY = 1
    MAX_CAPACITY = 2
    DATABASE_NAME = "app_qa"
    ECR_TAG = "qa"
    ENABLE_DEBUG_LOGGING = True
    ENABLE_AUTOSCALING = False
    ENABLE_WAF = False


class BackendStackProd(Stack):
    STAGE_NAME = "prod"
    DESIRED_COUNT = 2
    CPU = 1024
    MEMORY = 2048
    MIN_CAPACITY = 2
    MAX_CAPACITY = 10
    DATABASE_NAME = "app_prod"
    ECR_TAG = "prod"
    ENABLE_DEBUG_LOGGING = False
    ENABLE_AUTOSCALING = True
    ENABLE_WAF = True
++

A shared setup method uses these constants:

++
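from aws_cdk import Duration, aws_ecs as ecs


# Method shared by the backend stack classes; reads the class constants above
def create_fargate_service(self):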
    service = ecs.FargateService(
        self, f"BackendService{self.STAGE_NAME.title()}",
        cluster=self.cluster,
        task_definition=self.task_def,
        desired_count=self.DESIRED_COUNT,
        min_healthy_percent=100,
        max_healthy_percent=200,
        circuit_breaker=ecs.DeploymentCircuitBreaker(rollback=True),
    )

    if self.ENABLE_AUTOSCALING:
        scaling = service.auto_scale_task_count(
            min_capacity=self.MIN_CAPACITY,
            max_capacity=self.MAX_CAPACITY,
        )
        scaling.scale_on_cpu_utilization(
            "CpuScaling",
            target_utilization_percent=70,
            scale_in_cooldown=Duration.seconds(60),
            scale_out_cooldown=Duration.seconds(60),
        )
    return service
++
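Because the sizing constants are plain class attributes, a mistake such as an unsupported CPU and memory pair only surfaces at deploy time. A minimal sketch of a pre-synthesis check, assuming the Fargate task sizes up to 4 vCPU as documented by AWS at the time of writing. The function name is hypothetical, and the table deliberately omits the larger sizes.

```python
# Fargate task sizes: CPU units -> allowed memory values in MiB (up to 4 vCPU).
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def validate_stage_constants(name: str, cpu: int, memory: int,
                             min_capacity: int, max_capacity: int) -> list[str]:
    """Return a list of problems found in one stage's sizing constants."""
    problems = []
    allowed = FARGATE_MEMORY_BY_CPU.get(cpu)
    if allowed is None:
        problems.append(f"{name}: CPU {cpu} is not a Fargate size covered by this table")
    elif memory not in allowed:
        problems.append(f"{name}: MEMORY {memory} is not valid for CPU {cpu}")
    if min_capacity > max_capacity:
        problems.append(f"{name}: MIN_CAPACITY exceeds MAX_CAPACITY")
    return problems


# The QA and Prod values from the class constants above, plus one bad example.
print(validate_stage_constants("qa", cpu=256, memory=512, min_capacity=1, max_capacity=2))
print(validate_stage_constants("prod", cpu=1024, memory=2048, min_capacity=2, max_capacity=10))
print(validate_stage_constants("bad", cpu=256, memory=4096, min_capacity=3, max_capacity=2))
```

Run as a unit test in CI, a check like this catches a bad constant before any stage is deployed.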

CI/CD deployment workflow

Each stage has its own GitHub Actions workflow. A push to the qa branch deploys QA; a push to main deploys Prod:

++
# .github/workflows/deploy-qa.yml
name: Deploy QA
on:
  push:
    branches: [qa]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: '3.11'}
      - run: pip install -r requirements.txt
      - run: npm install -g aws-cdk
      - run: STAGE=qa cdk deploy --all --require-approval never
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_DEFAULT_REGION: us-east-1
++

Deployment order: on a first rollout, BackendStackQA must deploy first, because it creates the VPC that every other stack looks up, including the shared ALB and Aurora stacks. Shared deploys second. After that, all other stages can deploy in parallel.

Useful commands:

++
# Deploy just one stack
STAGE=qa cdk deploy BackendStackQA

# Deploy all QA stacks
STAGE=qa cdk deploy --all

# Preview changes to Prod without deploying
STAGE=prod cdk diff

# Synthesize all stacks for validation
STAGE=all cdk synth
++

Best practices

Always set env on stacks. Stacks without an explicit Environment can’t use context providers like Vpc.from_lookup or HostedZone.from_lookup.

Commit cdk.context.json. The context cache stores VPC lookups and availability zone queries. Committing it ensures CI/CD builds produce the same templates without AWS credentials and that VPC lookup results are stable across developers.

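Use stack name prefixes. Every stack name should include the project and stage: AppBackendStackQA, AppBackendStackProd. CloudFormation stack names must be unique per account per region, and in a single-account setup every stage shares that namespace. The examples in this post omit the project prefix.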

Add a Prod destruction safety net:

++
import sys

# Fires on every STAGE=prod command: app.py cannot see which cdk subcommand runs
if stage == "prod" and os.getenv("CONFIRM_PROD") != "yes-destroy-prod":
    print("Set CONFIRM_PROD=yes-destroy-prod to destroy production stacks")
    sys.exit(1)
++

Tag everything by stage for cost allocation:

++
from aws_cdk import Tags

Tags.of(self).add("Stage", self.STAGE_NAME)
Tags.of(self).add("ManagedBy", "cdk")
++

Run cdk diff before every deployment and review the output before you commit to the change. Look for unexpected resource replacements ([-] then [+]), especially for stateful resources.

Deploy Prod last, always. If a CDK code change breaks synthesis or creates invalid resources, you discover it in QA first.