CDK Foundations — Jagadeeswara Reddy P

Why multiple stacks

A single CloudFormation stack has a hard limit of 500 resources. Exceed it and the stack fails to synthesize or deploy with an error about the resource count. But the resource limit is only the forcing function: the real reason to split stacks is blast radius.

Stateful resources (RDS instances, S3 buckets, DynamoDB tables) should live in their own stacks. A bad Lambda deploy should never risk your database. Stateless resources (Lambda functions, API Gateway, ALB listeners) can be torn down and recreated without data loss. Separating them means a failed deployment in your API stack doesn’t touch the stack that owns your production database.

Each stack is an independent deployment unit. You can deploy the API stack without redeploying the VPC stack. You can roll back one stack without affecting others. You can grant different IAM permissions for who can deploy which stacks.
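
To see how close a stack is to the resource limit, the synthesized template can be inspected directly. The following is a minimal, standard-library-only sketch. The helper name resource_headroom and the sample template are illustrative, not part of CDK. It assumes a template already loaded as a dict, for example from a *.template.json file under cdk.out.

```python
import json

RESOURCE_LIMIT = 500  # CloudFormation's per-stack resource limit cited above


def resource_headroom(template: dict, limit: int = RESOURCE_LIMIT) -> int:
    """Return how many more resources the template can hold before the limit."""
    return limit - len(template.get("Resources", {}))


# Hypothetical two-resource template, standing in for a file in cdk.out.
sample = json.loads("""
{
  "Resources": {
    "ApiHandler": {"Type": "AWS::Lambda::Function"},
    "DataBucket": {"Type": "AWS::S3::Bucket"}
  }
}
""")

print(resource_headroom(sample))  # 498
```

Running a check like this in CI gives an early warning before a stack needs to be split under pressure.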

The stack graph

The app.py entrypoint defines the stack graph: which stacks exist and how they relate. CDK synthesizes all stacks in one pass, and the CDK CLI then deploys them to CloudFormation in dependency order.

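import aws_cdk as cdk

# CDK_ENV and the stack classes are assumed to be defined elsewhere in the project
app = cdk.App()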

# Shared infrastructure
vpc_link_stack = SharedVpcLinkStack(app, "SharedVpcLink", env=CDK_ENV)
alb_stack = SharedAlbStack(app, "SharedAlb", env=CDK_ENV)

# Services depend on shared infra
fastapi_stack = PrivateFastApiStack(
    app, "PrivateFastApi",
    target_env="app",
    listener_rule_priority=2,
    env=CDK_ENV,
)

# Explicit ordering (both stacks are assumed to be created elsewhere in app.py)
bedrock_stack.add_dependency(users_group_stack)

add_dependency creates an explicit edge in the deployment graph. CDK also infers a dependency when one stack references a resource from another, but it implements that reference as a CloudFormation export, and exports come with problems.
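
Deployment order is a topological sort of that graph. The sketch below models it with the standard library's graphlib (Python 3.9+) rather than CDK itself. The stack IDs "Bedrock" and "UsersGroup", and the edges from PrivateFastApi to the shared stacks, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def deployment_order(dependencies: dict) -> list:
    """Return stack IDs ordered so every stack follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each stack maps to the set of stacks it depends on (assumed edges).
dependencies = {
    "SharedVpcLink": set(),
    "SharedAlb": set(),
    "PrivateFastApi": {"SharedVpcLink", "SharedAlb"},
    "UsersGroup": set(),
    "Bedrock": {"UsersGroup"},
}

order = deployment_order(dependencies)
print(order)
```

A cycle in the mapping raises graphlib.CycleError, which mirrors why two stacks that depend on each other cannot be ordered at all.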

Stack naming

CDK generates physical stack names from the construct ID you pass. SharedVpcLink becomes a CloudFormation stack named SharedVpcLink. Keep IDs stable — changing them creates a new stack and orphans the old one. CDK won’t delete the old stack for you.

Cross-stack references: SSM over exports

The natural CDK pattern for cross-stack references is to pass a construct from one stack to another:

import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2

class VpcStack(cdk.Stack):
    def __init__(self, scope, id, **kwargs):
        super().__init__(scope, id, **kwargs)
        self.vpc = ec2.Vpc(self, "Vpc")

class ApiStack(cdk.Stack):
    def __init__(self, scope, id, vpc, **kwargs):
        super().__init__(scope, id, **kwargs)
        # Uses vpc directly — CDK creates a CloudFormation Export

CDK implements this with CloudFormation exports under the hood. The VPC stack exports the VPC ID, and the API stack imports it. This works until you need to change the exported value. CloudFormation exports are locked while imported: you cannot change or remove an exported value, or delete the producing stack, while any other stack imports it. The consumer must stop importing before the producer can change, which turns a simple refactor into a multi-step deployment and can leave you deadlocked.
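
The locking rule is easier to reason about as a small state machine. The following is a toy model in plain Python. ExportRegistry and ExportInUseError are hypothetical names, not CloudFormation or CDK APIs, and the model covers only the rule described above.

```python
class ExportInUseError(Exception):
    """Raised when an imported export would be changed or removed."""


class ExportRegistry:
    """Toy model of the export rule; not a CloudFormation or CDK API."""

    def __init__(self):
        self._values = {}     # export name -> current value
        self._importers = {}  # export name -> names of importing stacks

    def export(self, name, value):
        # Changing the value of an export that is in use is rejected.
        if self._importers.get(name) and self._values.get(name) != value:
            raise ExportInUseError(f"{name} is imported by {sorted(self._importers[name])}")
        self._values[name] = value

    def import_value(self, name, importer):
        value = self._values[name]
        self._importers.setdefault(name, set()).add(importer)
        return value

    def release(self, name, importer):
        # The consumer has to stop importing before the producer can change.
        self._importers.get(name, set()).discard(importer)

    def delete(self, name):
        if self._importers.get(name):
            raise ExportInUseError(f"{name} is imported by {sorted(self._importers[name])}")
        self._values.pop(name, None)


registry = ExportRegistry()
registry.export("VpcId", "vpc-111")
registry.import_value("VpcId", importer="ApiStack")

try:
    registry.export("VpcId", "vpc-222")
except ExportInUseError as err:
    print(f"blocked: {err}")

registry.release("VpcId", importer="ApiStack")
registry.export("VpcId", "vpc-222")  # succeeds once nothing imports it
```

The release step is the painful part in practice: it means deploying the consumer first, with the reference removed, before the producer can be touched.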

SSM parameters break the hard coupling. The producing stack writes a parameter, the consuming stack reads it. No CloudFormation-level dependency exists between them.

Stack A writes to SSM

ssm.StringParameter(
    self, "VpcLinkIdParam",
    parameter_name="/acmecorp/shared/vpc-link-id",
    string_value=vpc_link.vpc_link_id,
)

Stack B reads from SSM

vpc_link_id = ssm.StringParameter.value_for_string_parameter(
    self, "/acmecorp/shared/vpc-link-id"
)
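
The two snippets only connect if both stacks use exactly the same parameter name, and a typo on either side surfaces only at deploy time. One way to reduce that risk is to build names in a single shared module. The sketch below is a suggestion, not something the original stacks are known to do. shared_param_name and VPC_LINK_ID_PARAM are hypothetical names, and the prefix is taken from the snippets above.

```python
PARAM_PREFIX = "/acmecorp/shared"  # prefix used by the snippets above


def shared_param_name(key: str) -> str:
    """Build the full SSM parameter name for a shared value."""
    if not key or "/" in key:
        raise ValueError(f"key must be a single path segment, got {key!r}")
    return f"{PARAM_PREFIX}/{key}"


# Import this constant in both the producing and the consuming stack.
VPC_LINK_ID_PARAM = shared_param_name("vpc-link-id")
print(VPC_LINK_ID_PARAM)  # /acmecorp/shared/vpc-link-id
```

Both ssm.StringParameter(parameter_name=...) and value_for_string_parameter(...) can then take VPC_LINK_ID_PARAM instead of a repeated string literal.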

+--------------------+-----------+-----------------+
| Feature            | CfnExport | SSM Parameter   |
+--------------------+-----------+-----------------+
| Mutable            | No        | Yes             |
+--------------------+-----------+-----------------+
| Delete producing   | Blocked   | Free            |
| stack              |           |                 |
+--------------------+-----------+-----------------+
| Cross-region       | No        | Manual          |
+--------------------+-----------+-----------------+
| Cost               | Free      | Free (standard) |
+--------------------+-----------+-----------------+

L1, L2, L3 constructs

CDK constructs come in three levels. L2 is the default — s3.Bucket, lambda_.Function, ec2.Vpc. These provide sensible defaults, grant_* methods for IAM, metric_* methods for CloudWatch, and high-level APIs that hide CloudFormation complexity.

L1 constructs (CfnBucket, CfnFunction) are direct CloudFormation mappings. Every property maps 1:1 to the template. Use L1 when L2 doesn’t expose a property you need.

L3 constructs (ApplicationLoadBalancedFargateService) are opinionated patterns that create multiple resources. They’re convenient but hide complexity — understanding what they generate matters when debugging.

The escape hatch

When an L2 construct doesn’t expose a CloudFormation property, drop to L1 through the escape hatch:

bucket = s3.Bucket(self, "Data")

# Access the underlying CfnBucket (L1). Versioning is only an illustration here;
# s3.Bucket already exposes it as versioned=True.
cfn_bucket = bucket.node.default_child
cfn_bucket.add_property_override(
    "VersioningConfiguration.Status", "Enabled"
)

This modifies the synthesized CloudFormation template directly. The L2 construct remains in your code for its grant_* and metric_* methods, but the specific property is set at L1.
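
The dotted path is a route into the resource's Properties block in the template. The sketch below models that effect on a plain dict. apply_property_override is a hypothetical helper, and it ignores details the real method handles, such as escaped dots in key names.

```python
def apply_property_override(properties: dict, path: str, value) -> dict:
    """Set a nested value in a resource's Properties using a dotted path."""
    *parents, leaf = path.split(".")
    target = properties
    for key in parents:
        # Create intermediate objects as needed.
        target = target.setdefault(key, {})
    target[leaf] = value
    return properties


# Hypothetical Properties block for a bucket before the override.
props = {"Tags": [{"Key": "team", "Value": "data"}]}
apply_property_override(props, "VersioningConfiguration.Status", "Enabled")
print(props["VersioningConfiguration"])  # {'Status': 'Enabled'}
```

Because the override is applied to the template rather than to the L2 object, the L2 construct has no knowledge of it. That is worth remembering when the same property is later set through the L2 API as well.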

Environment configuration

Acmecorp runs multiple environments: QA, QA2, demo, production. Each environment needs a subset of stacks. Hardcoding stack instantiation per environment doesn’t scale. Data-driven configuration does.

ALL_ENVIRONMENTS = [
    {"env_name": "qa", "fastapi": True, "loopback": True, "zeropoint": True},
    {"env_name": "qa2", "fastapi": True, "loopback": False, "zeropoint": False},
    {"env_name": "demo", "fastapi": True, "loopback": True, "zeropoint": True},
]

for env_config in ALL_ENVIRONMENTS:
    if env_config["fastapi"]:
        PrivateFastApiStack(
            app,
            f"FastApi{env_config['env_name'].title()}",
            target_env=env_config["env_name"],
            env=CDK_ENV,
        )
    if env_config["loopback"]:
        LoopbackStack(
            app,
            f"Loopback{env_config['env_name'].title()}",
            target_env=env_config["env_name"],
            env=CDK_ENV,
        )

Adding a new environment is a one-line dictionary addition. Removing a service from an environment is a boolean flip. The loop generates the stack graph, and cdk ls lists exactly which stacks exist for which environments.
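
Plain dictionaries accept misspelled keys silently. A typed variant of the same table catches them when the module is loaded, and a pure function makes the resulting stack list testable without synthesizing anything. This is a sketch under assumptions: EnvConfig, STACK_PREFIXES, and planned_stack_ids are hypothetical names, and only the two services shown in the loop above are mapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    env_name: str
    fastapi: bool = False
    loopback: bool = False
    zeropoint: bool = False


ALL_ENVIRONMENTS = [
    EnvConfig("qa", fastapi=True, loopback=True, zeropoint=True),
    EnvConfig("qa2", fastapi=True),
    EnvConfig("demo", fastapi=True, loopback=True, zeropoint=True),
]

# Service flag -> stack ID prefix, matching the IDs built in the loop above.
STACK_PREFIXES = {"fastapi": "FastApi", "loopback": "Loopback"}


def planned_stack_ids(environments) -> list:
    """List the stack IDs the loop would create, without instantiating CDK."""
    return [
        f"{prefix}{env.env_name.title()}"
        for env in environments
        for flag, prefix in STACK_PREFIXES.items()
        if getattr(env, flag)
    ]


print(planned_stack_ids(ALL_ENVIRONMENTS))
# ['FastApiQa', 'LoopbackQa', 'FastApiQa2', 'FastApiDemo', 'LoopbackDemo']
```

With this shape, a typo such as EnvConfig("qa3", loopbak=True) raises a TypeError immediately instead of quietly skipping a stack.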

Context values

CDK also supports cdk.json context for configuration that varies between deployments:

account_id = self.node.try_get_context("account_id")

Use context for values that change per-deploy (account IDs, region). Use the environment dictionary for structural decisions (which stacks to create).
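
try_get_context returns None when a key is absent, so a missing value can travel a long way before anything breaks. A small wrapper that fails fast is one option. In the sketch below, require_context is a hypothetical helper and FakeNode is a stand-in for construct.node so the example runs without CDK installed.

```python
def require_context(node, key: str):
    """Return a context value, raising immediately when it is missing."""
    value = node.try_get_context(key)
    if value is None:
        raise ValueError(
            f"Missing context value {key!r}; set it in cdk.json or pass -c {key}=<value>"
        )
    return value


class FakeNode:
    """Stand-in for construct.node, used only to exercise the helper."""

    def __init__(self, context: dict):
        self._context = context

    def try_get_context(self, key: str):
        return self._context.get(key)


node = FakeNode({"account_id": "123456789012"})
print(require_context(node, "account_id"))  # 123456789012
```

Inside a stack the call would be require_context(self.node, "account_id").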

CDK Aspects: enforce rules at synth

Aspects use the visitor pattern to traverse the entire construct tree during synthesis, before the templates are rendered. They inspect or mutate every node: every stack, every construct, every CloudFormation resource. This is where you enforce organization-wide rules before a template ever reaches CloudFormation.
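
The traversal can be pictured in plain Python. The sketch below is a model only. Node, visit_all, and NameCollector are hypothetical names rather than CDK classes, and the tree is an invented example.

```python
class Node:
    """Minimal stand-in for a construct: a name plus child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def visit_all(root, aspect):
    """Depth-first walk that calls aspect.visit on root and every descendant."""
    aspect.visit(root)
    for child in root.children:
        visit_all(child, aspect)


class NameCollector:
    """Example visitor that records the name of each node it sees."""

    def __init__(self):
        self.seen = []

    def visit(self, node):
        self.seen.append(node.name)


api_stack = Node("ApiStack", [Node("Handler"), Node("Role")])
app = Node("App", [Node("VpcStack", [Node("Vpc")]), api_stack])

whole_app = NameCollector()
visit_all(app, whole_app)
print(whole_app.seen)
# ['App', 'VpcStack', 'Vpc', 'ApiStack', 'Handler', 'Role']
```

The visitor never needs to know the shape of the tree. It only decides what to do with each node it is handed, which is what makes one rule reusable across every stack.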

Confused deputy prevention

Any IAM role that an AWS service assumes should carry a condition restricting which account the service may be acting for. Without it, the service can be tricked into assuming your role on behalf of a resource in someone else’s account, which is the confused deputy problem. An Aspect enforces this across every role in every stack:

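import jsii
import aws_cdk as cdk
from aws_cdk import aws_iam as iam

# Python classes implement CDK interfaces with @jsii.implements, not by subclassing
@jsii.implements(cdk.IAspect)
class ConfusedDeputyAspect:
    def visit(self, node):
        if not isinstance(node, iam.CfnRole):
            return
        stack = cdk.Stack.of(node)
        # L2 roles store the trust policy as a lazy object; resolve it to a plain dict
        trust = stack.resolve(node.assume_role_policy_document)
        for stmt in trust.get("Statement", []):
            principal = stmt.get("Principal", {})
            # Only statements that trust a service principal need the condition
            if not isinstance(principal, dict) or "Service" not in principal:
                continue
            conditions = stmt.setdefault("Condition", {})
            # Merge into any existing StringEquals block instead of replacing it
            conditions.setdefault("StringEquals", {})["aws:SourceAccount"] = stack.account
        node.assume_role_policy_document = trust

cdk.Aspects.of(app).add(ConfusedDeputyAspect())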

Apply the Aspect at the app level and it visits every construct in every stack. Apply it to a single stack and it only visits that stack’s constructs.
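
A trust statement may already carry conditions, so a rewrite of this kind is safest when it merges into them rather than replacing them. Keeping that logic in a pure function lets it be unit tested without synthesizing a stack. The sketch below is one way to do that. add_source_account is a hypothetical helper operating on a trust policy as a plain dict, and the sample policy is invented.

```python
import copy


def add_source_account(trust_policy: dict, account_id: str) -> dict:
    """Return a copy with aws:SourceAccount added to service-principal statements."""
    policy = copy.deepcopy(trust_policy)
    for stmt in policy.get("Statement", []):
        principal = stmt.get("Principal", {})
        # Skip statements that trust accounts or users rather than services.
        if not isinstance(principal, dict) or "Service" not in principal:
            continue
        conditions = stmt.setdefault("Condition", {})
        # Merge so existing StringEquals keys survive.
        conditions.setdefault("StringEquals", {})["aws:SourceAccount"] = account_id
    return policy


trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "abc"}},
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "sts:AssumeRole",
        },
    ],
}

hardened = add_source_account(trust, "123456789012")
print(hardened["Statement"][0]["Condition"])
```

The first statement keeps its sts:ExternalId condition and gains aws:SourceAccount; the second, which trusts an account rather than a service, is left alone.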

Other useful Aspects

Aspects aren’t limited to security. Common uses include:

Tagging: apply cost-allocation tags to every resource
Encryption enforcement: ensure every S3 bucket and RDS instance has encryption enabled
Removal policies: set RemovalPolicy.RETAIN on all stateful resources
Log retention: enforce a maximum CloudWatch log retention period

Common mistakes

Circular dependencies. Stack A exports a value that Stack B imports, and Stack B exports a value that Stack A imports. CloudFormation cannot deploy either stack first. Fix: use SSM parameters instead of exports — they break the CloudFormation-level dependency cycle.

Hardcoded resource names. Setting bucket_name="my-bucket" means CloudFormation can’t replace the resource: a replacement creates the new resource before deleting the old one, and the two cannot share a name. Let CDK generate names with unique suffixes. Only hardcode names when external systems need a stable reference, and use SSM to share the name instead of hardcoding it in the consumer.

Missing RemovalPolicy on stateful resources. Defaults vary by construct: s3.Bucket and dynamodb.Table default to RETAIN and rds.DatabaseInstance defaults to SNAPSHOT, but L1 constructs and many other resources are deleted along with the stack. Don’t rely on remembering which is which. Set removal_policy=cdk.RemovalPolicy.RETAIN explicitly on anything that holds data.

Making API calls during synth. CDK synth should be pure: the same code and context should always generate the same template. Making API calls during synth (checking whether a resource exists, reading runtime config) makes builds non-deterministic and slow. Use context values or SSM parameters instead.

Forgetting add_dependency with SSM references. SSM-based cross-stack references don’t create implicit CloudFormation dependencies. If you deploy Stack B before Stack A writes the SSM parameter, the deploy fails. Always add an explicit dependency.
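
Some of these mistakes can be caught mechanically by scanning the synthesized templates. As one example, the sketch below flags stateful resources that lack an explicit protective DeletionPolicy. The helper name, the list of stateful types, and the choice to accept both Retain and Snapshot are assumptions to adapt, and the sample template is invented.

```python
STATEFUL_TYPES = {
    "AWS::S3::Bucket",
    "AWS::RDS::DBInstance",
    "AWS::DynamoDB::Table",
}
PROTECTIVE_POLICIES = {"Retain", "Snapshot"}


def unprotected_stateful(template: dict) -> list:
    """Return logical IDs of stateful resources without an explicit protective policy."""
    return sorted(
        logical_id
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type") in STATEFUL_TYPES
        and resource.get("DeletionPolicy") not in PROTECTIVE_POLICIES
    )


# Hypothetical template fragment, as it might appear under cdk.out.
template = {
    "Resources": {
        "OrdersTable": {"Type": "AWS::DynamoDB::Table", "DeletionPolicy": "Retain"},
        "UploadsBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Delete"},
        "ScratchBucket": {"Type": "AWS::S3::Bucket"},
        "ApiHandler": {"Type": "AWS::Lambda::Function"},
    }
}

print(unprotected_stateful(template))  # ['ScratchBucket', 'UploadsBucket']
```

A check like this fits naturally as a CI step after cdk synth, failing the build when the returned list is not empty.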

These posts cover specific stacks built on the patterns above:

Private API Gateway with VPC Link — the shared VPC Link stack and how services register with it
RDS Proxy Connection Pooling — stateful database infrastructure separated from the Lambda stacks that consume it
IAM Key Rotation — a standalone stack for credential rotation using Secrets Manager and Lambda

CDK is a code generator for CloudFormation. The patterns that matter (stack boundaries, SSM-based coupling, Aspects for policy enforcement) are about managing the graph of generated templates, not about the CDK code itself. Get the graph right and deployments stay independent, rollbacks stay safe, and adding a new service is a dictionary entry, not an architecture review.