SOC 2 Type I Readiness on AWS — Jagadeeswara Reddy P

SOC 2 is not a compliance team problem. The vast majority of controls map directly to infrastructure configuration, application security patterns, deployment practices, and monitoring. If your infrastructure is defined in code, SOC 2 readiness is largely a code review problem.

This post documents the technical work behind a SOC 2 Type I audit: how the five trust service criteria map to AWS controls, how we triaged 300+ findings, and the CDK code changes that resolved the critical ones without service disruption.

Type I vs Type II

+----------+--------------------+--------------------+
| Aspect   | SOC 2 Type I       | SOC 2 Type II      |
+----------+--------------------+--------------------+
| Scope    | Control design at  | Control            |
|          | a point in time    | effectiveness over |
|          |                    | 3-12 months        |
+----------+--------------------+--------------------+
| Evidence | "Controls are in   | "Controls work     |
|          | place"             | consistently"      |
+----------+--------------------+--------------------+
| Timeline | 4-8 weeks          | 3-12 months        |
|          | preparation        | observation        |
+----------+--------------------+--------------------+
| Cost     | Lower              | Significantly      |
|          |                    | higher             |
+----------+--------------------+--------------------+
| Value    | Entry ticket for   | Full assurance for |
|          | enterprise sales   | sustained          |
|          |                    | relationships      |
+----------+--------------------+--------------------+

For a startup, Type I is the entry ticket. It demonstrates that you’ve thought about security, designed controls, and implemented them. Type II comes later and proves those controls operate correctly over time.

The five trust service criteria

SOC 2 is organized around five Trust Service Criteria (TSC). Not all five are always in scope — you choose which to include. Here’s how each maps to AWS infrastructure concerns:

Security (CC6.x, CC7.x) — always required

The foundational criterion. Covers logical access, system boundaries, encryption, change management, and risk mitigation. Every SOC 2 audit includes Security.

AWS controls that matter: IAM policies and key rotation, encryption at rest and in transit, Security Hub, CloudTrail, CIS CloudWatch alarms, VPC security group rules, Secrets Manager rotation.

Availability (A1.x)

System uptime and performance commitments. Maps to SLAs, health checks, auto-scaling, backup policies, multi-AZ deployments, and disaster recovery plans.

AWS controls: ECS service circuit breakers, RDS Multi-AZ, Aurora automated backups, ALB health checks, CloudWatch alarms on unhealthy targets.

Processing Integrity (PI1.x)

Data processing accuracy and completeness. Relevant if your system processes transactions or calculations that customers depend on for accuracy.

AWS controls: SQS dead-letter queues, Lambda error handling, CloudTrail audit trails, idempotent API design.

Confidentiality (C1.x)

Protection of confidential information — trade secrets, business plans, IP. Focuses on data classification and lifecycle management rather than perimeter defense.

AWS controls: S3 bucket policies with SSL enforcement, KMS encryption with least privilege key policies, Secrets Manager over plaintext config, IAM resource-level permissions.
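
For the S3 item, SSL enforcement comes down to a single deny statement in the bucket policy. The sketch below builds that statement as a plain Python dict. The function name and the bucket name are illustrative assumptions; in CDK the same effect comes from setting enforce_ssl=True on the Bucket construct.

```python
import json


def deny_insecure_transport(bucket_name: str) -> dict:
    """Return a bucket policy that rejects any request not made over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport evaluates to "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-uploads" is a placeholder bucket name.
    print(json.dumps(deny_insecure_transport("example-uploads"), indent=2))
```

Because the statement is an explicit deny, it wins over any allow elsewhere in the policy, which is what lets the auditor treat it as a control rather than a convention.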

Privacy (P1.x–P8.x)

Management of personal information in accordance with your privacy notice. Only required if you process PII on behalf of customers.

AWS controls: Macie for S3 sensitive data discovery, S3 lifecycle policies for data retention, CloudTrail for access auditing, VPC endpoints to prevent PII from traversing the public internet.

Infrastructure audit scope

The audit covered two CDK codebases, one AWS account, two regions, and approximately 70 CloudFormation stacks:

ECS Fargate services (backend, geometry, workers, converters) across 6 environments
App Runner services (10+)
Aurora PostgreSQL cluster (shared across environments)
ElastiCache Redis clusters
Multiple S3 buckets (frontend, uploads, logs, datalake)
20+ Secrets Manager secrets
10+ IAM users for service accounts and integrations
CloudFront, API Gateway, VPC Links

The 300+ finding triage

Severity breakdown

+---------------+-------+--------------------+
| Severity      | Count | Categories         |
+---------------+-------+--------------------+
| Critical      | 12    | Publicly           |
|               |       | accessible         |
|               |       | databases, IAM     |
|               |       | keys over 180      |
|               |       | days, unencrypted  |
|               |       | secrets            |
+---------------+-------+--------------------+
| High          | 47    | Missing CloudTrail |
|               |       | alarms, no secrets |
|               |       | rotation, policies |
|               |       | on users not       |
|               |       | groups             |
+---------------+-------+--------------------+
| Medium        | 156   | Missing log        |
|               |       | encryption, no VPC |
|               |       | flow logs,         |
|               |       | incomplete backup  |
|               |       | configs            |
+---------------+-------+--------------------+
| Low           | 89    | Best-practice      |
|               |       | suggestions,       |
|               |       | non-default        |
|               |       | settings           |
+---------------+-------+--------------------+
| Informational | 18    | Metadata findings, |
|               |       | service            |
|               |       | availability notes |
+---------------+-------+--------------------+

The finding-to-control mapping

Not every Security Hub finding is a SOC 2 concern. The auditor cares about controls, not individual findings. A pattern of unresolved findings indicates a control gap:

+--------------------+--------------------+--------------------+
| Finding            | SOC 2 Control      | Example            |
+--------------------+--------------------+--------------------+
| CIS 1.14 - Key     | CC6.1 Security     | IAM access keys    |
| rotation           |                    | older than 90 days |
+--------------------+--------------------+--------------------+
| CIS 2.1 -          | CC7.2 Security     | Trail not enabled  |
| CloudTrail enabled |                    | in all regions     |
+--------------------+--------------------+--------------------+
| CIS 3.x -          | CC7.2, CC7.3       | Missing            |
| CloudWatch alarms  | Security           | unauthorized API   |
|                    |                    | call alarm         |
+--------------------+--------------------+--------------------+
| FSBP S3.5 -        | CC6.7 Security     | S3 bucket without  |
| Enforce SSL        |                    | SSL policy         |
+--------------------+--------------------+--------------------+
| FSBP RDS.3 -       | C1.1               | Unencrypted        |
| Encryption at rest | Confidentiality    | database storage   |
+--------------------+--------------------+--------------------+
| Prowler - Secrets  | CC6.1 Security     | Secrets without    |
| rotation           |                    | rotation schedule  |
+--------------------+--------------------+--------------------+
| Prowler - Policies | CC6.3 Security     | Policies attached  |
| on groups          |                    | directly to users  |
+--------------------+--------------------+--------------------+
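
To see control gaps rather than individual findings, it helps to group open findings by the control they map to. This is a minimal sketch, assuming the mapping from the table above; the list of open findings and the unmapped ID are made up for illustration.

```python
from collections import defaultdict

# Finding-to-control mapping taken from the table above.
FINDING_TO_CONTROLS = {
    "CIS 1.14": ["CC6.1"],
    "CIS 2.1": ["CC7.2"],
    "CIS 3.x": ["CC7.2", "CC7.3"],
    "FSBP S3.5": ["CC6.7"],
    "FSBP RDS.3": ["C1.1"],
    "Prowler secrets rotation": ["CC6.1"],
    "Prowler policies on groups": ["CC6.3"],
}


def open_findings_by_control(open_findings: list) -> dict:
    """Group open finding IDs under each SOC 2 control they map to."""
    grouped = defaultdict(list)
    for finding in open_findings:
        # Findings with no mapping are not SOC 2 concerns and are skipped.
        for control in FINDING_TO_CONTROLS.get(finding, []):
            grouped[control].append(finding)
    return dict(grouped)


# Hypothetical open findings; "UNMAPPED-EXAMPLE" stands in for a non-SOC 2 finding.
gaps = open_findings_by_control(
    ["CIS 1.14", "CIS 3.x", "Prowler secrets rotation", "UNMAPPED-EXAMPLE"]
)
for control, findings in sorted(gaps.items()):
    print(control, findings)
```

A control with several open findings under it is the kind of pattern an auditor reads as a gap.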

Decision framework: fix vs accept vs mitigate

For each finding, we applied three questions:

Is this a SOC 2 control? If not, accept risk and document.
Can it cause actual harm? If no real harm + auditor won’t flag it → accept. If real harm → assess downtime risk.
Downtime risk? No downtime → fix now. Downtime + SOC 2 blocking → plan maintenance window. Downtime + not blocking → compensating control + future remediation plan.
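
The three questions can be sketched as a small function. The field and enum names are assumptions made for this example. The framework does not say what to do with a harmless finding that the auditor will still flag, so the sketch sends it on to the downtime question; treat that branch as an assumption too.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept risk and document"
    FIX_NOW = "fix now"
    MAINTENANCE_WINDOW = "plan maintenance window"
    COMPENSATE = "compensating control + future remediation plan"


@dataclass
class Finding:
    maps_to_soc2_control: bool
    can_cause_harm: bool
    auditor_will_flag: bool
    fix_causes_downtime: bool
    blocks_soc2: bool


def triage(finding: Finding) -> Decision:
    # Question 1: is this a SOC 2 control at all?
    if not finding.maps_to_soc2_control:
        return Decision.ACCEPT
    # Question 2: no real harm and the auditor won't flag it.
    if not finding.can_cause_harm and not finding.auditor_will_flag:
        return Decision.ACCEPT
    # Question 3: downtime risk.
    # Assumption: harmless findings the auditor will flag also land here.
    if not finding.fix_causes_downtime:
        return Decision.FIX_NOW
    if finding.blocks_soc2:
        return Decision.MAINTENANCE_WINDOW
    return Decision.COMPENSATE


# Example: a harmful, audit-blocking finding whose fix restarts a service.
print(triage(Finding(True, True, True, True, True)).value)
```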

CIS Benchmark section 1: IAM fixes

The CIS AWS Foundations Benchmark is the most referenced standard in SOC 2 audits of AWS environments. Section 1 (IAM) had the most critical findings.

Group-based policies instead of user-attached policies

Prowler’s iam_policy_attached_only_to_group_or_roles check fired on every Bedrock and service account user — we had policies attached directly to users instead of groups. The fix was a dedicated CDK stack:

from aws_cdk import Stack, aws_iam as iam
from constructs import Construct


class BackendUsersGroupStack(Stack):
    """Shared IAM group for all backend/Bedrock users.
    
    Moves permissions from direct user attachments to group-based policies,
    satisfying Prowler iam_policy_attached_only_to_group_or_roles.
    """
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        group = iam.Group(
            self,
            "BackendUsersGroup",
            group_name="backend-users",
        )

        group.add_managed_policy(
            iam.ManagedPolicy(
                self,
                "BackendUsersPolicy",
                managed_policy_name="BackendUsersSharedPolicy",
                statements=[
                    iam.PolicyStatement(
                        effect=iam.Effect.ALLOW,
                        actions=[
                            "bedrock:InvokeModel",
                            "bedrock:InvokeModelWithResponseStream",
                        ],
                        resources=["*"],
                    ),
                    iam.PolicyStatement(
                        effect=iam.Effect.ALLOW,
                        actions=["secretsmanager:GetSecretValue"],
                        resources=["arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:bedrock-*"],
                    ),
                ],
            )
        )

In app.py, ensure this stack deploys before any user stack that references it:

backend_users_group_stack = BackendUsersGroupStack(
    app, "BackendUsersGroupStack", env=env
)

bedrock_user_stack = BedrockUserStack(
    app, "BedrockUserStack",
    group_name="backend-users",
    env=env
)
bedrock_user_stack.add_dependency(backend_users_group_stack)

Confused deputy prevention via Aspects

IAM roles whose trust policies name a service principal but carry no aws:SourceAccount condition are vulnerable to confused deputy attacks. A CDK Aspect injects the condition across all stacks:

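import aws_cdk as cdk
import jsii
from aws_cdk import IAspect, Stack, aws_iam as iam

@jsii.implements(IAspect)
class ConfusedDeputyPrevention:
    SERVICE_PRINCIPALS = {
        "ecs-tasks.amazonaws.com", "lambda.amazonaws.com",
        "states.amazonaws.com", "apprunner.amazonaws.com",
        "monitoring.rds.amazonaws.com",
    }
    SKIP_ROLE_PATTERNS = {"BitbucketDeploymentRole", "AzureDevOpsDeploymentRole"}

    def __init__(self, account_id: str):
        self.account_id = account_id

    def visit(self, node):
        if not isinstance(node, iam.CfnRole):
            return
        # Skip the deployment roles listed in SKIP_ROLE_PATTERNS.
        if any(pattern in node.node.path for pattern in self.SKIP_ROLE_PATTERNS):
            return
        # Resolve the trust policy so its principals can be inspected.
        trust = Stack.of(node).resolve(node.assume_role_policy_document) or {}
        for index, statement in enumerate(trust.get("Statement", [])):
            principal = statement.get("Principal", {})
            services = principal.get("Service", []) if isinstance(principal, dict) else []
            if isinstance(services, str):
                services = [services]
            # Only statements that trust one of the listed services get the condition.
            if not any(isinstance(s, str) and s in self.SERVICE_PRINCIPALS for s in services):
                continue
            node.add_property_override(
                f"AssumeRolePolicyDocument.Statement.{index}.Condition"
                ".StringEquals.aws:SourceAccount",
                self.account_id,
            )

# app and app_account_id are defined earlier in app.py
cdk.Aspects.of(app).add(ConfusedDeputyPrevention(app_account_id))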
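
To make the effect on the synthesized template concrete, here is the same transformation applied to a plain trust policy document, outside CDK. This is a sketch for illustration: the function name is hypothetical, the service set is shortened, and 111122223333 is a placeholder account ID.

```python
import copy
import json

SERVICE_PRINCIPALS = {"ecs-tasks.amazonaws.com", "lambda.amazonaws.com"}


def add_source_account(trust_policy: dict, account_id: str) -> dict:
    """Return a copy of trust_policy with aws:SourceAccount pinned on service statements."""
    policy = copy.deepcopy(trust_policy)
    for statement in policy.get("Statement", []):
        principal = statement.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        # Leave statements alone unless they trust one of the listed services.
        if not any(isinstance(s, str) and s in SERVICE_PRINCIPALS for s in services):
            continue
        condition = statement.setdefault("Condition", {})
        condition.setdefault("StringEquals", {})["aws:SourceAccount"] = account_id
    return policy


before = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
print(json.dumps(add_source_account(before, "111122223333"), indent=2))
```

After the change, the service can assume the role only when it acts on behalf of a resource in the named account.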

Zero-downtime remediation strategies

The constraint that shaped our entire approach: production services could not go down. Several SOC 2 controls require changes that would normally cause service restarts. Here’s how we handled them.

ECS platform version upgrade (ECS.10)

ECS.10 requires Fargate services to run on the latest platform version, which is 1.4.0 for Linux containers. Changing the platform version triggers a rolling deployment, so there is no downtime, but every task is still restarted:

from aws_cdk import aws_ecs as ecs

# Safe: ECS rolling deployment, no downtime
fargate_service = ecs.FargateService(
    self, "BackendService",
    platform_version=ecs.FargatePlatformVersion.VERSION1_4,  # Added
    # ... remaining service properties unchanged
)

We deployed this during business hours after verifying health check configuration was correct. The rolling deployment replaces one task at a time.
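
How many tasks ECS may replace at once is bounded by the service's minimumHealthyPercent and maximumPercent deployment settings. The sketch below works out those bounds for a given desired count. The percentages in the example are assumptions chosen for illustration, not our production values.

```python
def rolling_bounds(desired_count: int, minimum_healthy_percent: int, maximum_percent: int) -> tuple:
    """Return (tasks that must stay running, tasks allowed in total) during a deployment."""
    # ECS rounds the lower bound up and the upper bound down.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running


# Four tasks at 75% / 100%: three must stay up and no extra task may start,
# so the scheduler can only swap one task at a time.
print(rolling_bounds(4, 75, 100))

# Four tasks at 100% / 200%: new tasks start first, old ones stop afterwards.
print(rolling_bounds(4, 100, 200))
```

The gap between the two numbers is the room the scheduler has to work with; the health checks decide when a replacement counts as running.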

ECS network mode migration (ECS.21)

ECS.21 requires tasks to run in awsvpc network mode (not bridge). awsvpc is the only mode supported on Fargate, so this was already correct for our Fargate services. The finding appeared on old task definition revisions that CloudFormation had never cleaned up.

Resolution: register a new task definition revision with networkMode=awsvpc and deploy. Old revisions remain but the finding only applies to active deployments.

ALB TLS 1.3 upgrade (ELB.22)

Upgrading the TLS policy affects all connections through the listener at the moment of change — but the change itself is atomic and sub-second:

aws elbv2 modify-listener \
  --listener-arn "$LISTENER_ARN" \
  --ssl-policy ELBSecurityPolicy-TLS13-1-3-2021-06

We made this change during low-traffic hours. No service disruption.

Secrets Manager rotation for IAM access keys

IAM access keys older than 90 days trigger the IAM.3 finding. We automated rotation with Secrets Manager and a Lambda rotation function:

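from aws_cdk import Duration, aws_secretsmanager as sm

# Service account with 90-day rotation
# (rotation_lambda is the rotation function covered in the IAM Key Rotation post)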
secret = sm.Secret(self, "ServiceAccountSecret",
    secret_name="myservice-access-key",
    description="Auto-rotating IAM access key for myservice",
)

secret.add_rotation_schedule("RotationSchedule",
    rotation_lambda=rotation_lambda,
    automatically_after=Duration.days(90),
)

See the IAM Key Rotation post for the full Lambda implementation.
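
Before rotation is automated, it is useful to know which keys are already past the limit. This is a minimal sketch, assuming key metadata shaped like the AccessKeyMetadata entries that IAM's ListAccessKeys call returns (AccessKeyId, Status, CreateDate). The key IDs below are placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_KEY_AGE = timedelta(days=90)


def stale_keys(access_keys: list, now: Optional[datetime] = None) -> list:
    """Return the IDs of active access keys older than MAX_KEY_AGE."""
    now = now or datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in access_keys
        # Inactive keys cannot be used, so only active ones are reported.
        if key["Status"] == "Active" and now - key["CreateDate"] > MAX_KEY_AGE
    ]


# Placeholder metadata for illustration only.
keys = [
    {"AccessKeyId": "AKIAOLDEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "AKIANEWEXAMPLE", "Status": "Active",
     "CreateDate": datetime(2024, 5, 1, tzinfo=timezone.utc)},
]
print(stale_keys(keys, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
```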

What we fixed, what we skipped

+--------------------+--------------------+------------------------------------------+
| Finding            | Decision           | Rationale                                |
+--------------------+--------------------+------------------------------------------+
| Termination        | Fixed via CDK loop | Zero risk, 21 findings resolved in one   |
| protection (CF.4)  |                    | change                                   |
+--------------------+--------------------+------------------------------------------+
| S3 SSL enforcement | Fixed in CDK       | Added enforce_ssl=True to all S3 Bucket  |
|                    |                    | constructs                               |
+--------------------+--------------------+------------------------------------------+
| Missing CloudTrail | Fixed via CDK      | See CIS CloudWatch alarms post           |
| alarms (14)        | stack              |                                          |
+--------------------+--------------------+------------------------------------------+
| IAM policies on    | Fixed via group    | BackendUsersGroupStack                   |
| users (IAM.2)      | stack              |                                          |
+--------------------+--------------------+------------------------------------------+
| EBS default        | Fixed via CLI      | aws ec2 enable-ebs-encryption-by-default |
| encryption         |                    |                                          |
+--------------------+--------------------+------------------------------------------+
| Redis in-transit   | Accepted risk      | Private subnet only, VPC CIDR locked     |
| encryption         |                    | down, Q3 remediation planned             |
+--------------------+--------------------+------------------------------------------+
| ECS read-only root | Accepted risk      | Superset writes to /app/static/ at       |
| filesystem (ECS.5) |                    | container start                          |
+--------------------+--------------------+------------------------------------------+
| Root account MFA   | Deferred           | Requires out-of-band coordination with   |
| delete             |                    | AWS Support                              |
+--------------------+--------------------+------------------------------------------+
| Multi-region       | Deferred           | Budget constraint, acceptable for Type I |
| Config recorder    |                    | scope                                    |
+--------------------+--------------------+------------------------------------------+
CIS Section 3: CloudWatch monitoring gap

All 14 CIS CloudWatch alarms were missing. Each is a separate Security Hub finding, and collectively they represent a critical gap in CC7.1 (monitoring) and CC7.2 (incident detection).

We created a dedicated CDK stack for all 14 alarms. See the CIS CloudWatch Alarms post for the complete implementation.
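
Each alarm sits on a CloudWatch Logs metric filter over the CloudTrail log group. As an illustration of what one filter matches, the sketch below emulates the unauthorized API call filter in plain Python so the logic can be checked against sample events. The pattern string follows the CIS v1.2.0 wording, so verify it against the benchmark version your audit uses. The function is a local stand-in, not how CloudWatch evaluates the filter.

```python
from fnmatch import fnmatchcase

# CIS 3.1 metric filter pattern (v1.2.0 wording), kept here for reference.
UNAUTHORIZED_API_CALLS_PATTERN = (
    '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }'
)


def is_unauthorized_call(event: dict) -> bool:
    """Local emulation of the filter above for a single CloudTrail event."""
    error_code = event.get("errorCode", "")
    # Case-sensitive wildcard match, mirroring the two terms of the pattern.
    return fnmatchcase(error_code, "*UnauthorizedOperation") or fnmatchcase(
        error_code, "AccessDenied*"
    )


# Sample events for illustration; real CloudTrail events carry many more fields.
events = [
    {"eventName": "RunInstances", "errorCode": "Client.UnauthorizedOperation"},
    {"eventName": "GetObject", "errorCode": "AccessDenied"},
    {"eventName": "DescribeInstances"},
]
print([is_unauthorized_call(e) for e in events])
```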

Results

After four weeks of remediation:

+--------------------+------------+--------------------+
| Metric             | Before     | After              |
+--------------------+------------+--------------------+
| Total findings     | 300+       | ~40 (all accepted  |
|                    |            | risks)             |
+--------------------+------------+--------------------+
| SOC 2 controls     | 3/27 (11%) | 22/27 (81%)        |
| passing            |            |                    |
+--------------------+------------+--------------------+
| Critical findings  | 12         | 0                  |
+--------------------+------------+--------------------+
| High findings      | 47         | 6 (all accepted    |
|                    |            | risk)              |
+--------------------+------------+--------------------+
| CIS controls       | 18/49      | 43/49              |
| passing            |            |                    |
+--------------------+------------+--------------------+

The 5 failing SOC 2 controls required out-of-scope changes: root account MFA delete, multi-region Config recorder, and IAM Identity Center migration. These were addressed in a separate workstream after the Type I audit.

Lessons learned

SOC 2 is a developer audit. Every finding in our account was an infrastructure code issue, not a process issue. If your infrastructure is in CDK, your SOC 2 remediation is CDK PRs.

CDK Aspects are the right tool for cross-cutting security controls. Termination protection on 70 stacks, confused deputy prevention on all roles, no-public-ingress enforcement: each is a single Aspect (or a single loop over the stacks) instead of 70+ individual code changes.

The hardest part is the risk acceptance documentation. Every finding you don’t fix needs a written rationale. “We’ll fix it later” is not acceptable. “Redis is in private subnets with VPC CIDR-locked security groups; in-transit encryption is accepted risk until client libraries support rediss:// in Q3” is acceptable.
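
One way to keep these rationales honest is to store each accepted risk as a structured record and reject the thin ones before they reach the auditor. This is a sketch only: the field names and the eight-word threshold are assumptions for the example, not an audit requirement.

```python
from dataclasses import dataclass, field


@dataclass
class RiskAcceptance:
    finding: str
    rationale: str
    compensating_controls: list = field(default_factory=list)
    revisit_by: str = ""

    def problems(self) -> list:
        """Return the reasons this record would not stand up as a rationale."""
        issues = []
        # Arbitrary threshold: a few words cannot explain a decision.
        if len(self.rationale.split()) < 8:
            issues.append("rationale is too short to explain the decision")
        if not self.compensating_controls:
            issues.append("no compensating control listed")
        if not self.revisit_by:
            issues.append("no remediation target")
        return issues


weak = RiskAcceptance(finding="Redis in-transit encryption", rationale="We'll fix it later")
strong = RiskAcceptance(
    finding="Redis in-transit encryption",
    rationale="Redis is in private subnets with VPC CIDR-locked security groups",
    compensating_controls=["private subnets", "VPC CIDR-locked security groups"],
    revisit_by="Q3",
)
print(weak.problems())
print(strong.problems())
```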

Start with the CIS CloudWatch alarms. No single alarm is rated critical, but together they represent a critical monitoring gap. They’re also fast to create: 14 alarms in one CDK stack in an afternoon.

Don’t audit your way to security. SOC 2 is a lagging indicator. The security posture you build for the audit should be the one you’d want anyway. If you’re only doing it for the report, you’ll pass Type I and then let it drift. Build it into the CDK code so it can’t drift.