From 300+ Findings to Compliant — Jagadeeswara Reddy P

When our SOC 2 Type I audit flagged over 300 AWS Security Hub findings, we faced a challenge familiar to any fast-moving startup: the infrastructure had grown faster than the security posture. Two CDK repositories, one AWS account, two regions, roughly 70 CloudFormation stacks, and services running on ECS Fargate behind Application Load Balancers — all of it needed a systematic security pass.

This post covers how we categorized, prioritized, and remediated those findings at scale, with real CLI commands, CDK code, and the triage framework we used to ship fixes without downtime.

The findings landscape

Our initial Prowler scan returned findings that broke down roughly like this:

+--------------------+-------+----------+
| Finding            | Count | Severity |
+--------------------+-------+----------+
| APIGateway.8       | 38    | Medium   |
| (execution         |       |          |
| logging)           |       |          |
+--------------------+-------+----------+
| ECS.20 (platform   | 33    | Medium   |
| version)           |       |          |
+--------------------+-------+----------+
| ECS.21 (network    | 28    | High     |
| configuration)     |       |          |
+--------------------+-------+----------+
| ELB.22 (TLS policy | 23    | Medium   |
| outdated)          |       |          |
+--------------------+-------+----------+
| IAM.2 (access key  | 22    | High     |
| rotation)          |       |          |
+--------------------+-------+----------+
| CloudFormation.4   | 21    | Medium   |
| (termination       |       |          |
| protection)        |       |          |
+--------------------+-------+----------+
| CloudWatch alarms  | 14    | Medium   |
| (CIS benchmarks)   |       |          |
+--------------------+-------+----------+
| S3 bucket          | ~45   | Mixed    |
| misconfigurations  |       |          |
+--------------------+-------+----------+
| ECR tag            | 15    | Medium   |
| immutability       |       |          |
+--------------------+-------+----------+
| Secrets Manager    | ~25   | High     |
| rotation           |       |          |
+--------------------+-------+----------+
| EBS encryption     | 2     | High     |
| defaults           |       |          |
+--------------------+-------+----------+

The total exceeded 300 individual instances. Many were the same control failing across multiple resources — ECS.20 fires once per task definition revision, for example.
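
To see how a handful of controls can account for most of the raw count, it helps to group instances by control ID before triage. The sketch below assumes findings have been exported as rows with a `control` and a `resource` field; both field names and the sample rows are hypothetical, not the scanner's actual schema.

```python
from collections import Counter

# Hypothetical sample rows; a real export would come from the scanner's CSV or JSON.
findings = [
    {"control": "ECS.20", "resource": "task-def/api:12"},
    {"control": "ECS.20", "resource": "task-def/api:13"},
    {"control": "ECS.20", "resource": "task-def/worker:4"},
    {"control": "ELB.22", "resource": "alb/public"},
    {"control": "IAM.2", "resource": "user/deploy"},
]


def instances_per_control(rows):
    """Return (control, count) pairs, most frequent control first."""
    return Counter(row["control"] for row in rows).most_common()


for control, count in instances_per_control(findings):
    print(f"{control}: {count}")
```

Sorting by count puts the controls where one fix clears many instances at the top of the list.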

The triage decision tree

Not all findings are created equal. Some represent genuine security risk; others are compliance checkbox items that would cause more harm to fix than to leave. We built a triage framework to route each finding:

Step 1: Is it a real security risk, or compliance-only?

Real risk → assess downtime risk
Compliance-only → assess fix complexity

Step 2a (real risk): Can it cause downtime if fixed?

No downtime risk → fix now via CDK or CLI
Downtime risk + blocks SOC 2 → plan maintenance window this sprint
Downtime risk + doesn’t block SOC 2 → defer to next sprint

Step 2b (compliance-only): Is the fix trivial?

Under 5 minutes → fix immediately via CLI
More complex → accept risk and document
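
The decision tree above can be sketched as a small function. The `Finding` fields are illustrative assumptions about how each answer might be recorded; only the branching mirrors the steps in the text.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    control: str
    real_risk: bool               # Step 1: genuine security risk vs compliance-only
    downtime_risk: bool = False   # Step 2a: could the fix cause downtime?
    blocks_soc2: bool = False     # Step 2a: does it block the SOC 2 audit?
    fix_minutes: int = 0          # Step 2b: estimated effort for the fix


def triage(finding: Finding) -> str:
    """Route a finding to one of the outcomes of the decision tree."""
    if finding.real_risk:
        if not finding.downtime_risk:
            return "fix now via CDK or CLI"
        if finding.blocks_soc2:
            return "plan maintenance window this sprint"
        return "defer to next sprint"
    # Compliance-only branch: the only question is how cheap the fix is.
    if finding.fix_minutes < 5:
        return "fix immediately via CLI"
    return "accept risk and document"


# Hypothetical answers for two controls from the findings table.
print(triage(Finding("ELB.22", real_risk=True)))
print(triage(Finding("CloudFormation.4", real_risk=False, fix_minutes=2)))
```

Encoding the answers as explicit fields keeps the routing reviewable: anyone can see why a finding landed where it did.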

This framework let us move quickly. Every finding landed in one of five buckets:

Quick win CLI — fix in under 5 minutes, zero downtime risk
CDK code change — fix in infrastructure-as-code, deploy with next release
Maintenance window — requires coordination, potential downtime
Accept risk — document why we’re not fixing it
Out of scope — not applicable to our architecture

Quick wins: CLI remediations that take minutes

The highest ROI came from fixes that could be applied across a whole region with a single CLI command or a short loop. Collectively these resolved dozens of findings.

ECR tag immutability

Every ECR repository without tag immutability generates a finding. We had 15 repositories across two regions:

# Set IMMUTABLE_WITH_EXCLUSION on all ECR repos in a region
# Allows deployment tags (latest, dev-build-*) to be overwritten
# while protecting release tags
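for repo in $(aws ecr describe-repositories \
  --query 'repositories[*].repositoryName' \
  --output text --region ap-south-1); do
  # The exclusion filters name the tags that stay mutable
  aws ecr put-image-tag-mutability \
    --repository-name "$repo" \
    --image-tag-mutability IMMUTABLE_WITH_EXCLUSION \
    --image-tag-mutability-exclusion-filters \
      filterType=WILDCARD,filter=latest \
      filterType=WILDCARD,filter='dev-build-*' \
    --region ap-south-1
  echo "Set immutable: $repo"
done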

EBS default encryption

A single command per region ensures all new EBS volumes are encrypted at rest:

aws ec2 enable-ebs-encryption-by-default --region ap-south-1
aws ec2 get-ebs-encryption-by-default --region ap-south-1
# {"EbsEncryptionByDefault": true}

Existing volumes are unaffected — only new volumes created after this change are encrypted by default.
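
Because the default only applies to new volumes, it can be useful to list the existing ones that are still unencrypted. The sketch below assumes the JSON printed by `aws ec2 describe-volumes --output json` has been loaded into a dict; the volume IDs in the sample are made up.

```python
import json


def unencrypted_volume_ids(describe_volumes_output: dict) -> list[str]:
    """Return the IDs of volumes whose Encrypted flag is false."""
    return [
        volume["VolumeId"]
        for volume in describe_volumes_output.get("Volumes", [])
        if not volume.get("Encrypted", False)
    ]


# Hypothetical, trimmed-down describe-volumes output.
sample = json.loads("""
{
  "Volumes": [
    {"VolumeId": "vol-0aaa", "Encrypted": false},
    {"VolumeId": "vol-0bbb", "Encrypted": true}
  ]
}
""")

print(unencrypted_volume_ids(sample))
```

Anything the function returns still needs its own plan, since turning on the default does not encrypt volumes that already exist.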

ALB TLS policy upgrade

All 23 ELB.22 findings came from ALBs using ELBSecurityPolicy-2016-08, which allows TLS 1.0 and 1.1:

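# describe-listeners needs a load balancer ARN, so walk each ALB in the region
for lb in $(aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[?Type==`application`].LoadBalancerArn' \
  --output text --region ap-south-1); do
  LISTENERS=$(aws elbv2 describe-listeners \
    --load-balancer-arn "$lb" \
    --query 'Listeners[?Protocol==`HTTPS`].ListenerArn' \
    --output text --region ap-south-1)

  # This policy accepts TLS 1.3 only
  for arn in $LISTENERS; do
    aws elbv2 modify-listener \
      --listener-arn "$arn" \
      --ssl-policy ELBSecurityPolicy-TLS13-1-3-2021-06 \
      --region ap-south-1
    echo "Upgraded: $arn"
  done
done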
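
After the upgrade, one way to confirm that no HTTPS listener was missed is to check the JSON from `aws elbv2 describe-listeners --load-balancer-arn <arn> --output json`. This is a minimal sketch under that assumption; the listener ARNs in the sample are placeholders.

```python
TARGET_POLICY = "ELBSecurityPolicy-TLS13-1-3-2021-06"


def listeners_needing_upgrade(describe_listeners_output: dict,
                              target: str = TARGET_POLICY) -> list[str]:
    """Return ARNs of HTTPS listeners that are not on the target TLS policy."""
    return [
        listener["ListenerArn"]
        for listener in describe_listeners_output.get("Listeners", [])
        if listener.get("Protocol") == "HTTPS"
        and listener.get("SslPolicy") != target
    ]


# Hypothetical, trimmed-down describe-listeners output.
sample = {
    "Listeners": [
        {"ListenerArn": "arn:listener/old", "Protocol": "HTTPS",
         "SslPolicy": "ELBSecurityPolicy-2016-08"},
        {"ListenerArn": "arn:listener/new", "Protocol": "HTTPS",
         "SslPolicy": TARGET_POLICY},
        {"ListenerArn": "arn:listener/http", "Protocol": "HTTP"},
    ]
}

print(listeners_needing_upgrade(sample))
```

An empty list for every load balancer means the ELB.22 findings should clear on the next scan.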

S3 lifecycle policies

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-file-uploads \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "CleanupIncompleteUploads",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {
        "DaysAfterInitiation": 7
      }
    }]
  }'

CDK-level fixes

Quick wins handle the low-hanging fruit. Durable fixes belong in infrastructure-as-code.

Termination protection on all stacks

With roughly 70 CloudFormation stacks, adding termination_protection=True to each constructor individually would be error-prone. A loop in app.py handles all of them:

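# Enable termination protection on all stacks
# Resolves cloudformation_stacks_termination_protection_enabled
# Place after every stack is instantiated and before app.synth().
# Assumes `import aws_cdk as cdk`; only top-level stacks are visited,
# not stacks nested inside a Stage.
for child in app.node.children:
    if isinstance(child, cdk.Stack):
        child.termination_protection = True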

This resolved 21 CloudFormation.4 findings in a single code change.

Confused deputy prevention via CDK Aspects

IAM roles with service principals that lack aws:SourceAccount conditions are vulnerable to cross-service confused deputy attacks. Rather than modifying every role definition, a CDK Aspect injects the condition at synthesis time:

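import aws_cdk as cdk
import jsii
from aws_cdk import IAspect, aws_iam as iam

@jsii.implements(IAspect)
class ConfusedDeputyPrevention:
    # Only trust statements for these service principals get the condition
    SERVICE_PRINCIPALS = {
        "ecs-tasks.amazonaws.com",
        "lambda.amazonaws.com",
        "states.amazonaws.com",
        "apprunner.amazonaws.com",
        "build.apprunner.amazonaws.com",
        "monitoring.rds.amazonaws.com",
        "ec2.amazonaws.com",
    }

    # Roles assumed by external CI/CD identities are left untouched
    SKIP_ROLE_PATTERNS = {
        "BitbucketDeploymentRole",
        "AzureDevOpsDeploymentRole",
    }

    def __init__(self, account_id: str):
        self.account_id = account_id

    def visit(self, node):
        if not isinstance(node, iam.CfnRole):
            return
        node_path = node.node.path
        if any(pattern in node_path for pattern in self.SKIP_ROLE_PATTERNS):
            return
        # Resolve the trust policy to plain JSON so its principals can be inspected
        trust = cdk.Stack.of(node).resolve(node.assume_role_policy_document)
        if not isinstance(trust, dict):
            return
        for index, statement in enumerate(trust.get("Statement", [])):
            principal = statement.get("Principal")
            if not isinstance(principal, dict):
                continue
            services = principal.get("Service", [])
            if isinstance(services, str):
                services = [services]
            # Skip statements that name none of the listed service principals
            if not any(isinstance(s, str) and s in self.SERVICE_PRINCIPALS
                       for s in services):
                continue
            node.add_property_override(
                f"AssumeRolePolicyDocument.Statement.{index}.Condition"
                ".StringEquals.aws:SourceAccount",
                self.account_id,
            )

# app_account_id is the 12-digit ID of the account the app deploys into
cdk.Aspects.of(app).add(ConfusedDeputyPrevention(app_account_id))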

CDK Aspects apply uniformly across all stacks in the app, including stacks added by other developers who might not know about the requirement.
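
To make the effect of the property override concrete, the sketch below applies the same change to a plain trust-policy dict, outside CDK. The helper name `with_source_account` and the account ID are illustrative assumptions; the nested path matches the override string used in the Aspect.

```python
import copy
import json


def with_source_account(trust_policy: dict, account_id: str,
                        statement_index: int = 0) -> dict:
    """Return a copy of the trust policy with an aws:SourceAccount condition added."""
    patched = copy.deepcopy(trust_policy)
    statement = patched["Statement"][statement_index]
    # Same path as the override: Condition -> StringEquals -> aws:SourceAccount
    string_equals = statement.setdefault("Condition", {}).setdefault("StringEquals", {})
    string_equals["aws:SourceAccount"] = account_id
    return patched


trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# 111122223333 is a placeholder account ID.
print(json.dumps(with_source_account(trust_policy, "111122223333"), indent=2))
```

Comparing this output with the synthesized template (for example via `cdk synth`) is a quick way to check that the Aspect touched the roles it was meant to.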

Stack output secrets

The cloudformation_stack_outputs_find_secrets finding fires when a stack output value looks like an AWS access key or secret:

# WRONG — never output access key values
CfnOutput(self, "SomeSecretKey",
    value=access_key.secret_access_key.unsafe_unwrap())

# CORRECT — only output the Secrets Manager path
CfnOutput(self, "CredentialSecretName",
    value=secret.secret_name)

S3 bucket hardening in CDK

from aws_cdk import Duration, aws_s3 as s3

# log_bucket is an existing s3.Bucket that receives the access logs
files_bucket = s3.Bucket(self, "FilesBucket",
    versioned=True,
    enforce_ssl=True,
    server_access_logs_bucket=log_bucket,
    server_access_logs_prefix="file-uploads/",
    encryption=s3.BucketEncryption.S3_MANAGED,
    lifecycle_rules=[
        s3.LifecycleRule(
            abort_incomplete_multipart_upload_after=Duration.days(7)
        )
    ],
)

What we deliberately did not fix

Not every finding should be remediated. Some represent architectural decisions:

+----------------------------------------+---------------------+-----------------------------------------------+
| Finding                                | Resource            | Why Accepted                                  |
+----------------------------------------+---------------------+-----------------------------------------------+
| rds_instance_protected_by_backup_plan  | livsyt-postgres     | Automated RDS snapshots already enabled       |
+----------------------------------------+---------------------+-----------------------------------------------+
| ecs_containers_readonly_access         | superset-*          | Superset runs uv pip install at container     |
|                                        |                     | start, writes to /app/static/                 |
+----------------------------------------+---------------------+-----------------------------------------------+
| ec2_securitygroup_internet_to_any_port | VPN security groups | VPN server requires internet access by design |
+----------------------------------------+---------------------+-----------------------------------------------+
| vpc_subnet_no_public_ip                | Celery workers      | Workers need ECR access; private subnets      |
|                                        |                     | require VPC endpoints (future sprint)         |
+----------------------------------------+---------------------+-----------------------------------------------+
| cognito_unauthenticated_access         | CloudWatch RUM      | RUM requires unauthenticated Cognito access   |
|                                        |                     | to collect browser telemetry                  |
+----------------------------------------+---------------------+-----------------------------------------------+

Each accepted risk was documented with a justification and tracked in Prowler as ACCEPTED_RISK.
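
One way to keep those justifications next to the scan results is a small register that matches findings by check name and resource pattern. This is a sketch of the idea, not Prowler's own mechanism; the `AcceptedRisk` class and the resource name `superset-worker` are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Optional


@dataclass(frozen=True)
class AcceptedRisk:
    check: str
    resource_pattern: str   # shell-style wildcard, e.g. "superset-*"
    justification: str


REGISTER = [
    AcceptedRisk("rds_instance_protected_by_backup_plan", "livsyt-postgres",
                 "Automated RDS snapshots already enabled"),
    AcceptedRisk("ecs_containers_readonly_access", "superset-*",
                 "Superset writes to /app/static/ at container start"),
]


def accepted(check: str, resource: str,
             register=REGISTER) -> Optional[AcceptedRisk]:
    """Return the matching accepted-risk entry, or None if the finding is open."""
    for risk in register:
        if risk.check == check and fnmatch(resource, risk.resource_pattern):
            return risk
    return None


match = accepted("ecs_containers_readonly_access", "superset-worker")
print(match.justification if match else "not accepted")
```

Keeping the justification in the same record as the match rule means an auditor can trace every suppressed finding back to a written reason.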

The scan-fix-rescan cycle

We ran three complete cycles before the audit. Each cycle took approximately one week:

Monday: Prowler scan, export findings CSV
Tuesday: Categorize by severity and SOC 2 control
Wednesday–Thursday: Execute fixes (CLI quick wins, then CDK deploys)
Friday: Re-scan to verify

The finding count dropped from 300+ to approximately 40 accepted risks by the third iteration.
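
The Friday re-scan is easiest to read as a diff against Monday's scan. The sketch below assumes each finding is reduced to a `(check, resource)` pair; the sample pairs are invented for illustration.

```python
def compare_scans(previous: set, current: set) -> dict:
    """Split findings into resolved, remaining, and newly introduced."""
    return {
        "resolved": previous - current,
        "remaining": previous & current,
        "new": current - previous,
    }


# Hypothetical (check, resource) pairs from two consecutive scans.
monday = {
    ("ELB.22", "alb/public"),
    ("ECS.20", "task-def/api:12"),
    ("CloudFormation.4", "stack/network"),
}
friday = {
    ("ECS.20", "task-def/api:12"),
    ("ECS.20", "task-def/api:13"),
}

diff = compare_scans(monday, friday)
for bucket in ("resolved", "remaining", "new"):
    print(bucket, len(diff[bucket]))
```

The `new` bucket matters as much as `resolved`: a fresh task definition revision can reintroduce a finding that was counted as fixed.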

Deploying by blast radius

We never deployed all CDK changes at once. We grouped them:

Week 1: Non-breaking account-level settings — EBS encryption, ECR immutability
Week 2: CDK changes that don’t affect running services — termination protection, stack output cleanup, S3 policies
Week 3: CDK changes that might restart services — TLS policy changes, container insights, log group encryption
Week 4: Changes requiring app coordination — RDS SSL enforcement, Redis TLS
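
The grouping above amounts to sorting changes by how much they can disturb running services. A rough sketch of that rule follows; the three boolean flags are an assumed way of describing a change, not a record of how the team actually tracked them.

```python
from dataclasses import dataclass


@dataclass
class Change:
    name: str
    account_level: bool = False           # account or region setting, non-breaking
    may_restart_services: bool = False    # deploy could restart running services
    needs_app_coordination: bool = False  # application side must change too


def deployment_week(change: Change) -> int:
    """Assign a change to a week, checking the widest blast radius first."""
    if change.needs_app_coordination:
        return 4
    if change.may_restart_services:
        return 3
    if change.account_level:
        return 1
    return 2  # CDK change that does not affect running services


changes = [
    Change("RDS SSL enforcement", needs_app_coordination=True),
    Change("TLS policy changes", may_restart_services=True),
    Change("termination protection"),
    Change("EBS encryption", account_level=True),
]

for change in sorted(changes, key=deployment_week):
    print(deployment_week(change), change.name)
```

Checking the riskiest property first means a change that is both account-level and service-restarting lands in the later, more cautious week.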

Results

+--------------------+------------+--------------------+
| Metric             | Before     | After              |
+--------------------+------------+--------------------+
| Total findings     | 300+       | ~40 (accepted      |
|                    |            | risks)             |
+--------------------+------------+--------------------+
| SOC 2 controls     | 3/27 (11%) | 22/27 (81%)        |
| passing            |            |                    |
+--------------------+------------+--------------------+
| Critical findings  | 5          | 0                  |
+--------------------+------------+--------------------+
| High findings      | 45+        | 6 (all accepted    |
|                    |            | risk)              |
+--------------------+------------+--------------------+
| Time to remediate  | —          | 4 weeks            |
+--------------------+------------+--------------------+

The remaining 5 failing controls required changes outside our immediate scope — root account MFA delete, multi-region Config recorder, IAM Identity Center migration — and were addressed in a separate workstream.

Key takeaways

Categorize before you fix. Not every finding is equal. A triage framework saves weeks of effort on work that doesn’t matter.

CLI first, CDK second. Quick wins build momentum and show auditors you’re making progress. Durable fixes go in code.

CDK Aspects are force multipliers. One Aspect can fix dozens of findings across all stacks. The confused deputy prevention Aspect above covers every role, in every stack, whose trust policy names one of the listed service principals.

Accept risk formally. Documenting why you didn’t fix something is just as important as the fix itself. Undocumented accepted risks become audit liabilities.

Batch by blast radius. Never deploy all security fixes in a single change. A failed deploy that touches 70 stacks simultaneously is a bad day.

Run the cycle until the numbers stop moving. Three iterations got us from 300+ to 40. Know when you’re done.