Infrastructure as Code (IaC) Scanning Explained: Terraform, CloudFormation, Kubernetes
Misconfigured S3 buckets exposing terabytes of customer records. Kubernetes admin endpoints reachable from the public internet. IAM policies with "Action": "*" attached to roles assumed by anonymous Lambda triggers. The most expensive cloud breaches of the last five years did not start with a clever zero-day in the application code. They started with a single line in a Terraform file, a CloudFormation template, or a Kubernetes manifest that nobody reviewed before terraform apply was run. Infrastructure as Code has been the dominant cloud provisioning model for nearly a decade, but the security tooling that scans those config files for dangerous patterns is still missing from a surprising number of CI pipelines. This post explains what IaC scanning is, why it has become non-negotiable, and which tools to evaluate.
What IaC Scanning Is
IaC scanning is static analysis applied to declarative infrastructure configuration files before they are deployed. Where SAST analyzes application source code for vulnerabilities, IaC scanners parse the same files your provisioning tools read -- Terraform HCL, CloudFormation YAML and JSON, Kubernetes manifests, Helm charts, Dockerfiles, Ansible playbooks, ARM templates, Pulumi programs -- and flag patterns that indicate insecure infrastructure.
The analysis is rule-based. Each scanner ships with hundreds to thousands of policies that encode known-bad patterns: an S3 bucket without server-side encryption, a security group allowing 0.0.0.0/0 on port 22, a Kubernetes pod running with privileged: true, an RDS instance with public accessibility enabled, an IAM policy granting "Resource": "*" on sensitive actions. The scanner walks the configuration graph, evaluates each resource against the rule set, and returns findings with severity, file location, and a suggested fix.
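The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular scanner is implemented: the Resource, Rule, and Finding types, the EX001-EX003 rule IDs, and the attribute names are all assumptions made for the example, and the resources are assumed to be already parsed into dictionaries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    kind: str    # e.g. "aws_db_instance" (hypothetical parsed form)
    name: str
    attrs: dict
    file: str
    line: int


@dataclass
class Rule:
    rule_id: str                      # hypothetical IDs, not real scanner IDs
    severity: str
    kind: str                         # resource type the rule applies to
    violates: Callable[[dict], bool]  # True when the attributes are insecure
    fix: str


@dataclass
class Finding:
    rule_id: str
    severity: str
    resource: str
    location: str
    fix: str


def open_ssh(attrs: dict) -> bool:
    """Ingress rule that covers port 22 and is open to the whole internet."""
    covers_ssh = attrs.get("from_port", 0) <= 22 <= attrs.get("to_port", -1)
    return (
        attrs.get("type") == "ingress"
        and covers_ssh
        and "0.0.0.0/0" in attrs.get("cidr_blocks", [])
    )


RULES = [
    Rule("EX001", "HIGH", "aws_security_group_rule", open_ssh,
         "Restrict SSH ingress to known CIDR ranges."),
    Rule("EX002", "HIGH", "aws_db_instance",
         lambda a: a.get("publicly_accessible") is True,
         "Set publicly_accessible = false."),
    Rule("EX003", "MEDIUM", "aws_db_instance",
         lambda a: not a.get("storage_encrypted", False),
         "Set storage_encrypted = true."),
]


def scan(resources: list, rules: list = RULES) -> list:
    """Evaluate every resource against every rule for its type."""
    findings = []
    for res in resources:
        for rule in rules:
            if rule.kind == res.kind and rule.violates(res.attrs):
                findings.append(Finding(
                    rule.rule_id, rule.severity,
                    f"{res.kind}.{res.name}", f"{res.file}:{res.line}",
                    rule.fix,
                ))
    return findings
```

Real scanners add parsing, module and variable resolution, and cross-resource checks on top of this loop, which is where most of their complexity lives.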
What makes this category distinct from runtime cloud security posture management (CSPM) is the timing. CSPM tools query your cloud provider APIs and report on what is currently deployed. IaC scanners catch the same misconfigurations one phase earlier -- before the resource exists -- which is the difference between a pre-merge code review comment and an active production incident.
Why You Need It
The shift from clicking through the AWS console to declaring resources in Terraform was a productivity revolution. It also moved the security review surface from a handful of senior engineers with console access to every developer who can open a pull request. A single pull request can now provision an entire VPC, three Kubernetes clusters, and two dozen IAM roles. The blast radius of a careless commit has grown dramatically.
Misconfiguration has been the dominant root cause of cloud breaches for years. The Verizon Data Breach Investigations Report has consistently identified misconfiguration as a top contributor to disclosed incidents in the cloud category, alongside credential abuse. The IBM Cost of a Data Breach Report has tracked the per-incident cost of cloud misconfiguration breaches in the millions of dollars on average, with detection and containment timelines often exceeding six months. The pattern in these incidents is remarkably consistent: a developer adds a resource, the configuration ships, no one notices for weeks or months, and an external researcher or attacker finds the exposure first.
The economic argument for shift-left scanning is simple. Fixing an over-permissive IAM role flagged in a pull request can take a developer well under a minute. Cleaning up after the same role has been exploited can take an incident response team weeks. Engineering organizations that have lived through a misconfiguration incident tend to come out the other side with IaC scanning enforced as a merge gate. The organizations that have not had the incident yet are often the ones still treating it as optional.
The Five Categories of IaC Misconfiguration
The thousands of rules in a typical IaC scanner cluster into a handful of recurring failure modes. Understanding the categories helps both in evaluating scanner coverage and in writing your own custom policies.
- Public exposure: Resources accessible from the open internet that should not be. An aws_s3_bucket with a public-read ACL, an aws_db_instance with publicly_accessible = true, an azurerm_storage_account with public blob access enabled, a Kubernetes Service of type LoadBalancer with no source-IP restriction.
- Over-privileged IAM: Identity and access management policies that grant more than the principle of least privilege requires. An aws_iam_policy with Action: "*" and Resource: "*", a Kubernetes ClusterRoleBinding granting cluster-admin to a service account used by a single workload, a GCP service account with roles/owner on the entire project.
- Missing encryption: Data stored or transmitted without the encryption controls that compliance frameworks and common sense require. EBS volumes without encrypted = true, RDS instances without storage_encrypted, S3 buckets without default server-side encryption, load balancer listeners on plain HTTP without TLS termination.
- Missing logging and monitoring: Resources provisioned without the audit trail required to detect compromise. CloudTrail not enabled in all regions, VPC flow logs disabled, S3 server access logging missing, Kubernetes audit logging not configured on the API server, RDS instances without enhanced monitoring.
- Network exposure: Overly permissive network controls that expand the attack surface beyond what the workload requires. Security groups allowing 0.0.0.0/0 ingress on database ports, NACLs without explicit deny rules, Kubernetes NetworkPolicy resources missing entirely so all pods can communicate with all other pods, container runtimes without seccomp or AppArmor profiles.
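As one illustration of a custom policy, here is a sketch of a check for the over-privileged IAM category: it flags Allow statements that grant every action on every resource. The function name wildcard_statements is hypothetical, and the check deliberately ignores partial wildcards such as "s3:*", conditions, and NotAction, all of which a production rule would need to consider.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy_json: str) -> list[int]:
    """Return the indexes of Allow statements with Action "*" and Resource "*"."""
    policy = json.loads(policy_json)
    hits = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions and "*" in resources:
            hits.append(index)
    return hits
```

Run against a policy document with one scoped statement and one full-wildcard statement, the function would return only the index of the latter, which is enough to point a reviewer at the offending block.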
IaC Scanning Tools: An Honest Comparison
The IaC scanning market has consolidated rapidly over the past few years through acquisitions, with several formerly independent open-source projects now living inside larger commercial platforms. The good news is that the open-source tooling remains usable and well-maintained. Here is the current state of the major options:
| Tool | Owner | License | Format Support | Notes |
|---|---|---|---|---|
| Checkov | Palo Alto Networks (Prisma Cloud, originally Bridgecrew) | Apache 2.0 | Terraform, CloudFormation, Kubernetes, Helm, ARM, Serverless, Dockerfile | Largest rule set, strong CI integration, optional commercial backend |
| Trivy | Aqua Security | Apache 2.0 | Terraform, CloudFormation, Kubernetes, Helm, Dockerfile (plus container image and SCA) | Single binary covers IaC, image, and dependency scanning; absorbed tfsec |
| KICS | Checkmarx | Apache 2.0 | Terraform, CloudFormation, Kubernetes, Helm, Ansible, Dockerfile, OpenAPI | Broad format coverage, queries written in Rego |
| tfsec | Aqua Security | MIT | Terraform only | Effectively deprecated; users are directed to Trivy, which has absorbed its rules |
| Terrascan | Tenable | Apache 2.0 | Terraform, CloudFormation, Kubernetes, Helm, Dockerfile | Rego-based policies, integrates with OPA |
For most teams starting from zero, Checkov and Trivy are the pragmatic defaults. Both have active maintainers, large rule libraries, and well-documented CI integrations. Teams already standardized on Open Policy Agent will gravitate toward KICS or Terrascan, because both express their policies in Rego, the language those teams already write for OPA.
Where IaC Scanning Fits in CI/CD
IaC scanning belongs at every layer of the deployment pipeline, with the strictness of the gate increasing as the code moves closer to production. A mature setup typically looks like this:
- Pre-commit hook: Lightweight scan of changed files only, surfacing the most egregious issues before the developer even pushes. Optional, advisory only -- the goal is fast local feedback, not a blocking gate.
- Pull request check: Full scan of the diff with results posted as a PR comment. Findings above a configured severity threshold block the merge. This is where the bulk of issues should be caught.
- Pipeline gate: Final scan as part of the deployment job, after the Terraform plan has been generated. Catches any drift between what was reviewed and what is about to be applied.
- Admission controller: For Kubernetes specifically, OPA Gatekeeper or Kyverno running inside the cluster as the last line of defense. Even if a misconfigured manifest somehow reaches kubectl apply, the admission controller rejects it.
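To make the pipeline gate stage concrete, the sketch below inspects a Terraform plan that has been exported as JSON (for example with terraform show -json) and lists database instances that are about to be created or updated with public accessibility enabled. The function name risky_changes is hypothetical, and only a small subset of the plan format is read: the resource_changes list with each entry's address, type, and change fields.

```python
import json


def risky_changes(plan_json: str) -> list[str]:
    """Return addresses of aws_db_instance resources the plan would make public."""
    plan = json.loads(plan_json)
    flagged = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        # "after" is null when a resource is being deleted, hence the fallback.
        after = change.get("after") or {}
        if not actions & {"create", "update"}:
            continue
        if rc.get("type") == "aws_db_instance" and after.get("publicly_accessible") is True:
            flagged.append(rc["address"])
    return flagged
```

Checking the plan rather than the source files means variables and module inputs have already been resolved, so the scan sees the values that would actually be applied.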
The same principles for sequencing security gates without breaking developer velocity apply here. We covered the mechanics in detail in Integrating Security Gates into CI/CD Without Slowing Down Delivery -- the short version is that the gate should be advisory until your false positive rate is low enough that engineers trust the tool, then it becomes blocking.
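The advisory-then-blocking progression can be reduced to a small decision function. The sketch below is an assumption-laden illustration: the severity names, their ordering, and the gate_exit_code helper are invented for the example and do not correspond to any specific scanner's flags.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def gate_exit_code(severities: list[str], threshold: str = "HIGH",
                   blocking: bool = False) -> int:
    """Return the exit code a CI step would use for a set of findings.

    In advisory mode the step always succeeds, so findings are reported
    without failing the build. In blocking mode any finding at or above
    the threshold fails it.
    """
    limit = SEVERITY_ORDER[threshold]
    over_threshold = [s for s in severities if SEVERITY_ORDER[s] >= limit]
    if blocking and over_threshold:
        return 1
    return 0
```

Keeping the mode as a single switch makes the later move from advisory to blocking a one-line configuration change rather than a pipeline rewrite.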
IaC Scanning Is Not the Whole Picture
No IaC scan in the world will catch a SQL injection vulnerability in your checkout endpoint. The infrastructure can be perfectly hardened -- private subnets, encrypted everything, least-privilege IAM, network policies in place -- and the application code running on top can still be trivially exploitable. Cloud misconfiguration is one attack surface. Application code is another. Third-party dependencies are a third. A complete program covers all three.
In practical terms: IaC scanning catches the infrastructure layer, SAST catches the application code layer, and SCA catches the open-source dependency layer. The three categories are complementary, and any program that covers only one is leaving the other two unattended. For a deeper look at what static analysis catches in the application layer, see GraphNode SAST, which traces data flow through the code that runs inside the infrastructure your IaC scanner is hardening.
To be clear about positioning: GraphNode is not an IaC scanner. We focus on static analysis of application source code -- Java, JavaScript, TypeScript, Python, Go -- finding the injection, deserialization, SSRF, and authentication bugs in the code itself. Pair us with Checkov or Trivy for the infrastructure layer, and Snyk, Dependabot, or Trivy again for the dependency layer. That combination gives you the three-layer coverage that a serious cloud-native security program requires.
Where to Start
Do not try to roll out IaC scanning across the entire organization in a single sprint. Pick one tool, install it locally, and run it against one repository. Triage the findings, suppress the noise, and commit a baseline. Then add the scanner as an advisory check in CI. Once developers have seen the output for a few weeks and trust the signal, flip the gate to blocking on high-severity findings only. Expand the rule set incrementally. Add more repositories. Add the admission controller for Kubernetes. The teams that succeed with IaC scanning treat it as a long-running engineering investment, not a checkbox. The ones that fail try to enable everything at once, drown in false positives, and turn the scanner off two weeks later.
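The baseline step can be sketched as a fingerprint comparison: record the findings that exist today, then report only findings whose fingerprint is not in that record. The field names (rule_id, file, resource) and both helper functions are hypothetical, and the finding dictionaries stand in for whatever structure your scanner emits.

```python
import hashlib


def fingerprint(finding: dict) -> str:
    """Stable identifier for a finding.

    Line numbers are left out on purpose so that unrelated edits to a
    file do not make an already-triaged finding look new.
    """
    key = "|".join([finding["rule_id"], finding["file"], finding["resource"]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_baseline(findings: list[dict]) -> set[str]:
    """Fingerprints of the findings accepted at rollout time."""
    return {fingerprint(f) for f in findings}


def new_findings(current: list[dict], baseline: set[str]) -> list[dict]:
    """Findings in the current scan that were not part of the baseline."""
    return [f for f in current if fingerprint(f) not in baseline]
```

With this in place the CI check only has to fail on new_findings, which lets a team adopt the scanner on a legacy repository without first fixing every historical issue.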