Infrastructure as Code Maturity: Moving Past Your First Terraform Repository
Every IT organisation I talk to says they’re doing infrastructure as code. Most of them are being very generous with the definition. They’ve got a Terraform repository that one senior engineer maintains. Maybe it provisions some EC2 instances and a VPC. Everyone else still clicks around in the AWS console when they need something.
That’s not infrastructure as code. That’s infrastructure as someone else’s side project.
The Maturity Stages
After working with dozens of organisations on their IaC journey, I’ve identified five stages. Be honest about where you actually are.
Stage 1: Ad Hoc Scripts. Someone has written Bash scripts to automate common tasks. Maybe there’s a CloudFormation template floating around. Nothing is standardised and there’s no central repository.
Stage 2: Single-Tool Adoption. The team has standardised on Terraform or Pulumi. A central repository exists. Some infrastructure is managed through code, but engineers still make manual changes in the console for urgent fixes. State management is fragile.
Stage 3: Codified Standards. Infrastructure code follows a consistent structure with module libraries for common patterns. Pull request reviews are mandatory. CI pipelines validate plans before apply. But coverage isn’t comprehensive: some environments are still managed manually, and there’s drift.
Stage 4: Full Lifecycle Management. All infrastructure changes flow through code. Drift detection is automated. Policy-as-code tools like OPA or Sentinel enforce compliance guardrails. Cost estimation happens at plan time.
Stage 5: Self-Service Platform. Infrastructure is abstracted into self-service capabilities. Development teams provision approved patterns without writing Terraform themselves. A platform team maintains modules and guardrails.
Most organisations are solidly in Stage 2. Very few have reached Stage 4.
Why Organisations Get Stuck
The jump from Stage 2 to Stage 3 is where most teams stall.
The hero problem. One or two engineers understand the tooling deeply. Everyone else treats it as a black box. When those engineers leave, the practice atrophies. The fix is investing in training across the team. Every engineer who touches cloud resources should have functional Terraform literacy.
State management fear. One bad state operation can orphan resources or corrupt your state file. Teams burned by state issues retreat to manual management. The fix is proper state management from day one: remote backend with locking, encryption, regular backups, and clear procedures for state manipulation.
The drift problem. Someone makes a manual console change because it’s urgent. Now the infrastructure doesn’t match the code. The next Terraform plan shows unexpected changes. Engineers get nervous about running applies. The code becomes documentation rather than the source of truth.
The fix is cultural: no manual changes outside declared emergencies, and emergency changes get codified within 24 hours. Run drift detection daily and treat drift as a high-priority defect.
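A daily drift check doesn’t need much machinery. Here’s a minimal sketch in Python, assuming the Terraform CLI is on the PATH and the working directory is already initialised. It leans on the exit codes of terraform plan -detailed-exitcode (0 for no changes, 1 for an error, 2 for a non-empty plan). The function names and status labels are my own, not part of any tool. One caveat: exit code 2 only says the plan is non-empty, so on a branch with unapplied code changes it will report those as well as genuine drift.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`:
#   0 = plan succeeded, no changes
#   1 = plan failed
#   2 = plan succeeded, changes are pending
# The status labels on the right are illustrative names chosen for this sketch.
EXIT_MEANINGS = {0: "in_sync", 1: "error", 2: "drift"}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a status label; unknown codes count as errors."""
    return EXIT_MEANINGS.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and report whether live infrastructure matches the code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Run check_drift from a scheduled job against each environment and raise a ticket whenever it returns anything other than "in_sync". That is what treating drift as a defect looks like in practice.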
The Module Strategy
Good modules are the foundation of IaC maturity. Bad modules are worse than no modules.
I’ve seen organisations create modules so abstracted they’re unusable: a “create a service” module with 47 input variables that generates infrastructure nobody understands. That’s not simplification. That’s obfuscation.
Good modules should be opinionated. A database module should include your standard backup configuration, monitoring setup, and security group rules by default. Keep modules focused on one thing. Version them and publish to an internal registry. Treat major version bumps the same way you’d treat a breaking API change.
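To make the versioning rule enforceable, a CI step can flag any pull request that moves a module across a major version. A rough sketch, assuming your module tags follow the MAJOR.MINOR.PATCH pattern with an optional leading v. The function names are hypothetical, and pre-release suffixes are not handled.

```python
def parse_version(tag: str) -> tuple[int, int, int]:
    """Split a tag like 'v1.4.2' or '1.4.2' into integer components."""
    major, minor, patch = tag.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_breaking_bump(current: str, candidate: str) -> bool:
    """True when the candidate tag raises the major version over the current one."""
    return parse_version(candidate)[0] > parse_version(current)[0]
```

A check like is_breaking_bump("v1.4.2", "v2.0.0") can then require an extra reviewer or a migration note, the same way you would gate a breaking API change.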
Testing Infrastructure Code
This is the biggest gap. Organisations that wouldn’t deploy application code without tests routinely push infrastructure changes with no validation beyond a Terraform plan.
At minimum: static analysis, with tflint to catch lint errors and checkov to catch security misconfigurations; plan validation on every pull request so reviewers see what will actually change; and policy enforcement through automated gates that block public S3 buckets, unencrypted databases, and oversized instances without approval.
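If you aren’t ready to adopt OPA or Sentinel, even a small script over the plan’s JSON output gets you a working gate. The sketch below assumes the plan has been exported with terraform show -json, which produces a resource_changes list where each entry carries an address, a type, and a change.after object. The attribute names (acl, storage_encrypted, instance_type) are assumptions about the AWS provider’s schema, and the allow-list of instance types is invented for illustration. Adjust both to match your own provider version and standards.

```python
import json

# Hypothetical allow-list; anything else needs explicit approval.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small", "t3.medium"}
PUBLIC_ACLS = {"public-read", "public-read-write"}


def load_plan(path: str) -> dict:
    """Load a plan previously exported with `terraform show -json`."""
    with open(path) as handle:
        return json.load(handle)


def find_violations(plan: dict) -> list[str]:
    """Return one message per planned resource that breaks a guardrail."""
    violations = []
    for resource in plan.get("resource_changes", []):
        after = resource.get("change", {}).get("after")
        if not after:
            continue  # pure deletions have no planned end state to check
        rtype = resource.get("type")
        address = resource.get("address", "<unknown>")
        if rtype == "aws_s3_bucket" and after.get("acl") in PUBLIC_ACLS:
            violations.append(f"{address}: public S3 bucket ACL")
        elif rtype == "aws_db_instance" and not after.get("storage_encrypted"):
            violations.append(f"{address}: database storage is not encrypted")
        elif rtype == "aws_instance" and after.get("instance_type") not in ALLOWED_INSTANCE_TYPES:
            violations.append(f"{address}: instance type requires approval")
    return violations
```

In a pipeline, fail the job whenever find_violations returns a non-empty list and print the messages into the pull request so the author knows exactly what to fix.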
I worked with a consulting practice on implementing automated policy checks for a financial services client last year. The investment paid for itself within three months through reduced security findings and faster change approvals.
Where to Start
If you’re in Stage 1 or 2, don’t try to jump to Stage 5. Here’s a practical sequence:
First, get state management right. Remote backend, locking, encryption. Non-negotiable foundation.
Second, pick one non-production environment and make it fully managed by code. Get the muscle memory right where stakes are lower.
Third, build three to five core modules covering your most common infrastructure patterns. Make them good enough that engineers prefer using them over clicking in the console.
Fourth, add CI pipelines with plan output on pull requests. Make infrastructure reviews standard engineering workflow.
Fifth, measure drift and treat it as a defect. This is the cultural inflection point where IaC moves from nice-to-have to the way things get done.
Each stage takes three to six months. Don’t rush it. Infrastructure as code done badly is worse than infrastructure managed manually by competent people. The goal is to be both automated and competent, and that takes sustained commitment from IT leadership.