Infrastructure as Code with Terraform: A Pragmatic Start
Click-ops doesn't scale and doesn't review. A pragmatic introduction to Terraform, state, modules, and the gotchas that bite teams adopting IaC.
A production database got deleted on a Friday afternoon because someone clicked the wrong "Delete" button in the AWS console, two rows above the staging instance they meant to remove. There was no diff to review, no approval gate, no record of intent. Just a confirmation dialog that everybody learns to dismiss on autopilot. That is click-ops, and it is the reason I push every team I work with toward infrastructure as code before we ship anything serious.
The pitch for IaC is not "automation" in the abstract. It is three concrete properties you cannot get from a web console:
- Reproducibility. The same code produces the same infrastructure, in us-east-1 and eu-west-1, on day one and on day four hundred.
- Code review. A change to your VPC routing shows up as a git diff that a second engineer reads before it touches anything real.
- Drift detection. When someone fat-fingers a security group rule in the console at 2am, terraform plan tells you the next morning.
Terraform is the most pragmatic on-ramp to all three. Let me walk through how it actually works, and the parts that bite teams who skip the fundamentals.
The plan/apply loop and the state file
Terraform's mental model is small. You declare what you want in HCL (HashiCorp Configuration Language). Terraform compares your declaration against what it believes currently exists, and produces a plan: a list of resources to create, update, or destroy. You review that plan. You apply it.
Here is the smallest useful example — an S3 bucket on AWS, with versioning turned on.
terraform {
  required_version = ">= 1.9"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.70"
    }
  }
}

provider "aws" {
  region = "eu-west-1"
}

resource "aws_s3_bucket" "assets" {
  bucket = "acme-prod-assets-2025"

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id

  versioning_configuration {
    status = "Enabled"
  }
}

Three pieces of vocabulary do all the work. A provider is the plugin that knows how to talk to a platform: hashicorp/aws here, but there are providers for GCP, Azure, Cloudflare, Datadog, GitHub, and roughly four thousand others in the registry. A resource is a single managed object: a bucket, a subnet, a DNS record. The reference aws_s3_bucket.assets.id is an implicit dependency: Terraform builds a dependency graph from these references and creates the bucket before it tries to configure versioning on it.
The workflow:
terraform init      # download providers, configure the backend
terraform fmt       # canonical formatting; wire this into CI
terraform validate  # catch syntax and type errors locally
terraform plan      # show me what would change
terraform apply     # do it, after I confirm

The thing that confuses people new to Terraform is state. Terraform records what it created in a terraform.tfstate file, a JSON document mapping your resources to real-world IDs. State is how Terraform knows that aws_s3_bucket.assets is the bucket acme-prod-assets-2025 and not something it needs to create again. Without state, every apply would be a fresh start.
State is also the single most dangerous thing in Terraform, for two reasons. First, it is sensitive. Database passwords, generated secrets, and private keys land in state in plaintext, even when you never write them to disk yourself. Treat the state file like a credentials vault, because that is what it is. Second, two people applying against the same state at the same time will corrupt it. This is the killer argument against local state on a team.
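To make the "secrets land in state" point concrete, here is a minimal sketch of a generated secret. It assumes the hashicorp/random provider, which the earlier example does not use; the resource and output names are illustrative.

```hcl
terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "~> 3.6"
    }
  }
}

# Terraform generates this value; you never type it or write it to disk yourself.
resource "random_password" "db" {
  length  = 32
  special = true
}

output "db_password" {
  value = random_password.db.result

  # Redacts the value in plan/apply output only.
  # The generated password is still stored in plaintext in the state file.
  sensitive = true
}
```

The sensitive flag protects your terminal and CI logs, not the state. Anyone who can read the state file can read the password.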
Remote state with locking
Local state — a terraform.tfstate on someone's laptop — works for a solo experiment and fails the moment a second engineer joins. Whoever applies last silently overwrites the other's record of reality. The fix is a remote backend with locking. On AWS, the classic combination is S3 for storage plus DynamoDB for a lock.
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "acme-terraform-locks"
    encrypt        = true
  }
}

When you run apply, Terraform writes a lock item to DynamoDB. A second apply against the same key fails with a lock error by default, or waits if you pass -lock-timeout, until the first run releases the lock. The encrypt = true flag turns on server-side encryption for the state object, which is non-negotiable given what is in there. Enable bucket versioning on the state bucket too, so a botched apply is recoverable.
One note for 2026: recent Terraform releases (1.10 and later) support native S3 state locking via a use_lockfile = true option on the S3 backend, which uses S3 conditional writes and lets you drop the DynamoDB table entirely. The backend ships with Terraform itself, so this depends on your Terraform version rather than your AWS provider version. I still reach for the DynamoDB table on existing setups because it is battle-tested and the migration buys little, but for greenfield projects the lockfile approach is one fewer resource to manage. Check the backend docs for the version you are on.
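For comparison, a sketch of the same backend using native locking instead of DynamoDB. It assumes Terraform 1.10 or later and reuses the bucket and key from the example above.

```hcl
terraform {
  required_version = ">= 1.10" # assumption: use_lockfile is unavailable on older releases

  backend "s3" {
    bucket       = "acme-terraform-state"
    key          = "prod/network/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true # lock with an S3 object next to the state file; no DynamoDB table
  }
}
```

After changing backend settings, re-run terraform init so Terraform picks up the new configuration.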
Lock your state bucket down with a bucket policy and IAM. Anyone who can read it can read your secrets.
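What that lockdown might look like, as a sketch: the state bucket itself, versioning, a public access block, and a bucket policy that refuses non-TLS requests. I am assuming these live in a small separate bootstrap configuration (the backend bucket has to exist before anything can use it), and the resource names are illustrative. Scoping which IAM roles may read the bucket is a separate, account-specific step not shown here.

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "acme-terraform-state"
}

# Keep old state versions so a botched apply is recoverable.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Deny any request to the bucket or its objects that is not made over TLS.
data "aws_iam_policy_document" "tf_state" {
  statement {
    sid     = "DenyInsecureTransport"
    effect  = "Deny"
    actions = ["s3:*"]

    resources = [
      aws_s3_bucket.tf_state.arn,
      "${aws_s3_bucket.tf_state.arn}/*",
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

resource "aws_s3_bucket_policy" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  policy = data.aws_iam_policy_document.tf_state.json
}
```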
Variables, outputs, and modules
Hardcoding values is fine until you need a second environment. Variables and outputs are the seams that make configuration reusable.
variable "bucket_name" {
  description = "Globally unique name for the assets bucket"
  type        = string
}

variable "versioning_enabled" {
  description = "Whether to enable object versioning"
  type        = bool
  default     = true
}

output "bucket_arn" {
  description = "ARN of the created bucket, for use by other stacks"
  value       = aws_s3_bucket.assets.arn
}

Outputs are how stacks talk to each other and how CI surfaces values like an ARN or an endpoint. Variables are inputs.
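One way a second stack can consume that output is the terraform_remote_state data source. This is a sketch: the state key is hypothetical, and it assumes the stack that declares bucket_arn stores its state in the same S3 bucket used elsewhere in this post.

```hcl
# In a different stack: read the outputs of the stack that owns the assets bucket.
data "terraform_remote_state" "assets" {
  backend = "s3"

  config = {
    bucket = "acme-terraform-state"
    key    = "prod/assets/terraform.tfstate" # hypothetical key for the assets stack
    region = "eu-west-1"
  }
}

locals {
  # Only root-module outputs are exposed, under the .outputs attribute.
  assets_bucket_arn = data.terraform_remote_state.assets.outputs.bucket_arn
}
```

The trade-off is that the consuming stack needs read access to the producing stack's state object, secrets included, so grant that access deliberately.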
A module is just a directory of .tf files you call from elsewhere. The moment you copy-paste a block of HCL, you want a module instead. Here is a small reusable one for a versioned, private bucket (S3 already encrypts new objects at rest by default), the pattern I reach for constantly.
# modules/s3-bucket/main.tf
variable "name" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags
}

resource "aws_s3_bucket_public_access_block" "this" {
  bucket = aws_s3_bucket.this.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}

output "arn" {
  value = aws_s3_bucket.this.arn
}

Calling it stays tidy, and the secure defaults (public access blocked, versioning on) come for free every time:
module "logs_bucket" {
  source = "./modules/s3-bucket"
  name   = "acme-prod-logs-2025"
  tags   = { Environment = "production" }
}

A practical rule I follow: keep modules small and composable, not god-modules that provision an entire platform behind forty input variables. A module that does one thing (a bucket, a VPC, a service) is easy to review, test, and reason about. The registry at registry.terraform.io has well-maintained modules for common building blocks like terraform-aws-modules/vpc, and they are worth reading even if you write your own.
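Calling a registry module looks much the same as calling a local one, with a version constraint added. A sketch using terraform-aws-modules/vpc; the CIDR ranges, names, and version constraint are illustrative assumptions, so check the module's documentation for the inputs your version supports.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin registry modules so upgrades are deliberate

  name = "acme-prod"
  cidr = "10.0.0.0/16"

  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  tags = {
    Environment = "production"
  }
}
```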
Workspaces vs. separate state per environment
You will eventually ask: how do I run prod and staging from the same code? Terraform offers workspaces — multiple named states behind one backend config. They are tempting because they are one command (terraform workspace new staging). I recommend against them for environment separation, and so does HashiCorp's own guidance.
| Approach | Isolation | Blast radius | Config drift risk | Good for |
|---|---|---|---|---|
| Workspaces | Same backend, same credentials | Easy to apply to the wrong env | High — same code, env conditionals creep in | Ephemeral/per-developer copies |
| Separate state per env | Distinct backend key, distinct credentials | Hard to cross the streams | Low — explicit per-env config | Production environments |
The failure mode with workspaces is mundane and brutal: you forget which workspace you are in, run apply, and reshape production while thinking you are in staging. Separate directories with separate backend keys and separate IAM roles make that mistake structurally hard. The small extra ceremony is the whole point. Save workspaces for throwaway, per-developer stacks where a mistake costs nothing.
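A sketch of what "structurally hard" can mean in practice. The directory layout, account ID, and role name are placeholders; the idea is that each environment directory carries its own backend key and its own credentials, and the provider refuses to run against the wrong account.

```hcl
# envs/prod/main.tf (hypothetical layout; envs/staging/main.tf mirrors it with its own values)
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "eu-west-1"
    dynamodb_table = "acme-terraform-locks"
    encrypt        = true
  }
}

provider "aws" {
  region = "eu-west-1"

  # Placeholder account ID. Terraform errors out if the active
  # credentials belong to any account not listed here.
  allowed_account_ids = ["111111111111"]

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/terraform-prod" # placeholder role
  }
}
```

With this in place, running apply from the staging directory cannot touch production, and running the prod directory with staging credentials fails before any plan is produced.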
Secrets, drift, and importing what already exists
Never hardcode secrets in HCL. They end up in Git history and in state. Pull them from a secret manager at apply time instead:
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db.secret_string
}

The value still lands in state, because Terraform records both the data source result and the arguments it passes to the resource. That is exactly why your remote state must be encrypted and access-controlled. The win is that the secret is not in your repo.
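Secrets Manager entries are often stored as a JSON document rather than a bare string. A sketch of that variant, assuming a hypothetical secret that holds username and password keys; the instance settings are illustrative.

```hcl
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/credentials" # hypothetical secret: {"username": "...", "password": "..."}
}

locals {
  # secret_string is marked sensitive by the provider; jsondecode keeps that marking.
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db.secret_string)
}

resource "aws_db_instance" "main" {
  identifier        = "acme-prod-main" # illustrative values throughout
  engine            = "postgres"
  instance_class    = "db.t4g.medium"
  allocated_storage = 50

  username = local.db_credentials["username"]
  password = local.db_credentials["password"]

  # Guard against the Friday-afternoon delete.
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "acme-prod-main-final"
}
```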
Drift is when reality diverges from your code: someone edited a security group in the console. Run terraform plan (or terraform plan -refresh-only to see drift without proposing changes) and Terraform shows you the delta. The discipline that makes this work is simple — once a resource is in Terraform, you stop touching it in the console. Pick one source of truth.
Importing existing resources is how you adopt Terraform on infrastructure that already exists without recreating it. Terraform 1.5 and later support declarative import blocks, which go through the normal plan and review cycle and are far less error-prone than the old terraform import CLI command. Pair the import block with a resource block you write by hand, or leave the resource block out and let Terraform draft one for you:
import {
  to = aws_s3_bucket.legacy
  id = "acme-legacy-bucket-2019"
}

Run terraform plan -generate-config-out=generated.tf and Terraform writes a starting config for any import target that has no resource block yet, which you then clean up. This is how you migrate a click-ops estate into code incrementally instead of in one terrifying big bang.
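When there are many objects of the same type to adopt, import blocks can iterate. A sketch, assuming Terraform 1.7 or later (where import blocks accept for_each) and a hand-written resource block; the bucket names and the resource name are made up for illustration.

```hcl
locals {
  # Hypothetical inventory of buckets that were created by hand.
  adopted_buckets = {
    uploads = "acme-legacy-uploads-2019"
    reports = "acme-legacy-reports-2020"
  }
}

import {
  for_each = local.adopted_buckets
  to       = aws_s3_bucket.adopted[each.key]
  id       = each.value
}

resource "aws_s3_bucket" "adopted" {
  for_each = local.adopted_buckets
  bucket   = each.value
}
```

The first plan should report the buckets as imports rather than creations. If it proposes to create or replace anything, stop and fix the configuration before applying.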
A few more gotchas worth internalizing:
- Destroy ordering. Terraform tears resources down in reverse dependency order, but circular or implicit dependencies it cannot see (an IAM policy referenced by ARN string instead of by resource attribute) cause destroys to fail or hang. Reference resources by attribute, not by hardcoded string, so the graph stays correct.
- Partial applies. If an apply fails halfway (an API timeout, a quota limit), some resources exist and some do not, but state reflects only what completed. Re-running plan reconciles it. Do not panic and start deleting things by hand; let Terraform converge.
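The destroy-ordering point is easiest to see side by side. A sketch with hypothetical input variables and policy name: the attachment refers to the policy by attribute, which is what puts the edge in the dependency graph.

```hcl
variable "app_role_name" {
  description = "Name of an existing IAM role (hypothetical input)"
  type        = string
}

variable "assets_bucket_arn" {
  description = "ARN of the bucket the role may read (hypothetical input)"
  type        = string
}

resource "aws_iam_policy" "read_assets" {
  name = "read-assets"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "${var.assets_bucket_arn}/*"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "read_assets" {
  role = var.app_role_name

  # Fragile alternative: policy_arn = "arn:aws:iam::111111111111:policy/read-assets"
  # A literal string hides the dependency, so Terraform cannot order
  # creation or destruction of the policy relative to this attachment.
  policy_arn = aws_iam_policy.read_assets.arn
}
```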
When Terraform, and when not
Terraform is not the only option, and I would not pretend it is the best for every team.
| Tool | Language | Best fit |
|---|---|---|
| Terraform / OpenTofu | HCL (declarative) | Broadest provider support; teams that want config, not code; multi-cloud |
| Pulumi | TypeScript, Python, Go, C# | Teams that want loops, conditionals, and real abstractions in a familiar language |
| AWS CDK | TypeScript, Python, etc. | AWS-only shops already living in CloudFormation |
Reach for Pulumi or CDK when your infrastructure logic is genuinely complex (dynamic fan-out over hundreds of accounts, heavy conditional generation) and your team would rather express that in a real programming language than fight HCL's for_each and dynamic blocks. Reach for Terraform when you want the largest ecosystem, multi-cloud reach, and configuration that reads like configuration. One note: HashiCorp's 2023 license change to the Business Source License (BSL) spawned OpenTofu, an MPL 2.0-licensed fork under the Linux Foundation that remains a drop-in replacement for the workflows in this post. If licensing matters to you, it is a real option.
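For a sense of where HCL's iteration is comfortable, here is a sketch that fans the bucket module from earlier out over a map. The local values are illustrative. Simple fan-out like this reads fine in HCL; it is nested, conditional generation where a general-purpose language starts to pay off.

```hcl
locals {
  # Illustrative map of purpose => bucket name.
  buckets = {
    logs   = "acme-prod-logs-2025"
    assets = "acme-prod-assets-2025"
  }
}

module "bucket" {
  for_each = local.buckets
  source   = "./modules/s3-bucket"

  name = each.value
  tags = {
    Environment = "production"
    Purpose     = each.key
  }
}

# Collect each instance's arn output into a map keyed by purpose.
output "bucket_arns" {
  value = { for key, mod in module.bucket : key => mod.arn }
}
```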
A starting checklist
If you are bringing a team onto Terraform, do these in order:
- Remote state from day one. S3 + DynamoDB (or native S3 locking). Never local state on a team.
- Encrypt state and lock it down. It holds secrets. Versioning on, tight IAM.
- Separate state per environment via distinct backend keys and IAM roles. Skip workspaces for prod.
- fmt, validate, and plan in CI on every PR, with the plan posted to the pull request for review.
- Secrets from a manager, never in HCL. Accept that they live in encrypted state.
- Small, single-purpose modules. No god-modules.
- Adopt existing infra via import blocks, incrementally. Then stop touching the console.
The console will always be faster for the first thing you build. It is the tenth change, reviewed by the third engineer, in the fourth region, that pays back every minute you spent writing HCL. Start there.
Further reading
- Terraform documentation — developer.hashicorp.com/terraform
- Terraform Registry (providers and modules) — registry.terraform.io
- OpenTofu — opentofu.org
- AWS provider documentation, on the Terraform Registry