GitOps with Argo CD: Declarative Deployments Done Right
GitOps makes your cluster match git, automatically, and tells you when it drifts. How Argo CD works, why pull beats push, and how to roll back with a git revert.
On this page
- What GitOps actually means
- Push versus pull, and why pull wins
- Argo CD's model: the Application resource
- Repository structure: config, not rendered manifests
- Secrets: never the plaintext value, always the encrypted one
- Rollback is a git revert
- When GitOps is worth it, and when it's overkill
- Checklist to get to your first synced Application
- Further reading
The worst production incident I ever cleaned up was caused by a kubectl edit. Someone bumped a replica count by hand at 2am to ride out a traffic spike, it worked, and then everyone forgot. Three weeks later a routine deploy reset the replicas back to what the manifest in git said, the service got cut in half during peak, and we spent forty minutes staring at dashboards trying to understand why a deploy that "changed nothing" took down half the fleet.
The cluster and git disagreed, and nobody knew. That gap — between what's running and what's committed — is the entire problem GitOps exists to kill.
What GitOps actually means
GitOps is a simple claim with sharp consequences: git is the single source of truth for what runs in your cluster, and a controller continuously reconciles the live state to match git. Not "git is where we keep the YAML." Git is the desired state, full stop. If it isn't in a commit, it isn't real, and the controller will undo it.
That continuous reconciliation loop is the part people skip. A normal CI/CD pipeline runs kubectl apply once, at deploy time, and then walks away. GitOps never walks away. The controller wakes up every few minutes, diffs the cluster against git, and acts on the difference. My 2am replica edit would have been reverted within minutes, with an alert, instead of silently lurking for three weeks.
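To make the loop concrete, here is a toy model of reconciliation in plain shell. Two scratch directories stand in for git (desired) and the cluster (live). The directory names and the reconcile function are made up for illustration, and this is not how Argo CD is implemented. It only shows the shape of the diff-then-converge cycle that a GitOps controller runs on a timer.

```shell
#!/bin/sh
# Toy model: directories stand in for git ("desired") and the cluster ("live").
work=$(mktemp -d)
mkdir -p "$work/desired" "$work/live"

# Git says 3 replicas; the cluster starts in sync.
echo "replicas: 3" > "$work/desired/checkout-api.yaml"
cp "$work/desired/checkout-api.yaml" "$work/live/checkout-api.yaml"

# reconcile: diff live against desired and, on drift, overwrite live with desired.
# Prints the outcome so a caller could alert on drift.
reconcile() {
  if diff -r "$work/desired" "$work/live" > /dev/null; then
    echo "Synced"
  else
    cp "$work/desired/"* "$work/live/"
    echo "OutOfSync -> healed"
  fi
}

first=$(reconcile)      # nothing to do yet

# The 2am hand edit: someone bumps replicas directly on the "cluster".
echo "replicas: 6" > "$work/live/checkout-api.yaml"

second=$(reconcile)     # drift detected and reverted
third=$(reconcile)      # back in sync

echo "$first / $second / $third"
cat "$work/live/checkout-api.yaml"
```

A real controller does the same thing against the Kubernetes API on a polling interval, and it also handles deletions, which this sketch ignores.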
Push versus pull, and why pull wins
The traditional model is push. Your CI runner (GitHub Actions, GitLab CI, Jenkins) holds cluster credentials, builds your image, and pushes manifests into the cluster with kubectl apply or helm upgrade.

    # The push model: CI reaches into the cluster
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Apply to prod
            env:
              KUBECONFIG_DATA: ${{ secrets.PROD_KUBECONFIG }}
            run: |
              echo "$KUBECONFIG_DATA" | base64 -d > kubeconfig
              KUBECONFIG=./kubeconfig kubectl apply -f k8s/

Look at what that requires. Your CI system holds admin credentials for production. Every PR author with the ability to edit a workflow file is one curl away from exfiltrating that kubeconfig. Your cluster's API server is reachable from your CI network. And the deploy is a fire-and-forget event: there's no record of what's actually running after the job exits, only what you told it to do once.
The pull model inverts this. A controller inside the cluster watches a git repo and pulls changes in. CI never touches the cluster. Its only job is to build an image and write a new tag into a git commit.
| | Push (CI applies) | Pull (GitOps) |
|---|---|---|
| Cluster credentials | Live in CI | Never leave the cluster |
| API server exposure | Reachable from CI | Can be fully private |
| Deploy record | Job logs, ephemeral | Git history, permanent |
| Drift detection | None | Continuous |
| Self-healing | No | Yes |
| Rollback | Re-run old pipeline | git revert |
The credential point alone sells it for me. In the pull model the cluster's API server can be private — no public endpoint, no inbound firewall rule for CI. The controller reaches out to git and your container registry, both of which are designed to be exposed. Your blast radius shrinks dramatically.
Argo CD's model: the Application resource
Argo CD (a CNCF graduated project — read argo-cd.readthedocs.io for the canonical docs) is the controller I reach for. Flux is the other strong choice; the concepts transfer, Argo CD just ships a UI that makes drift and health legible to people who don't live in kubectl.
The core abstraction is the Application: a custom resource that says "take the manifests at this path in this repo and make this namespace look like them."

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: checkout-api
      namespace: argocd
      finalizers:
        - resources-finalizer.argocd.argoproj.io
    spec:
      project: payments
      source:
        repoURL: https://github.com/acme/deploy-config.git
        targetRevision: main
        path: apps/checkout-api/overlays/production
      destination:
        server: https://kubernetes.default.svc
        namespace: checkout
      syncPolicy:
        automated:
          prune: true      # delete resources removed from git
          selfHeal: true   # revert manual cluster changes
        syncOptions:
          - CreateNamespace=true
          - ApplyOutOfSyncOnly=true
        retry:
          limit: 5
          backoff:
            duration: 5s
            factor: 2
            maxDuration: 3m

Two status fields drive everything Argo CD does:
- Sync status: Synced or OutOfSync. Does the live state match the manifests at targetRevision? This is the drift signal.
- Health status: Healthy, Progressing, Degraded, Missing. Are the resources actually working? Argo CD knows how to read a Deployment's rollout, a LoadBalancer Service's assigned address, an Ingress's address, and it ships custom health checks for common CRDs.
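Both fields can be read straight off the Application resource, which makes them easy to script against. The sketch below assumes the field paths .status.sync.status and .status.health.status and an Application living in the argocd namespace. The helper names app_status and deploy_gate are hypothetical, and the gating policy is my own illustration rather than an Argo CD feature.

```shell
#!/bin/sh
# app_status NAME: print "<sync> <health>" for an Application in the argocd namespace.
# Needs kubectl access to a cluster, so it is defined here but not called.
app_status() {
  kubectl get applications.argoproj.io "$1" -n argocd \
    -o jsonpath='{.status.sync.status} {.status.health.status}'
}

# deploy_gate SYNC HEALTH: succeed only when the app is both in sync and healthy.
# Hypothetical helper; the exit codes are arbitrary but distinct per case.
deploy_gate() {
  case "$1/$2" in
    Synced/Healthy)     echo "ok: live state matches git and is healthy"; return 0 ;;
    Synced/Progressing) echo "wait: rollout still in progress";           return 1 ;;
    OutOfSync/*)        echo "drift: live state differs from git";        return 2 ;;
    *)                  echo "unhealthy: $2";                             return 3 ;;
  esac
}

# Example with literal values; against a cluster: deploy_gate $(app_status checkout-api)
deploy_gate Synced Healthy
deploy_gate OutOfSync Healthy || echo "exit code $?"
```

The argocd CLI also has an app wait subcommand with --sync and --health flags that covers the common "block until converged" case without any scripting.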
Three settings under syncPolicy (two in automated, one in syncOptions) are where GitOps stops being a diff tool and starts being a control loop:
- prune: true means a resource you delete from git gets deleted from the cluster. Without it, removed manifests leak: orphaned ConfigMaps and Services pile up until someone audits.
- selfHeal: true is my 2am incident's antidote. A manual kubectl edit flips the app to OutOfSync, and Argo CD immediately re-applies git. You physically cannot drift for long.
- ApplyOutOfSyncOnly=true keeps each sync cheap by only touching resources that actually differ, which matters a lot once an app owns a few hundred objects.
Turn selfHeal on deliberately. It is exactly the behavior you want in production and exactly the behavior that will fight you during a hands-on incident. More on that below.
Repository structure: config, not rendered manifests
The repo layout decision that bites teams later is mixing application source with deployment config. Keep them in separate repos. Your app repo holds code and a Dockerfile. A deploy-config repo holds Kustomize bases and overlays. CI builds the image and commits a new tag into the config repo; Argo CD takes it from there.
I use Kustomize overlays for environments because they keep the diff between staging and production to the handful of things that genuinely differ: replica counts, resource limits, the image tag.

    # apps/checkout-api/overlays/production/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    namespace: checkout
    resources:
      - ../../base
    replicas:
      - name: checkout-api
        count: 6
    images:
      - name: ghcr.io/acme/checkout-api
        newTag: "2026.3.18-a1b9f04" # CI writes this line, nothing else
    patches:
      - target:
          kind: Deployment
          name: checkout-api
        patch: |-
          - op: replace
            path: /spec/template/spec/containers/0/resources/limits/memory
            value: 1Gi

The CI job for the app does exactly one write to this file: it updates newTag. That single-line commit is the deploy. The git history of your config repo becomes a complete, timestamped, attributed log of every production change, with a diff you can read in a PR. That audit trail is worth the setup on its own; I've answered "what changed at 14:32 UTC" with git log instead of grepping CI logs more times than I can count.
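A minimal sketch of that one write, assuming the overlay contains exactly one quoted newTag entry. The bump_tag helper and the example tag 2026.3.19-0c4d2e7 are made up for illustration, and the demo runs against a scratch file rather than a real config repo.

```shell
#!/bin/sh
# bump_tag FILE TAG: rewrite the newTag value in a kustomization.yaml, touching no other line.
# Hypothetical helper; TAG must not contain '|', '&', or backslashes (sed metacharacters here).
bump_tag() {
  file=$1
  tag=$2
  # Replace only the quoted value; indentation and any trailing comment are preserved.
  sed "s|\(newTag: \)\"[^\"]*\"|\1\"$tag\"|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo against a scratch copy of the images block from the production overlay.
demo=$(mktemp -d)/kustomization.yaml
cat > "$demo" <<'EOF'
images:
  - name: ghcr.io/acme/checkout-api
    newTag: "2026.3.18-a1b9f04" # CI writes this line, nothing else
EOF

bump_tag "$demo" "2026.3.19-0c4d2e7"
grep newTag "$demo"

# In CI the bump would be followed by a commit and push to the config repo, e.g.:
#   git commit -am "bump checkout-api to 2026.3.19-0c4d2e7" && git push origin main
```

If the kustomize binary is available in CI, its edit set image subcommand does the same job without hand-written sed. I use the sed form here only because it has no dependencies.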
Secrets: never the plaintext value, always the encrypted one
The obvious objection: if everything lives in git, what about secrets? You do not put plaintext secrets in git. You put encrypted secrets in git, and decrypt them inside the cluster. Two mature options:
- Sealed Secrets (Bitnami). A controller in the cluster holds a private key. You encrypt with kubeseal against the public key and commit the resulting SealedSecret, which is useless to anyone without the cluster's key.
- SOPS (Mozilla) with age or a cloud KMS. You encrypt values in YAML, commit the encrypted file, and an Argo CD plugin or the Argo CD Vault Plugin decrypts at sync time.

    # Sealed Secrets: encrypt locally, commit the result, never the plaintext.
    # The default sealing scope binds the secret to its name and namespace,
    # so seal it for the namespace it will be deployed into.
    kubectl create secret generic stripe-key \
      --namespace checkout \
      --from-literal=api-key="$STRIPE_LIVE_KEY" \
      --dry-run=client -o yaml \
      | kubeseal --controller-namespace kube-system --format yaml \
      > apps/checkout-api/overlays/production/sealed-stripe-key.yaml
    git add apps/checkout-api/overlays/production/sealed-stripe-key.yaml
    git commit -m "rotate stripe live key"

This is the GitOps-shaped version of the secrets-management discipline I've written about before: the secret's lifecycle lives in git, but the plaintext only ever exists inside the cluster where the decryption key lives. For high-churn secrets, I lean toward a Vault/External Secrets setup where git holds only a reference and the live value is pulled at runtime. That means fewer commits, and rotation doesn't touch the deploy repo at all.
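Because git never forgets a leaked value, it can help to back the rule with a cheap automated check. The sketch below is a hypothetical guard, not part of Sealed Secrets or Argo CD: it flags any YAML file that declares a plain kind: Secret. It is a line-based grep rather than a YAML parser, so treat it as a tripwire and not a guarantee.

```shell
#!/bin/sh
# find_plaintext_secrets DIR: list YAML files under DIR that declare a plain `kind: Secret`.
# SealedSecret and ExternalSecret do not match because the pattern is anchored to the whole line.
find_plaintext_secrets() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -lE '^kind:[[:space:]]*Secret[[:space:]]*$' {} +
}

# Demo on a scratch tree: one sealed secret (fine), one plaintext secret (must be caught).
repo=$(mktemp -d)
printf 'apiVersion: bitnami.com/v1alpha1\nkind: SealedSecret\n' > "$repo/sealed-stripe-key.yaml"
printf 'apiVersion: v1\nkind: Secret\n'                         > "$repo/oops.yaml"

offenders=$(find_plaintext_secrets "$repo")
if [ -n "$offenders" ]; then
  echo "refusing to commit plaintext Secret manifests:"
  echo "$offenders"
fi
```

Wired into a pre-commit hook or a CI step on the config repo, the same function would exit non-zero when the list is not empty.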
Rollback is a git revert
Here's the part that makes GitOps feel like cheating. A bad deploy is a bad commit, so a rollback is git revert.

    # Production is on fire after the last deploy. Find it and undo it.
    git -C deploy-config log --oneline -5
    # a1b9f04 bump checkout-api to 2026.3.18-a1b9f04   <-- the bad one
    # 7c3e221 raise checkout-api memory limit to 1Gi
    # ...
    git -C deploy-config revert --no-edit a1b9f04
    git -C deploy-config push origin main
    # Argo CD detects OutOfSync within ~3 min (or instantly with a webhook)
    # and rolls the cluster back to the previous image tag.

No special rollback tooling, no "re-run the deploy job with an older SHA," no remembering which Helm release revision was good. You revert the commit, push, and the controller converges the cluster back. And critically, the revert is itself a commit: the rollback is in the audit trail too, attributed and timestamped, instead of being an out-of-band manual action nobody recorded.
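The revert can be rehearsed without touching a real repo or cluster. This sketch builds a throwaway git repository with a good commit and a bad one, reverts the bad one, and shows that the file is back on the old tag while the history has grown by one commit. The earlier tag 2026.3.17-5d2c8e1 and the demo identity are invented for the example.

```shell
#!/bin/sh
# Rehearse the rollback in a throwaway repo so the real one is never touched.
# Identity comes from the environment so this works on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git -C "$repo" init -q

# Commit 1: the known-good tag. Commit 2: the bad deploy.
echo 'newTag: "2026.3.17-5d2c8e1"' > "$repo/kustomization.yaml"
git -C "$repo" add kustomization.yaml
git -C "$repo" commit -q -m "bump checkout-api to 2026.3.17-5d2c8e1"

echo 'newTag: "2026.3.18-a1b9f04"' > "$repo/kustomization.yaml"
git -C "$repo" commit -q -am "bump checkout-api to 2026.3.18-a1b9f04"

# The rollback: revert the bad commit, which is HEAD in this scratch repo.
git -C "$repo" revert --no-edit HEAD > /dev/null

cat "$repo/kustomization.yaml"          # back on the previous tag
git -C "$repo" log --oneline | wc -l    # three commits: the revert is recorded too
```

In the real config repo the only missing step is the push, after which the controller picks up the reverted commit like any other change.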
If you want it instant rather than waiting on the reconcile interval, wire a git webhook to Argo CD so a push triggers an immediate refresh. I run a ~3 minute poll as the safety net and a webhook for speed; the poll is what catches drift even when webhooks fail.
When GitOps is worth it, and when it's overkill
I don't reach for Argo CD on every project. The setup has real cost: a controller to run and upgrade, a second repo to maintain, a Kustomize or Helm structure your team has to actually understand, and a selfHeal loop that will block you mid-incident if you forget to pause the app before hot-patching. (When you genuinely need to hand-edit during an outage, argocd app set <app> --sync-policy none first, fix, then reconcile your fix back into git.)
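Here is one way the pause-and-resume sequence could be wrapped for a runbook. The function names and the DRY_RUN switch are hypothetical conveniences of mine; the underlying commands are argocd app set with its --sync-policy, --auto-prune, and --self-heal flags. With DRY_RUN=1, the default in this sketch, the commands are only printed, so the sequence can be reviewed without a cluster.

```shell
#!/bin/sh
# Hypothetical wrappers around the argocd CLI for hands-on incident work.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# pause_sync APP: turn off automated sync so manual edits are not reverted.
pause_sync() {
  run argocd app set "$1" --sync-policy none
}

# resume_sync APP: restore automated sync with pruning and self-heal.
# Run this only after the fix is committed to git, or the fix will be undone.
resume_sync() {
  run argocd app set "$1" --sync-policy automated --auto-prune --self-heal
}

pause_sync checkout-api
# ... hot-patch the cluster, then commit the same change to the config repo ...
resume_sync checkout-api
```

One caveat I would check before relying on this: if the Application manifest is itself managed from git by a parent Application with selfHeal on, the parent may restore the sync policy before you finish, in which case the pause has to happen in git instead.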
Use this to decide:
Reach for GitOps when:
- You run more than one environment, or more than a couple of services on Kubernetes.
- Multiple people deploy, and you need an audit trail of who changed what, when.
- You want a private API server with no cluster credentials in CI.
- Drift is a real risk because people have kubectl access to prod.
Skip it (for now) when:
- A single app, single environment, one or two trusted operators. A plain kubectl apply in CI is honest and cheaper.
- You're not on Kubernetes. GitOps tooling assumes a reconcilable declarative API; for raw VMs, Terraform with a remote backend and drift detection gets you most of the benefit.
- You're still pre-product-market-fit and changing your deployment shape weekly. Add GitOps when the shape stabilizes, not before.
The heuristic I use: the moment a human can change production in a way that isn't recorded in git, you've outgrown push deploys. Until then, GitOps is overhead you're paying for discipline you don't yet need.
Checklist to get to your first synced Application
- Create a deploy-config repo, separate from app code. Bases in base/, environments in overlays/.
- Install Argo CD into an argocd namespace; lock down RBAC before exposing the UI.
- Define one Application per service-environment pair, starting with selfHeal: false until you trust the manifests.
- Move secrets to Sealed Secrets or SOPS before committing anything sensitive: never a plaintext Secret, not even once, since git remembers.
- Make CI's only deploy step a one-line newTag bump committed to the config repo.
- Flip selfHeal: true and prune: true once a few deploys have gone clean.
- Document the rollback as git revert <sha> && git push. Put it in the runbook. Practice it once on staging so nobody learns it during an incident.
Get those in place and your cluster stops being a thing you mutate and starts being a thing that converges. The difference shows up the first time someone runs kubectl edit at 2am and Argo CD quietly puts it back — and tells you.
Further reading
- Argo CD documentation — argo-cd.readthedocs.io
- Flux documentation — fluxcd.io
- Kustomize — kubernetes.io/docs and the Kustomize project site
- Sealed Secrets and SOPS — their respective GitHub project pages