GitOps
Status: Complete
Category: Infrastructure
Default enforcement: Soft
Author: PushBackLog team
Tags
- Topic: infrastructure, delivery, automation
- Skillset: devops, engineering
- Technology: git, Kubernetes, Flux, ArgoCD
- Stage: delivery, operations
Summary
GitOps is an operational model in which the desired state of infrastructure and applications is declared in Git, and automated tooling continuously reconciles the running system to match that declared state. Git becomes the single source of truth: a merge to the configuration repository triggers a deployment, and any drift between the live system and the Git state is automatically corrected. GitOps provides a full audit trail and simple rollback, and removes the need for engineers to have direct write access to production systems.
Rationale
Git as the audit trail for infrastructure
When engineers apply changes to infrastructure directly (via kubectl apply, terraform apply, or clicking in a cloud console), the changes are hard to trace. There is often no durable, reviewable record of who changed what, when, and why. The next engineer to look at the system may have no idea why a particular configuration exists.
GitOps makes every infrastructure change a git commit: author, timestamp, description, and code review are all captured by the version control system. The blame trail is as clear as for application code.
Automated reconciliation prevents drift
In traditional ops, infrastructure can drift from its intended state: someone applies a hot-fix directly in production and forgets to update the configuration repository. The next deployment overwrites the fix. With GitOps, a reconciliation loop continuously compares the live state to the declared state, detects drift, and corrects it automatically. Infrastructure becomes self-healing.
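The reconciliation loop described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not how Flux or ArgoCD are implemented: states are modelled as plain dictionaries keyed by resource name, and `diff_state` and `reconcile` are hypothetical helper names.

```python
from typing import Any, Dict, List

State = Dict[str, Any]  # hypothetical model: resource name -> spec


def diff_state(desired: State, live: State) -> Dict[str, List[str]]:
    """Return the names of resources to create, update, or delete."""
    return {
        "create": sorted(n for n in desired if n not in live),
        "update": sorted(n for n in desired if n in live and live[n] != desired[n]),
        "delete": sorted(n for n in live if n not in desired),
    }


def reconcile(desired: State, live: State) -> State:
    """Return a new live state that matches the desired state."""
    drift = diff_state(desired, live)
    converged = dict(live)
    for name in drift["create"] + drift["update"]:
        converged[name] = desired[name]
    for name in drift["delete"]:
        del converged[name]
    return converged


# Someone scaled the API by hand and left a debug pod behind.
desired = {"api": {"replicas": 3}}
live = {"api": {"replicas": 5}, "debug-pod": {"image": "busybox"}}

drift = diff_state(desired, live)   # what differs from git
live = reconcile(desired, live)     # live state converges on the declaration
```

A real operator runs this comparison repeatedly on an interval or in response to cluster events, which is why a manual change does not survive for long.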
Rollback is a git revert
The rollback procedure for a bungled deployment becomes git revert <commit>: the revert restores the previous state declaration, and the automation converges the running system back to it. No manual steps against the cluster, no tribal knowledge required.
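To make the rollback mechanics concrete, here is a small model of what a revert does to the declared state. It is a simplification: each commit is treated as a full snapshot, only the most recent commit can be reverted, and `revert_head` is a hypothetical name. Real git revert applies the inverse of a chosen commit's diff, but the key property is the same: history grows, nothing is rewritten.

```python
from typing import Dict, List

Commit = Dict[str, str]  # hypothetical model: service name -> image tag


def revert_head(history: List[Commit]) -> List[Commit]:
    """Append a commit that restores the state before HEAD.

    Earlier commits are kept, so the bad deployment stays in the audit trail.
    """
    if len(history) < 2:
        raise ValueError("HEAD has no parent to revert to")
    return history + [dict(history[-2])]


history = [{"api": "v1.4.1"}, {"api": "v1.4.2"}]  # v1.4.2 is the bad deploy
history = revert_head(history)
desired_state = history[-1]  # what the operator will now reconcile towards
```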
Guidance
Core GitOps principles
- Declarative: the system is described as desired state, not as a sequence of imperative steps
- Versioned and immutable: the state is stored in git; history is never rewritten
- Pulled automatically: the agent (Flux, ArgoCD) pulls state from git; CI does not push it to the cluster
- Continuously reconciled: the agent detects and corrects drift from the declared state
Pull-based vs push-based deployments
| Approach | Mechanism | GitOps? |
|---|---|---|
| Push: CI pipeline runs kubectl apply | CI has direct cluster access | Partial |
| Pull: GitOps operator polls git repository | Operator has cluster access; CI only writes git | True GitOps |
True GitOps uses a pull model: the CI pipeline writes manifests to a git repository; the GitOps operator (Flux or ArgoCD) polls the repository and applies changes. This means CI never has direct cluster credentials.
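The separation of responsibilities in the pull model can be sketched as follows. Everything here is a stand-in for illustration: `ConfigRepo`, `PullOperator`, and `ci_pipeline` are hypothetical names, the repository is an in-memory list, and the cluster is a dictionary. The point is that the CI function receives only the repository, while the operator is the only component holding a reference to the cluster.

```python
from typing import Dict, List

Manifests = Dict[str, str]


class ConfigRepo:
    """Stand-in for the git repository; revisions are 1-based integers."""

    def __init__(self) -> None:
        self._commits: List[Manifests] = []

    def commit(self, manifests: Manifests) -> int:
        self._commits.append(dict(manifests))
        return len(self._commits)

    def head(self) -> int:
        return len(self._commits)

    def manifests_at(self, revision: int) -> Manifests:
        return dict(self._commits[revision - 1])


class PullOperator:
    """Runs inside the cluster and is the only component that writes to it."""

    def __init__(self, repo: ConfigRepo, cluster: Manifests) -> None:
        self.repo = repo
        self.cluster = cluster
        self.last_applied = 0  # no revision applied yet

    def poll(self) -> bool:
        """Apply the repository head if it changed; return True if applied."""
        head = self.repo.head()
        if head == self.last_applied:
            return False
        self.cluster.clear()
        self.cluster.update(self.repo.manifests_at(head))
        self.last_applied = head
        return True


def ci_pipeline(repo: ConfigRepo, image_tag: str) -> int:
    """CI only commits to the repository; it holds no cluster handle."""
    return repo.commit({"api-service": image_tag})
```

In this sketch a compromised CI job could still push a bad manifest, but it could not reach the cluster directly, and the change would be visible in the repository history.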
ArgoCD example
# argocd/application.yaml: defines an ArgoCD Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-config
    targetRevision: main
    path: apps/api-service/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Remove resources no longer in git
      selfHeal: true   # Correct drift automatically
    syncOptions:
      - CreateNamespace=true
With selfHeal: true, if someone runs kubectl edit in production and changes a value, ArgoCD detects the drift and reverts it shortly afterwards; the exact delay depends on how the controller is configured.
Repository structure
Two common approaches:
Monorepo: application code and Kubernetes manifests in the same repository
app/                  # Application source
k8s/
  base/               # Base Kubernetes manifests (Kustomize base)
  overlays/
    staging/          # Staging-specific patches
    production/       # Production-specific patches
Split repo: separate repository for infrastructure/config
myapp-config/         # Config-only repository
  apps/
    api/
      base/
      overlays/
  infrastructure/
    ingress/
    cert-manager/
Split repos are often preferred for larger teams, because application PRs and infrastructure PRs can then follow separate approval workflows.
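Both layouts rely on a shared base plus per-environment overlays. The idea can be illustrated with a simplified recursive merge. This is only an approximation for intuition: Kustomize's actual patching (for example, how it merges lists) is more involved, and `apply_overlay` is a hypothetical helper, not a Kustomize API.

```python
import copy
from typing import Any, Dict


def apply_overlay(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with patch merged on top; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# base/ holds what every environment shares.
base = {"spec": {"replicas": 1, "image": "api:v1.4.2"}}

# overlays/<env>/ holds only the differences for that environment.
staging = apply_overlay(base, {"spec": {"replicas": 2}})
production = apply_overlay(base, {"spec": {"replicas": 3}})
```

Keeping overlays small makes environment differences easy to review: a production PR shows only what production changes relative to the base.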
Flux example (Reconciling a HelmRelease)
# flux/helmrelease.yaml
# Note: recent Flux releases serve HelmRelease as helm.toolkit.fluxcd.io/v2;
# use the API version that matches the CRDs installed in your cluster.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: api-service
  namespace: production
spec:
  interval: 5m   # Reconcile every 5 minutes
  chart:
    spec:
      chart: api-service
      version: '>=1.0.0'
      sourceRef:
        kind: HelmRepository
        name: myorg-charts
  values:
    image:
      tag: "v1.4.2"
    replicas: 3
Promotion workflow
feature → main → [CI builds image, updates image tag in config repo] → GitOps operator applies to staging
↓ (manual PR approval)
production config repo ← PR to bump image tag
↓ (merged by engineer)
GitOps operator applies to production
By separating image builds (CI) from deployment (GitOps pull), each environment’s state is always auditable in git and promotions require an explicit git change.
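The "updates image tag in config repo" step is usually a small text edit followed by a commit. Below is a hedged sketch of the edit itself, assuming the manifest contains exactly one line of the form tag: "..." as in the HelmRelease values shown earlier. `bump_image_tag` is a hypothetical helper; writing the file and opening the PR are left to the CI system.

```python
import re

# Matches a single line such as:   tag: "v1.4.2"
TAG_LINE = re.compile(r'^([ \t]*tag:[ \t]*)"[^"]*"[ \t]*$', re.MULTILINE)


def bump_image_tag(manifest_text: str, new_tag: str) -> str:
    """Return manifest_text with its single image tag replaced by new_tag."""
    updated, count = TAG_LINE.subn(
        lambda match: f'{match.group(1)}"{new_tag}"', manifest_text
    )
    if count != 1:
        # Refuse to guess when the tag is missing or ambiguous.
        raise ValueError(f"expected exactly one tag line, found {count}")
    return updated


values = 'image:\n  tag: "v1.4.2"\nreplicas: 3\n'
promoted = bump_image_tag(values, "v1.4.3")
```

Failing loudly when the tag line is missing or duplicated keeps the promotion PR from silently changing nothing, or changing the wrong thing.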
Review checklist
- Infrastructure desired state is declared in git
- A GitOps operator (Flux, ArgoCD) handles apply; CI does not call kubectl apply directly against production
- Drift detection and self-healing are enabled in production
- The config repository requires PR approval before changes merge to the production branch
- Rollback procedure is documented: git revert + merge