CI/CD: the pipeline is the release process, not a decoration on it
“CI/CD” is one of the phrases that has lost most of its meaning through overuse. It covers everything from “we run tests when you push” to “every commit is automatically deployed to production if the tests pass.” The two ends of that range are very different things, and the ambiguity is why teams who believe they “have CI/CD” sometimes ship more slowly and less safely than teams who don’t.
Worth separating the terms first. Continuous integration — a term Grady Booch coined and Martin Fowler popularized — is the practice of merging every developer’s work to a shared mainline multiple times per day, with automated verification that the mainline stays buildable and tested. It is a branching and integration discipline, not a tool. Continuous delivery is the practice of keeping the mainline always in a deployable state, and being able to deploy it at any moment with a button push. Continuous deployment — one letter different, much bigger claim — goes further: every commit that passes CI is automatically deployed to production, no human in the loop.
Most organizations that say “we do CI/CD” mean: “we run a pipeline that includes tests, and somewhere at the end of it is a deploy.” That is fine and it is worth doing. But the hard parts — trunk-based development, fast feedback, deployable-at-any-moment mainline, safe progressive rollouts — are the discipline, and they are separable from the tool.
This post is about what the pipeline actually does, what the tools buy you, the deployment strategies that turn a green build into a live rollout, and the common ways CI/CD setups fail.
The pipeline, from one angle
A typical pipeline, at a reasonably-mature organization, is a directed graph of stages:
- On push/PR: lint, type-check, unit tests, build the artifact, static analysis (SAST), dependency scanning. Cheap, fast, high-signal. Should finish in under ten minutes for any reasonable codebase; five is better.
- On merge to main: re-run the fast checks, run the integration tests, build the immutable artifact (Docker image tagged with the commit SHA, package version, etc.), push it to the registry.
- Deploy to dev/staging: apply the artifact to a test environment. Run end-to-end tests, smoke tests, contract tests against downstream services. On green, promote the artifact as “ready for prod.”
- Deploy to prod: apply to production, either automatically (continuous deployment) or on a human-pushed button (continuous delivery). Use a progressive rollout — canary, blue/green — rather than all-at-once. Watch the SLIs. Roll back automatically if they regress.
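The on-push stage of that graph, sketched as a GitHub Actions workflow (the job names and Node commands are illustrative assumptions, not a prescription): the cheap checks fan out in parallel, and the build waits on all of them.

```yaml
# .github/workflows/ci.yml -- on-push stage only, as a sketch
name: ci
on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint
  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run typecheck
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  build:
    # the build runs only once every fast, high-signal check is green
    needs: [lint, typecheck, unit-tests]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
```

The `needs:` key is what turns a linear pipeline into a DAG; the three check jobs here share no edges, so they run concurrently.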
The important property, pre-any-tool: the same artifact flows through all stages. The image that got tested in staging is the image that gets deployed to prod. You do not rebuild between environments; you promote. A different build means a different artifact means a different set of risks than what was tested.
Teams that rebuild per environment — “the prod pipeline builds from main again” — are almost always wrong about this. Race conditions, dependency version drift, base image updates, builder environment differences: all of these creep in. The artifact is the unit of promotion. Build once, deploy many.
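The build-once rule, as a GitHub Actions sketch: the image is built and pushed exactly once, tagged with the commit SHA, and every later environment references the same tag. The registry name and the `deploy.sh` helper are hypothetical placeholders for whatever your deploy mechanism is.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # build and push once, tagged with the commit SHA
      - run: |
          docker build -t registry.example.com/app:${{ github.sha }} .
          docker push registry.example.com/app:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      # promote, don't rebuild: staging gets the artifact by tag
      - run: ./deploy.sh staging registry.example.com/app:${{ github.sha }}

  deploy-prod:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      # the image tested in staging is byte-for-byte what reaches prod
      - run: ./deploy.sh prod registry.example.com/app:${{ github.sha }}
```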
The tools
GitHub Actions (GitHub, 2019). Tightly integrated with GitHub; the default for anything already on GitHub. Workflows are YAML, triggered by events (push, PR, schedule, manual, external). A large ecosystem of pre-built actions. Runners are GitHub-hosted by default (with limits and costs) or self-hosted (for access to internal systems, specific hardware, or cost control at scale). The dominant choice in 2026 for new projects on GitHub.
GitLab CI (GitLab, 2012-ish). Same tight integration but with GitLab. The YAML syntax is different; the model is similar. Traditionally stronger than early GitHub Actions on some axes (child pipelines, dynamic pipelines, merge trains); GitHub has caught up on most. If you are on GitLab, use it.
Jenkins (originally Hudson, 2005). The elder of the space, still deployed widely. An open-source Java server with a plugin ecosystem so large that Jenkins can do almost anything and almost nothing well by default. Pipelines are written in Groovy (Jenkinsfile). Strengths: extensibility, flexibility, runs anywhere. Weaknesses: operational burden, plugin version hell, UI from an earlier era, resource hungry. Organizations on Jenkins tend to stay on Jenkins because migrating is expensive; new projects rarely choose it.
CircleCI, Buildkite, Drone, Argo Workflows, Tekton, and a dozen others exist. Buildkite is interesting for its hybrid model (SaaS control plane, customer-hosted agents); Tekton and Argo sit in the Kubernetes-native category; CircleCI has been around long enough that some teams chose it before GitHub Actions existed and never moved.
CD-specific tools — ArgoCD, Flux, Spinnaker, Harness — focus on the deployment half rather than the build half. They sit downstream of the CI system and handle the “apply this artifact to this environment with this rollout strategy” part, often with GitOps semantics (the state of the cluster is synced from a Git repo). For Kubernetes-heavy estates, a dedicated CD tool is usually worth the incremental complexity.
The meaningful dividing line is not between tools. It is between pipelines defined in the repo (GitHub Actions workflows, GitLab CI files, modern Jenkinsfile) and pipelines defined in the tool (classic Jenkins jobs, TeamCity jobs). The first model — pipeline as code, versioned with the code it builds — is the one everything is converging on and the one worth standardizing on.
What makes a CI pipeline good
The first quality: speed to feedback. The longer a pipeline takes to tell you it failed, the less useful it is. Engineers context-switch away, come back an hour later, debug from cold, miss the PR review window. A pipeline that takes two minutes to fail is a pipeline you actually use; a pipeline that takes thirty is one you work around.
The second: reliability. A flaky pipeline trains the team to re-run rather than investigate. Flakiness poisons the signal. A failure that is real-problem 80% of the time and random-flake 20% is worse than no pipeline — it teaches people to ignore failures. Flake rate is a metric worth tracking and actively reducing.
The third: reproducibility. A build that works on the CI server and fails on a developer’s machine is already a problem; the reverse is worse. Docker builds, pinned dependencies, language lockfiles, and hermetic build tools (Bazel, Nix) all help. The goal is that a given commit + a given Dockerfile + a given base image produces bit-for-bit the same artifact every time.
The fourth, more organizational: one path to production. Every artifact that reaches prod goes through the same pipeline. Manual deploys, emergency patches applied out-of-band, “just this once” exceptions — each one is a gap in the safety net, and the gaps accumulate. The pipeline is the process; the process is whatever the pipeline actually does.
Caching, parallelism, and the speed problem
Most slow pipelines are slow for the same reasons:
- Dependency installation on every run. npm install, pip install, bundle install — each can take minutes. Cache the dependency store keyed on the lockfile hash. Restore the cache at the start of the job; re-populate if the lockfile changed.
- Docker builds from scratch. Docker’s layer cache is your friend; BuildKit’s registry-backed cache (--cache-from/--cache-to to a registry) lets the CI job benefit from the cache produced by the previous run. Multi-stage builds reduce the total work.
- Tests running serially. Split the test suite into shards; run them in parallel. Unit tests are typically embarrassingly parallel; integration tests less so but still fan out usefully.
- Everything waiting on everything. Pipelines as linear sequences of “lint, then type-check, then unit test, then build” waste clock time. Model the pipeline as a DAG: the independent things run concurrently, the dependent things wait.
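The caching and sharding items above can be sketched as one GitHub Actions job; the cache path, shard count, and Jest-style --shard flag are assumptions about the project, not universals.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4]        # four parallel copies of this job
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          # cache key changes only when the lockfile changes,
          # so an unchanged lockfile means an instant restore
          key: npm-${{ hashFiles('package-lock.json') }}
      - run: npm ci
      # each matrix job runs one quarter of the suite
      - run: npm test -- --shard=${{ matrix.shard }}/4
```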
A well-tuned pipeline for a mid-sized codebase can get from commit to green in under five minutes. The difference between five minutes and twenty-five is usually not “we need better hardware”; it is caching and parallelism that was never set up.
Deployment strategies
Once the pipeline produces an artifact and the tests pass, you have to put it in production. The shape of that transition is the single biggest determinant of how much pain a deploy causes.
Recreate. Stop the old version; start the new one. Downtime. Cheap to understand, cheap to implement, unacceptable for anything user-facing. The cases it is reasonable: batch jobs, singleton processes that cannot safely run two versions at once, internal tools nobody is using at deploy time.
Rolling update. Replace instances one batch at a time. Some capacity of old + some capacity of new run in parallel during the rollout. No downtime if the two versions are compatible. This is Kubernetes’ default. Rollback is another rolling update in reverse, which takes minutes, not seconds. The constraint that bites people: the old and new versions must both work during the rollout. Schema/API evolution that is not backward-compatible breaks this; that’s why expand-contract is a prerequisite.
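In Kubernetes terms, the batch size of a rolling update lives on the Deployment’s strategy block; the numbers and names below are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2          # up to 2 extra new-version pods during the rollout
      maxUnavailable: 1    # never more than 1 pod below desired capacity
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      containers:
        - name: app
          image: registry.example.com/app:abc1234   # the promoted artifact tag
```

For the duration of the rollout, up to 11 old-and-new pods serve traffic side by side, which is exactly why both versions must be mutually compatible.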
Blue/green. Run two full copies. “Blue” is the current prod; “green” is the new version, running with zero traffic. When green is verified, flip the router — all traffic moves to green. Rollback is instant: flip back. The cost is that you run two full copies during the cutover. Good for “I want to verify the new version is healthy before it sees real traffic.”
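On plain Kubernetes, the blue/green flip can be as small as a one-line Service selector change (the labels are assumptions; tools like Argo Rollouts automate the same move with verification built in):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app
    version: green   # was "blue"; changing this one line moves all traffic
  ports:
    - port: 80
      targetPort: 8080
```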
Canary. Route a small percentage of traffic to the new version, watch the SLIs, gradually ramp up if clean, roll back if not. The percentage can ramp on a schedule (1% → 10% → 50% → 100% over an hour) or on metric health (promote when error rate and latency stay within SLOs). This is the strategy that scales best to “risky changes to a system where a full rollback takes minutes but a bad version for minutes matters.” Done well with Istio, Linkerd, Argo Rollouts, or Flagger.
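With Argo Rollouts, the scheduled ramp reads almost exactly like the prose. The percentages and pauses below are illustrative, the pod template is omitted for brevity, and a real setup would attach an analysis template so each step is gated on SLO health rather than just time.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: app
spec:
  # (selector and pod template omitted; same shape as a Deployment)
  strategy:
    canary:
      steps:
        - setWeight: 1             # 1% of traffic to the new version
        - pause: { duration: 10m }
        - setWeight: 10
        - pause: { duration: 20m }
        - setWeight: 50
        - pause: { duration: 30m }
        # reaching the end of the steps promotes to 100%
```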
Feature flags as deployment strategy. Deploy the code dark — every instance has the new code but the feature is off behind a flag. Flip the flag for some users, watch, ramp up. Decouples deploy from release. Used heavily by LaunchDarkly-style systems, and it’s the right primitive for any product team that ships continuously.
The matrix is: how much capacity am I willing to double, how quickly do I need to roll back, and how much exposure can I tolerate to a bad version running for N minutes against some users? Canary with automated promotion on healthy SLIs is the current best-in-class; everything else is a simpler point on the same continuum.
Continuous deployment vs continuous delivery
The difference is one human: does a person push the button to send the artifact to prod, or does the pipeline do it automatically?
The case for continuous deployment (no human, automatic promotion):
- It removes the gap between “merged” and “live.” Small batches, fast feedback, the thing you just wrote is the thing you are watching in prod.
- It forces every other discipline — tests, rollouts, feature flags, observability — to be good enough that no human gatekeeping is needed. Teams that reach continuous deployment did so by investing in the rest of the stack.
- The alternative — a human reviews the change before deploy — is rarely a real review. It’s a pause, and pauses accumulate. Over time, the pause grows into a nightly deploy window, a weekly release train, a quarterly big-bang. All of these make changes larger and therefore riskier.
The case for continuous delivery (human pushes the button):
- Some changes should have a human in the loop. Regulated industries, risky subsystems, changes that coincide with external events.
- It is the honest answer when the rest of the stack isn’t ready. A team without good tests, good canary tooling, good observability, and good feature flags should not automate deploys; they should improve the rest first.
Most mature teams land on: continuous deployment for typical changes, continuous delivery for high-risk ones, with the classification made per-change via feature flags or gated pipelines. “Deploy automatically if green” is the default; “wait for approval” is the exception someone has to specifically declare.
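In GitHub Actions, the “wait for approval” exception maps onto environment protection rules: the job below pauses for sign-off if the production environment is configured to require reviewers. The reviewer requirement lives in the repository settings, not in the YAML, and the `deploy.sh` helper is a placeholder.

```yaml
jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    # if "production" has required reviewers enabled, the job
    # waits here until one of them approves the deployment
    environment: production
    steps:
      - run: ./deploy.sh prod ${{ github.sha }}
```

Declaring the gate per-environment keeps “deploy automatically if green” as the default path; only jobs that target a protected environment pick up the pause.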
GitOps
GitOps is the idea that the state of production should be declared in a Git repo, and a controller should continuously reconcile production to match. The repo is the source of truth; the controller (ArgoCD, Flux) watches for changes and applies them. Out-of-band changes to the cluster don’t stick: the controller notices the divergence and re-applies the repo’s state. Drift heals itself.
The workflow: CI builds the artifact, updates the image tag in the config repo, opens a PR (or commits directly to a deploy branch). The controller sees the change, applies it to the cluster. The cluster state is always reconstructible from the repo.
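The controller side can be sketched as an Argo CD Application; the repo URL and paths are placeholders. The automated sync policy with selfHeal is the setting that makes drift heal itself.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-config   # the config repo, not the app repo
    targetRevision: main
    path: apps/app/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: app
  syncPolicy:
    automated:
      prune: true      # delete cluster resources that were removed from the repo
      selfHeal: true   # revert out-of-band changes to match the repo
```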
The appeal is operational: a unified model of “the current state of prod is whatever is in the repo”; rollbacks are git reverts; change history is commit history; auditing is git log. The appeal is also philosophical: the desired state is separated from the observed state, and the difference is the controller’s problem.
The limits: not everything fits cleanly. Secrets that rotate frequently, state that comes from outside the repo (backups, user data), operations that are events rather than states — these need additional machinery. But for the steady-state cluster-configuration-plus-application-deploys loop, GitOps is a good fit and increasingly the default in Kubernetes-heavy organizations.
Common failure modes
The green pipeline that doesn’t test anything. Tests pass because they’re trivial, or because the important checks were commented out months ago. “The pipeline is green” is a statement about the pipeline, not about the code. Audit what the pipeline actually runs — it usually has more fossils than you think.
Branch-based builds with different config. main builds differently from feature branches; staging deploys from a different source than prod. The artifacts diverge. A bug that didn’t reproduce in staging did reproduce in prod because they weren’t the same code. One pipeline, one artifact, flowing through environments.
Manual deploys as the “real” path. A pipeline exists but the deploys everyone trusts are done by hand, from a runbook, by a specific engineer. The pipeline is decoration. Fix: use the pipeline for the emergency deploy first. Make it so the pipeline is the only path that works.
The “fix in prod” culture. Something’s broken; an engineer SSHes in, edits a config, restarts. Not in Git. Not in the pipeline. Next deploy undoes the fix; nobody remembers the history. Fix: make hot-fixes a pipeline invocation, not an SSH session. This requires a pipeline that can do an emergency deploy in minutes, which is its own kind of discipline.
The pipeline that approves itself. Auto-merge + auto-deploy + no reviewers = the build system has commit rights to prod, mediated by a YAML file anyone can edit. Fix: require reviewers on PRs that touch the pipeline config; require reviewers on PRs that touch production deployments; audit who has access.
Secrets in pipeline logs. Pipeline variables printed in a debug step. Environment variables dumped in an error path. set -x turned on at the wrong moment. The pipeline’s logs are a surface; a rotated secret doesn’t retroactively clean the audit log. Fix: pipeline-native secret masking, plus a review of what gets logged.
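GitHub Actions, for example, masks registered secrets automatically, and a value fetched at runtime can be masked with the add-mask workflow command before anything else touches it. The fetch command below is a placeholder for however the value is obtained.

```yaml
steps:
  - name: Fetch and mask a short-lived token
    run: |
      TOKEN=$(fetch-token)                # placeholder: however you obtain the value
      echo "::add-mask::$TOKEN"           # mask it in all subsequent log output
      echo "TOKEN=$TOKEN" >> "$GITHUB_ENV"
```

Masking has to happen before the value is echoed anywhere; the mask applies from registration onward, not retroactively.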
Rollback that doesn’t work. A pipeline that deploys but has no practiced rollback. The first real-prod rollback is the one you find out your tooling didn’t handle. Fix: practice rollbacks. Make “deploy the previous version” a one-button operation. Game-day it occasionally.
The rule
A CI/CD pipeline is the release process, mechanized. What the pipeline does is what your team does. If the pipeline tests thoroughly, builds a single artifact, flows it through environments, and deploys it with a progressive strategy, then the team ships safely; if the pipeline is a rubber stamp, then the team ships blind.
The tool matters less than the discipline. GitHub Actions, GitLab CI, Jenkins, ArgoCD, Spinnaker — all can produce a good pipeline; all can produce a bad one. The properties worth building for are the same regardless: fast feedback, reliable signal, reproducible artifacts, one path to prod, safe rollouts, easy rollbacks. Get those right and the tool is almost incidental. Get them wrong and the tool is a Jenkins with a thousand plugins no one maintains.
The discipline composes with the rest. CI/CD is downstream of testing strategy, IaC for the environments it deploys to, schema/API evolution for what can be rolled out safely, observability for knowing whether a rollout is healthy. A healthy pipeline is the point at which all of those disciplines meet and either reinforce each other or expose each other’s gaps. Building it well is a good forcing function for the rest.