Shared CI/CD library

Pipeline platform

One Bitbucket pipeline library, imported by every Java and Node service. Tests live in their own repo. Promotion and reporting belong to ArgoCD.

2023 → ongoing
20 services, ~400 deploys/month
// the shape that broke

Twenty pipelines that drifted

Twenty services each shipped their own bitbucket-pipelines.yml. Same rough shape — build, test, scan, push, deploy — but each one slightly different. A change in the build pattern meant a PR to twenty repos.

A 1000-line bash pipeline reporter lived in the base image and posted to Teams at every stage. It worked. Nobody wanted to touch it.

Tests ran inside the pipeline, before pods were healthy. They were flaky and most failures weren't real.

Jira gates and the Veracode and SourceClear scan steps were copy-pasted into every yaml.

// the pipeline

One library, imported by every service

The pipeline lives in two repos now: java-shared-pipeline and node-shared-pipeline. Each exports a set of Bitbucket selectors using Bitbucket's Shared Pipelines Configuration. Service repos import them by tag.

Per-service config is one file. Name, runtime, dockerfile, image repo, build commands. That's all a service author has to know about CI.

.ci/builds.yaml — the per-service surface
service:
  name: payments-api
  type: java           # java | node
  dockerfile: Dockerfile
  image:
    repository: payments-api
build:
  java:
    maven_cmd: "mvn -B -ntp test"
gitops:
  repo: platform/gitops-apps
  base_branch: main
  app_path: apps/payments-api
  strategy: kustomize
bitbucket-pipelines.yml — the import
pipelines:
  pull-requests:
    '**':
      import: java-shared-pipeline:1.4.0:feature-java
  branches:
    main:
      import: java-shared-pipeline:1.4.0:main-java
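
For context, the library side is roughly the mirror image of that import. A minimal sketch of the exporting repo, assuming Bitbucket's export/definitions syntax for shared pipelines; the step contents and script names here are placeholders, not the real library.

java-shared-pipeline bitbucket-pipelines.yml (sketch of the exporting side)
export: true

definitions:
  pipelines:
    main-java:
      - step:
          name: Build, scan, push
          script:
            - ./ci/build.sh        # hypothetical wrapper; reads .ci/builds.yaml for the maven command
            - ./ci/push-image.sh   # hypothetical wrapper; pushes to ECR and writes .ci/out/build.json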

Optional gates — Veracode SAST, SourceClear SCA, Jira Fix Version validation — are env-gated in the same library. One library handles services that need them and services that don't. The difference is an env var on the import, not a fork of the pipeline.
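
Inside the library, a gate is just a step whose script no-ops unless the service opts in. A rough sketch, with a hypothetical VERACODE_ENABLED repository variable and wrapper script standing in for whatever the real flag and command are called:

shared step (sketch of an env-gated Veracode gate)
- step:
    name: Veracode SAST
    script:
      - |
        # skip cleanly when the service hasn't opted in
        if [ "${VERACODE_ENABLED:-false}" != "true" ]; then
          echo "Veracode gate not enabled for this service; skipping"
          exit 0
        fi
        ./ci/veracode-scan.sh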

The library is semver-tagged. Services adopt a new version on their own schedule by bumping the tag. Old tags stay around as long as anyone is still on them.

// tests, extracted

Tests are not pipeline steps

Test infra is its own repo now. The pipeline builds and pushes the image, then stops. A separate ArgoCD PostSync hook runs the test job after the deploy is actually healthy, so the tests run against the real running thing rather than half a pod.
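
The hook itself is ordinary ArgoCD machinery: a Kubernetes Job annotated as a PostSync hook, so ArgoCD only creates it once the sync has completed and the resources report healthy. A trimmed sketch, with the test image and arguments as placeholders rather than the real test-infra entrypoint:

postsync test job (sketch)
apiVersion: batch/v1
kind: Job
metadata:
  generateName: payments-api-postsync-tests-
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tests
          image: test-infra:latest            # placeholder; the real image lives in ECR
          args: ["--service", "payments-api", "--env", "preprod"]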

Allure reports per run. Pass/fail published to a result store. Sentry — a small dashboard I built on top — is where you go to ask "is the fleet green?".

sentry — fleet
Sentry fleet dashboard with platform foundation tiles and per-service test cards
Eleven services green, four red. Platform Foundation across the top — cluster, Kafka, databases, secrets — lives separately from per-service test health, because "the cluster is broken" and "Data Flow has a flaky test" are different conversations. POSTSYNC and CONTINUOUS triggers are tagged so it's obvious what kind of run produced the result.
sentry — per service
Sentry per-service test results page for cloudbridge in preprod
Drilldown for a single service. The full Allure report is one click away; the recent runs table on the bottom makes regressions obvious without anyone having to dig into a pipeline.
// reporting & promotion

ArgoCD took over the rest

The bash reporter is gone. The bits worth keeping moved into a shared-scripts repo. The rest retired when ArgoCD's Notifications controller took over deploy reporting.
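
Deploy reporting is now ArgoCD Notifications configuration rather than bash. A rough sketch of the shape, with the Teams channel name, webhook secret key and message text as placeholders; the real triggers and templates live with the ArgoCD install:

argocd-notifications-cm (sketch)
data:
  service.teams: |
    recipientUrls:
      delivery: $teams-delivery-webhook-url   # resolved from argocd-notifications-secret
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
      send: [app-deployed]
  template.app-deployed: |
    message: "{{.app.metadata.name}} synced to {{.app.status.sync.revision}}"

A service opts in with an annotation on its Application, along the lines of notifications.argoproj.io/subscribe.on-deployed.teams: delivery.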

Promotion is decoupled from build. The pipeline emits .ci/out/build.json — commit, image, digest, tags, build url — and stops. ArgoCD Image Updater watches ECR and opens the GitOps bump itself when it sees a new tag.
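
The metadata file is small. Roughly this shape, with the field names taken from the list above; the exact keys are a guess and the values are invented for illustration:

.ci/out/build.json (illustrative)
{
  "commit": "9f3c2ab",
  "image": "payments-api",
  "digest": "sha256:…",
  "tags": ["1.12.0", "main-9f3c2ab"],
  "build_url": "https://bitbucket.org/…/pipelines/results/…"
}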

Build does one thing. Promote does another. The pipeline doesn't know which environment its image will land in.

// architecture

How it fits together

Four layers, top to bottom: a service repo and the shared libraries it imports; a Bitbucket run that produces an image and a metadata file; an Image-Updater-driven promotion that ends in a Kubernetes deploy; and a PostSync hook that closes the loop with a test result in Sentry.

Pipeline platform — system overview


inputs: Service repo (.ci/builds.yaml) · Shared pipelines (java + node, semver-tagged) · shared-scripts (reusable commands)
ci: Bitbucket Pipelines (imports shared selectors · builds · pushes) · AWS ECR (image, multi-tagged) · build.json (metadata, output contract)
delivery: Image Updater (watches ECR) · GitOps repo (kustomize bump) · ArgoCD (sync) · Kubernetes (dev · qa · preprod · prod)
verify: PostSync → test-infra (tests run after the deploy is healthy) · Sentry (fleet test health, per-run Allure reports)
// design

A few decisions worth flagging

Optional gates are env-gated, not template-forked

Some services run Veracode and Jira gates. Some don't. Both kinds use the same shared pipeline tag — the difference is an env var, not a different selector. The library stays a singleton, and the diff between two services' CI is something you can read in their .ci/builds.yaml rather than tracing through forks.

Build doesn't promote

The pipeline emits build metadata and stops. ArgoCD Image Updater handles the GitOps bump separately. The upshot: a service can't break its own deploy by misconfiguring its yaml, and a fix to promotion behaviour doesn't need a new pipeline release.
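
In practice that's a handful of Image Updater annotations on the ArgoCD Application, pointed at the ECR repo and writing back to the GitOps repo. A sketch with a placeholder registry; the branch comes from the gitops block in builds.yaml:

argocd application annotations (sketch)
metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: app=<account>.dkr.ecr.<region>.amazonaws.com/payments-api
    argocd-image-updater.argoproj.io/app.update-strategy: semver
    argocd-image-updater.argoproj.io/write-back-method: git
    argocd-image-updater.argoproj.io/git-branch: main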

Tests are not pipeline steps

Pipeline tests that fail before the pods are healthy lie. PostSync runs tests against the actual running deploy, and Sentry surfaces the result independently of whether anyone was watching the pipeline. Most of the "flaky test" bucket evaporated when the readiness assumption stopped being implicit.

// impact

What changed

20 services on one shared library

~400 deploys per month

~5 min build time

1 file to onboard a new service

The change I care about most isn't in the table. Onboarding a new service used to mean copy-pasting somebody else's yaml and quietly hoping. Now it's a builds.yaml and a tag. The diff between two services' CI is small, readable and intentional.

Thanks for reading.

If any of this resonates — or you want to dig into the parts I didn't write up — drop me a note. Always happy to talk shop.