
CI/CD & GitOps Platform Engineering

Building production-grade CI/CD infrastructure from scratch—evolving a greenfield microservices platform to 400 deploys/month with enterprise security

2023-2025 (2-year evolution)
20 Microservices • 400 Deploys/Month

The Challenge

I joined as a Graduate QA Engineer at the very beginning of a massive architectural transformation: migrating from a legacy monolith to a modern Kubernetes-based microservices platform.

With no existing CI/CD infrastructure and a blank canvas, I was tasked with designing and implementing the entire deployment pipeline that would serve 20+ microservices across 4 environments (Dev, QA, Pre-Prod, Production).

Initial Requirements

  • Zero existing infrastructure — build everything from scratch
  • Multi-environment strategy — Dev, QA, Pre-Prod, Prod with different promotion rules
  • Security-first approach — SAST/SCA scanning, compliance gates
  • Developer productivity — abstract complexity, create "paved path"
  • GitOps-native — declarative deployments with ArgoCD

The Evolution

Over 2 years of iterative platform engineering, I transformed a basic build script into a sophisticated, production-ready CI/CD platform. Here's how it evolved:

Phase 1: Foundation (Months 1-6)

Built the initial pipeline architecture with basic Maven builds, Docker image creation, and manual ECR pushes.

# Early Pipeline (bitbucket-pipelines.yml)
image: maven:3.9.6
pipelines:
  branches:
    master:
      - step:
          name: Build, Test, and Push Docker Image
          script:
            - mvn clean install
            - mvn test
            - docker build -t $IMAGE_NAME:$IMAGE_TAG .
            - pipe: atlassian/aws-ecr-push-image:2.2.0

      - step:
          name: Deploy to Dev
          script:
            - git clone git@bitbucket.org:company/argocd-apps.git
            - ./scripts/update_tag.sh  # Update deployment manifest
            - git push origin master

Key Achievement: Established GitOps pattern with ArgoCD for declarative deployments

Phase 2: Standardisation & Reusability (Months 6-12)

Created reusable pipeline components and custom base images to eliminate duplication across services.

  • Custom Base Image — Pre-built Maven image with kubectl, ArgoCD CLI, AWS CLI, jq
  • Shared Pipeline Scripts — `pipeline_reporter.sh` for centralised notifications
  • Kustomize Overlays — Environment-specific configuration management
  • App-of-Apps Pattern — Hierarchical ArgoCD application management

Innovation: Built custom Docker base image reducing pipeline config by ~40% and ensuring consistency

Phase 3: Security & Quality Gates (Months 12-18)

Integrated enterprise security scanning and quality gates directly into the pipeline.

SAST (Veracode)

Static code analysis blocking critical/high vulnerabilities

SCA (SourceClear)

Dependency vulnerability scanning with automated reporting

# Security Scans in Parallel (Pull Requests)
pipelines:
  pull-requests:
    '**':
      - step: *unit-tests
      - parallel:
          - step: *integration-tests
          - step: *security-scan      # SCA
          - step: *static-analysis    # SAST
          - step: *veracode-ui-upload

Phase 4: Test Orchestration & Observability (Months 18-24)

Migrated from brittle pipeline-based testing to robust ArgoCD PostSync hooks with comprehensive observability.

  • PostSync Hook Architecture — Tests run AFTER deployment completes (eliminated race conditions)
  • Automated Test Suites — Newman (API), Cucumber (BDD) packaged as Kubernetes Jobs
  • S3 Integration — Test results uploaded for historical analysis and compliance
  • Teams Notifications — Rich webhook messages with test status, deployment info, and ArgoCD links
  • DORA Metrics — Built custom collector tracking deployment frequency, lead time, MTTR

Business Impact: Reduced false test results by ~90% and gave the entire org visibility into deployments
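
As a rough illustration of the DORA metrics item above, here is a minimal sketch of how deployment count and mean lead time could be derived from a deployment log. It is not the production collector: the log format (one `<commit_epoch> <deploy_epoch>` pair per line) and every function name are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive two DORA metrics from a plain-text deployment log.
# Assumed input: one deployment per line, "<commit_epoch> <deploy_epoch>" in seconds.
set -eu

# Deployment count = number of well-formed lines on stdin.
deploy_count() {
  awk 'NF >= 2 { n++ } END { print n + 0 }'
}

# Mean lead time for changes (commit -> deploy), in whole minutes.
mean_lead_time_minutes() {
  awk 'NF >= 2 { total += $2 - $1; n++ }
       END { if (n == 0) print 0; else printf "%d\n", total / n / 60 }'
}

# Sample log: three deployments with lead times of 10, 20 and 30 minutes.
sample_log() {
  printf '%s\n' \
    "1700000000 1700000600" \
    "1700003600 1700004800" \
    "1700007200 1700009000"
}

echo "deployments: $(sample_log | deploy_count)"
echo "mean lead time (min): $(sample_log | mean_lead_time_minutes)"
```

Feeding the same functions a month of real deployment records would give the monthly frequency figure; MTTR would need incident timestamps, which this sketch leaves out.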

CI/CD & GitOps Architecture

[Architecture diagram: Dev commit → Bitbucket Pipelines (tests, build, deploy) → parallel tests (unit, integration, SAST, SCA) → Docker build on the custom base image → AWS ECR container registry → GitOps repo with Kustomize overlays (dev, qa, preprod, prod; dev and qa updated automatically) → ArgoCD GitOps sync (App-of-Apps pattern) → Kubernetes EKS cluster (4 environments) → PostSync tests (Newman API tests, Cucumber BDD) → Teams Adaptive Cards. The pipeline reporter, a 1000+ line Bash script, is baked into the base image.]

Flow: Developer commits → Bitbucket runs parallel tests → Docker builds with custom base image → Pushes to ECR → Updates GitOps repo → ArgoCD syncs → Deploys to K8s → PostSync tests run → Teams notifications at each stage

Current State: Production-Grade Platform

Today's pipeline is a fully standardised, production-ready CI/CD platform supporting 20 microservices with sophisticated testing, security, and deployment automation.

Pipeline Architecture

  • Parallel Test Execution: Unit tests run first, then integration, SAST, and SCA run concurrently (~5 min total)
  • Automated Deployments: Master → Dev (auto), QA (auto w/ Jira gate), Pre-Prod (manual), Prod (manual)
  • Jira Integration: Fix Version validation prevents premature QA deployments
  • Reusable Components: Custom base image, shared scripts, standardised Kustomize overlays
  • PostSync Testing: Kubernetes Jobs execute tests after ArgoCD sync completion

# Modern Pipeline (Standardised for 20 Services)
image: 123456789012.dkr.ecr.region.amazonaws.com/company/pipeline-base-image:v1.2

definitions:
  steps:
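    unit-tests: &unit-tests
      name: "Unit Tests"
      script:
        - mvn -B clean install -Pfast-tests
      after-script:
        # Notify only on failure; BITBUCKET_EXIT_CODE is non-zero when the script failed
        - if [ "$BITBUCKET_EXIT_CODE" -ne 0 ]; then pipeline_reporter.sh test_failure ci "Unit Tests"; fi

    integration-tests: &integration-tests
      name: "Integration Tests"
      size: 2x
      script:
        - mvn -B clean verify -Pintegration-tests
      after-script:
        - if [ "$BITBUCKET_EXIT_CODE" -ne 0 ]; then pipeline_reporter.sh test_failure ci "Integration Tests"; fi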

    security-scan: &security-scan
      name: "Software Composition Analysis"
      script:
        - curl -sSL https://download.sourceclear.com/ci.sh | sh
        - pipeline_reporter.sh security_alert pr "Critical vulnerabilities"

    static-analysis: &static-analysis
      name: "Static Analysis"
      script:
        - java -jar pipeline-scan.jar -vid "$VERACODE_API_ID"
        - pipeline_reporter.sh security_alert pr "Code vulnerabilities"

pipelines:
  pull-requests:
    '**':
      - step: *unit-tests
      - parallel:
          - step: *integration-tests
          - step: *security-scan
          - step: *static-analysis

  branches:
    master:
      - step: *unit-tests
      - step:
          name: "Docker Build & Push"
          script:
            - docker build -t $IMAGE_NAME:$IMAGE_TAG .
            - pipe: atlassian/aws-ecr-push-image:2.4.2

      - step:
          name: "Deploy to Dev"
          script:
            - git clone https://x-token-auth:$TOKEN@bitbucket.org/company/argocd-apps.git
            - ./scripts/update_kustomization.sh dev "$IMAGE_TAG"
            - pipeline_reporter.sh deploy_success dev

      - step:
          name: "Jira Fix Version Check - QA Gate"
          script:
            - jira_fix_version_check.sh  # Validates Jira ticket status

      - step:
          name: "Deploy to QA"
          script:
            - ./scripts/update_kustomization.sh qa "$IMAGE_TAG"
            - pipeline_reporter.sh deploy_success qa

  custom:
    preprod-deploy:
      - variables:
          - name: IMAGE_TAG
      - step:
          name: "Deploy to PreProd"
          script:
            - ./scripts/update_kustomization.sh preprod "$IMAGE_TAG"

    prod-deploy:
      - variables:
          - name: IMAGE_TAG
      - step:
          name: "Deploy to Production"
          script:
            - ./scripts/update_kustomization.sh prod "$IMAGE_TAG"
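
The Jira Fix Version gate in the pipeline above could be approximated along these lines. This is a sketch, not the actual `jira_fix_version_check.sh`: the rule it applies (pass only when the issue has at least one Fix Version), the function names, and the sample issue key are all assumptions, and the Jira lookup itself is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a Fix Version gate; not the actual jira_fix_version_check.sh.
set -eu

# Pull the first Jira-style key (e.g. "PLAT-123") out of a commit message or branch name.
extract_issue_key() {
  printf '%s\n' "$1" | grep -oE '[A-Z][A-Z0-9]+-[0-9]+' | head -n 1
}

# Gate decision. $1 = issue key, $2 = Fix Versions string obtained from some
# Jira lookup (the lookup is out of scope here and assumed to exist).
qa_gate() {
  local key="$1" fix_versions="$2"
  if [ -z "$key" ]; then
    echo "BLOCK: no Jira issue key found"
    return 1
  fi
  if [ -z "$fix_versions" ]; then
    echo "BLOCK: $key has no Fix Version"
    return 1
  fi
  echo "PASS: $key targets $fix_versions"
}

# Example: a commit message carrying an issue key, and a Fix Version for it.
issue=$(extract_issue_key "PLAT-123: add retry to identity client")
qa_gate "$issue" "2025.03"
```

Because `qa_gate` returns non-zero on a block, calling it as a pipeline step would stop the run before the QA deployment step.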

Business Impact

400+

Deployments per month across all environments (verified via custom DORA metrics collector)

~5 min

Consistent build time with parallel test execution—optimised through iterative improvements

20

Microservices using standardised pipeline—"paved path" reduces onboarding to <1 day

Zero

Production incidents from deployment failures—safety gates catch issues pre-prod

~90%

Reduction in false test results after PostSync hook migration (eliminated race conditions)

100%

Security scan coverage on all PRs—critical/high vulnerabilities blocked automatically

Transformation Metrics

The transformation from manual processes to automated platform engineering, before and after:

Metric                 Before     After
Deployment Frequency   2/week     400/month
Build Time             15+ min    ~5 min
Service Onboarding     1 week     <1 day
Test Reliability       ?%         95%+

Key Technical Innovations

1. Custom Pipeline Base Image

Built a reusable Docker image with pre-installed tooling (kubectl, ArgoCD CLI, AWS CLI, Maven, custom scripts) reducing pipeline configuration by ~40% and ensuring consistency.

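FROM maven:3.9.6
# Refresh the package index before installing, then drop it to keep the image small
RUN apt-get update && \
    apt-get install -y curl git unzip jq && \
    rm -rf /var/lib/apt/lists/* && \
    curl -LO "https://.../kubectl" && install -m 0755 kubectl /usr/local/bin/kubectl && rm kubectl && \
    curl -sSL -o /usr/local/bin/argocd https://.../argocd-linux-amd64 && \
    chmod +x /usr/local/bin/argocd
COPY pipeline_reporter.sh jira_fix_version_check.sh /usr/local/bin/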

2. Kustomize-Based Environment Management

Architected a base + overlays pattern for environment-specific configuration, eliminating duplication and reducing deployment errors.

Base: Shared Deployment, Service, ConfigMap templates

Overlays: Dev (CPU: 500m), QA (CPU: 1000m), Prod (CPU: 2000m, replicas: 3)

Script: `update_kustomization.sh` updates image tags via `kustomize edit set image`
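
To make the image-tag update concrete, here is a minimal sketch of the idea behind `update_kustomization.sh`. It is not the real script: the function names, the `DRY_RUN` switch, and the example registry are assumptions, and the overlay layout is inferred from the `applications/<service>/overlays/<env>` paths used elsewhere on this page. By default it only prints the command it would run, so it works without `kustomize` installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an image-tag update; not the actual update_kustomization.sh.
set -eu

# Map a service and environment to its overlay directory (layout assumed).
overlay_dir() {
  local service="$1" environment="$2"
  case "$environment" in
    dev|qa|preprod|prod) echo "applications/$service/overlays/$environment" ;;
    *) echo "unknown environment: $environment" >&2; return 1 ;;
  esac
}

# Build the kustomize command that pins IMAGE to TAG in the current overlay.
set_image_cmd() {
  local image="$1" tag="$2"
  echo "kustomize edit set image $image=$image:$tag"
}

# Print the update by default; with DRY_RUN=0 run it inside the overlay directory.
update_overlay() {
  local service="$1" environment="$2" image="$3" tag="$4" dir
  dir=$(overlay_dir "$service" "$environment") || return 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "cd $dir && $(set_image_cmd "$image" "$tag")"
  else
    (cd "$dir" && kustomize edit set image "$image=$image:$tag")
  fi
}

# Example with a placeholder registry and tag.
update_overlay myservice dev registry.example.com/myservice 1.4.2
```

Validating the environment name up front keeps a typo from silently creating or editing the wrong overlay; the commit and push to the GitOps repo would follow the edit and are not shown.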

3. ArgoCD App-of-Apps Pattern

Designed hierarchical ArgoCD application management where parent apps manage child apps, enabling PostSync hook testing per service.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myservice-dev
spec:
  sources:
  - repoURL: 'git@bitbucket.org:.../argocd-apps.git'
    path: 'applications/myservice/overlays/dev'
  - repoURL: 'git@bitbucket.org:.../argocd-apps.git'
    path: 'infra/test-suites/job-only'
    kustomize:
      patches:
      - target:
          kind: Job
        patch: |
          - op: add
            path: /metadata/annotations/argocd.argoproj.io~1hook
            value: PostSync

4. Comprehensive Test Orchestration

Built a 1100+ line Bash orchestrator (`run_all_tests.sh`) managing Newman API tests, Cucumber BDD tests, S3 uploads, and Teams notifications.

Deployment Health: kubectl checks for CrashLoopBackOff, ImagePullBackOff before tests

Test Execution: Runs all Postman collections + Cucumber features with timeouts

Result Processing: Generates HTML index pages, uploads to S3 with pre-signed URLs

Notifications: Sends rich Teams webhooks with deployment context and test results
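
The deployment health step can be sketched as a filter over pod status output. This is an illustration under assumptions, not an excerpt from `run_all_tests.sh`: the function names and sample pod names are made up, and the input is assumed to look like `kubectl get pods --no-headers` output (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pre-test health gate; not the actual run_all_tests.sh.
set -eu

# Read pod listing lines on stdin and print the names of pods whose STATUS
# column (third field) shows one of the failure states checked before tests.
unhealthy_pods() {
  awk '$3 == "CrashLoopBackOff" || $3 == "ImagePullBackOff" { print $1 }'
}

# Succeed only when the listing on stdin contains no unhealthy pods.
pods_healthy() {
  local bad
  bad=$(unhealthy_pods)
  if [ -n "$bad" ]; then
    # $bad is left unquoted on purpose so multiple names join onto one line.
    echo "unhealthy pods:" $bad >&2
    return 1
  fi
}

# Sample listing standing in for real kubectl output.
sample_pods() {
  printf '%s\n' \
    "identity-api-7c9d   1/1   Running            0   5m" \
    "orders-api-5f6b     0/1   CrashLoopBackOff   4   5m"
}

if sample_pods | pods_healthy; then
  echo "running tests"
else
  echo "skipping tests"
fi
```

Against a live cluster the same gate would read from `kubectl get pods -n <namespace> --no-headers` instead of the sample listing.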

Intelligent Notifications & Smart Routing

Pipeline reporter sends rich Microsoft Teams Adaptive Cards with smart routing across 4 channels—Platform Deployments, Security Alerts, QA notifications, and PR reviews.

Example cards:

Configuration Updated
GitOps state synced to cluster
  Build: #3888
  Environment: QA
  Developer: Dohn Joe
  Actions: 🚀 ArgoCD • View Pipeline

⚠️ Security Alert: Identity API
1 High vulnerability in dependencies
  Build: #450
  Environment: PULL REQUEST
  Priority: ⚡ High
  Actions: 📝 Create Jira • 🔍 Veracode

Smoke Tests Passed
All automated tests completed successfully
  API Tests: ✅ Passed
  Cucumber: ✅ Passed
  Duration: 7 minutes
  Actions: 📊 Test Results • 🚀 ArgoCD

Smart Features

  • Channel Routing: Platform, Security, QA, PR
  • Deep Links: ArgoCD, Veracode, Jira, S3
  • Auto Jira: Security alerts create stories
  • Easter Eggs: Build milestones, Friday warnings
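
Channel routing and the Friday warning can be sketched with a couple of small functions. The event names come from the reporter calls in the pipeline config, but the event-to-channel mapping, the channel identifiers, and the function names are assumptions for illustration; this is not the routing table from `pipeline_reporter.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of event-to-channel routing; not the actual pipeline_reporter.sh.
set -eu

# Choose a Teams channel from the event type and its context (mapping assumed).
route_channel() {
  local event="$1" context="${2:-}"
  case "$event" in
    security_alert) echo "security-alerts" ;;
    deploy_success) echo "platform-deployments" ;;
    test_failure)
      if [ "$context" = "pr" ]; then
        echo "pr-reviews"
      else
        echo "qa-notifications"
      fi ;;
    *) echo "platform-deployments" ;;
  esac
}

# Friday warning. The day of week is passed in (1 = Monday .. 7 = Sunday,
# the format `date +%u` prints) so the function stays easy to test.
friday_warning() {
  local environment="$1" day_of_week="$2"
  if [ "$environment" = "prod" ] && [ "$day_of_week" = "5" ]; then
    echo "Heads up: production deploy on a Friday"
  fi
}

route_channel security_alert pr
route_channel deploy_success dev
friday_warning prod 5
```

Keeping routing in one `case` statement means adding a channel is a one-line change, and passing the weekday in rather than calling `date` inside the function keeps the behaviour deterministic.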

Lessons Learned

  • Iterative improvement beats big-bang rewrites: Each phase built on the previous, allowing production usage throughout
  • Standardisation is key to scale: Reusable components (base images, scripts) made onboarding new services trivial
  • GitOps eliminates deployment drift: Declarative configs in Git provided audit trail and rollback capability
  • Testing architecture matters: Moving from pipeline polling to PostSync hooks eliminated an entire class of flaky tests
  • Observability from day one: DORA metrics, deployment notifications, and test reporting enabled data-driven improvements