CI/CD & GitOps Platform Engineering
Building production-grade CI/CD infrastructure from scratch—evolving a greenfield microservices platform to 400 deploys/month with enterprise security
The Challenge
I joined as a Graduate QA Engineer at the very beginning of a massive architectural transformation: migrating from a legacy monolith to a modern Kubernetes-based microservices platform.
With no existing CI/CD infrastructure and a blank canvas, I was tasked with designing and implementing the entire deployment pipeline that would serve 20+ microservices across 4 environments (Dev, QA, Pre-Prod, Production).
Initial Requirements
• Zero existing infrastructure: build everything from scratch
• Multi-environment strategy: Dev, QA, Pre-Prod, Prod with different promotion rules
• Security-first approach: SAST/SCA scanning, compliance gates
• Developer productivity: abstract complexity, create a "paved path"
• GitOps-native: declarative deployments with ArgoCD
The Evolution
Over 2 years of iterative platform engineering, I transformed a basic build script into a sophisticated, production-ready CI/CD platform. Here's how it evolved:
Phase 1: Foundation (Months 1-6)
Built the initial pipeline architecture with basic Maven builds, Docker image creation, and manual ECR pushes.
# Early Pipeline (bitbucket-pipelines.yml)
image: maven:3.9.6
pipelines:
  branches:
    master:
      - step:
          name: Build, Test, and Push Docker Image
          script:
            - mvn clean install
            - mvn test
            - docker build -t $IMAGE_NAME:$IMAGE_TAG .
            - pipe: atlassian/aws-ecr-push-image:2.2.0
      - step:
          name: Deploy to Dev
          script:
            - git clone git@bitbucket.org:company/argocd-apps.git
            - ./scripts/update_tag.sh # Update deployment manifest
            - git push origin master

Key Achievement: Established GitOps pattern with ArgoCD for declarative deployments
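The `update_tag.sh` script itself is not shown. As a rough illustration of what a manifest update of this kind could look like, here is a minimal sketch that rewrites the tag on an `image:` line with `sed`. The function name `set_image_tag`, the manifest shape, and the example registry are assumptions, not the original implementation.

```shell
#!/bin/sh
# Hypothetical sketch of an update_tag.sh-style helper (not the original script).
# Rewrites the tag on every "image: <repo>:<tag>" line for one repository.
set_image_tag() {
  manifest=$1   # path to a Kubernetes manifest
  repo=$2       # image repository, e.g. registry.example.com/myservice
  tag=$3        # new tag, e.g. a commit hash (must not contain '|' or '&')
  tmp="${manifest}.tmp"
  # '|' is used as the sed delimiter because repository names contain '/'.
  sed "s|\(image:[[:space:]]*${repo}\):.*|\1:${tag}|" "$manifest" > "$tmp" &&
    mv "$tmp" "$manifest"
}
```

In a GitOps flow the changed manifest would then be committed and pushed, and ArgoCD would pick up the new tag on its next sync.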
Phase 2: Standardisation & Reusability (Months 6-12)
Created reusable pipeline components and custom base images to eliminate duplication across services.
• Custom Base Image: Pre-built Maven image with kubectl, ArgoCD CLI, AWS CLI, jq
• Shared Pipeline Scripts: `pipeline_reporter.sh` for centralised notifications
• Kustomize Overlays: Environment-specific configuration management
• App-of-Apps Pattern: Hierarchical ArgoCD application management
Innovation: Built custom Docker base image reducing pipeline config by ~40% and ensuring consistency
Phase 3: Security & Quality Gates (Months 12-18)
Integrated enterprise security scanning and quality gates directly into the pipeline.
SAST (Veracode): Static code analysis blocking critical/high vulnerabilities
SCA (SourceClear): Dependency vulnerability scanning with automated reporting
# Security Scans in Parallel (Pull Requests)
pipelines:
  pull-requests:
    '**':
      - step: *unit-tests
      - parallel:
          - step: *integration-tests
          - step: *security-scan # SCA
          - step: *static-analysis # SAST
          - step: *veracode-ui-upload

Phase 4: Test Orchestration & Observability (Months 18-24)
Migrated from brittle pipeline-based testing to robust ArgoCD PostSync hooks with comprehensive observability.
• PostSync Hook Architecture: Tests run AFTER deployment completes (eliminated race conditions)
• Automated Test Suites: Newman (API), Cucumber (BDD) packaged as Kubernetes Jobs
• S3 Integration: Test results uploaded for historical analysis and compliance
• Teams Notifications: Rich webhook messages with test status, deployment info, and ArgoCD links
• DORA Metrics: Built custom collector tracking deployment frequency, lead time, MTTR
Business Impact: Reduced false test results by ~90% and provided deployment visibility to entire org
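The DORA collector is not shown. To make the two simplest metrics concrete, here is a minimal sketch that derives deployment count and mean lead time from a list of deployments. The input format (one line per deployment with commit and deploy Unix timestamps) and the function name `dora_summary` are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical input: one deployment per line, "<commit_epoch> <deploy_epoch> <service>".
# Prints the deployment count and the mean commit-to-deploy lead time in whole minutes.
dora_summary() {
  awk '
    NF >= 2 { n++; total += $2 - $1 }   # accumulate lead time in seconds
    END {
      if (n == 0) { print "deployments=0 avg_lead_time_min=0"; exit }
      printf "deployments=%d avg_lead_time_min=%d\n", n, total / n / 60
    }
  '
}
```

Feeding it a month of records gives the deployment frequency directly. MTTR would need incident open and close times, which this sketch does not model.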
CI/CD & GitOps Architecture
Flow: Developer commits → Bitbucket runs parallel tests → Docker builds with custom base image → Pushes to ECR → Updates GitOps repo → ArgoCD syncs → Deploys to K8s → PostSync tests run → Teams notifications at each stage
Current State: Production-Grade Platform
Today's pipeline is a fully standardised, production-ready CI/CD platform supporting 20 microservices with sophisticated testing, security, and deployment automation.
Pipeline Architecture
# Modern Pipeline (Standardized for 20 Services)
image: 123456789012.dkr.ecr.region.amazonaws.com/company/pipeline-base-image:v1.2
definitions:
  steps:
    unit-tests: &unit-tests
      name: "Unit Tests"
      script:
        - mvn -B clean install -Pfast-tests
        - pipeline_reporter.sh test_failure ci "Unit Tests"
    integration-tests: &integration-tests
      name: "Integration Tests"
      size: 2x
      script:
        - mvn -B clean verify -Pintegration-tests
        - pipeline_reporter.sh test_failure ci "Integration Tests"
    security-scan: &security-scan
      name: "Software Composition Analysis"
      script:
        - curl -sSL https://download.sourceclear.com/ci.sh | sh
        - pipeline_reporter.sh security_alert pr "Critical vulnerabilities"
    static-analysis: &static-analysis
      name: "Static Analysis"
      script:
        - java -jar pipeline-scan.jar -vid "$VERACODE_API_ID"
        - pipeline_reporter.sh security_alert pr "Code vulnerabilities"
pipelines:
  pull-requests:
    '**':
      - step: *unit-tests
      - parallel:
          - step: *integration-tests
          - step: *security-scan
          - step: *static-analysis
  branches:
    master:
      - step: *unit-tests
      - step:
          name: "Docker Build & Push"
          script:
            - docker build -t $IMAGE_NAME:$IMAGE_TAG .
            - pipe: atlassian/aws-ecr-push-image:2.4.2
      - step:
          name: "Deploy to Dev"
          script:
            - git clone https://x-token-auth:$TOKEN@bitbucket.org/company/argocd-apps.git
            - ./scripts/update_kustomization.sh dev "$IMAGE_TAG"
            - pipeline_reporter.sh deploy_success dev
      - step:
          name: "Jira Fix Version Check - QA Gate"
          script:
            - jira_fix_version_check.sh # Validates Jira ticket status
      - step:
          name: "Deploy to QA"
          script:
            - ./scripts/update_kustomization.sh qa "$IMAGE_TAG"
            - pipeline_reporter.sh deploy_success qa
  custom:
    preprod-deploy:
      - variables:
          - name: IMAGE_TAG
      - step:
          name: "Deploy to PreProd"
          script:
            - ./scripts/update_kustomization.sh preprod "$IMAGE_TAG"
    prod-deploy:
      - variables:
          - name: IMAGE_TAG
      - step:
          name: "Deploy to Production"
          script:
            - ./scripts/update_kustomization.sh prod "$IMAGE_TAG"

Business Impact
400 deployments per month across all environments (verified via custom DORA metrics collector)
Consistent build time with parallel test execution, optimised through iterative improvements
20 microservices using the standardised pipeline; the "paved path" reduces onboarding to <1 day
Production incidents from deployment failures: safety gates catch issues pre-prod
~90% reduction in false test results after PostSync hook migration (eliminated race conditions)
Security scan coverage on all PRs: critical/high vulnerabilities blocked automatically
Key Technical Innovations
1. Custom Pipeline Base Image
Built a reusable Docker image with pre-installed tooling (kubectl, ArgoCD CLI, AWS CLI, Maven, custom scripts) reducing pipeline configuration by ~40% and ensuring consistency.
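FROM maven:3.9.6
# Refresh the package index first; apt-get install fails on a clean image without it
RUN apt-get update && apt-get install -y curl git unzip jq && \
    curl -LO "https://.../kubectl" && chmod +x kubectl && mv kubectl /usr/local/bin/ && \
    curl -sSL -o /usr/local/bin/argocd https://.../argocd-linux-amd64 && \
    chmod +x /usr/local/bin/argocd
# Shared helper scripts, available on PATH in every pipeline step
COPY pipeline_reporter.sh jira_fix_version_check.sh /usr/local/bin/

2. Kustomize-Based Environment Management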
Architected a base + overlays pattern for environment-specific configuration, eliminating duplication and reducing deployment errors.
• Base: Shared Deployment, Service, ConfigMap templates
• Overlays: Dev (CPU: 500m), QA (CPU: 1000m), Prod (CPU: 2000m, replicas: 3)
• Script: `update_kustomization.sh` updates image tags via `kustomize edit set image`
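The body of `update_kustomization.sh` is not shown, so the following is only a sketch of how a tag update built on `kustomize edit set image` might be wrapped. The function name `update_image_tag`, the `DRY_RUN` switch, and the explicit service and image arguments are assumptions; the overlay path mirrors the `applications/<service>/overlays/<env>` layout used in the ArgoCD manifests.

```shell
#!/bin/sh
# Hypothetical sketch of an update_kustomization.sh-style helper (not the original).
# Assumed layout: applications/<service>/overlays/<env>/kustomization.yaml
# DRY_RUN=1 prints the command instead of running it.
update_image_tag() {
  env_name=$1; service=$2; image=$3; tag=$4
  overlay="applications/${service}/overlays/${env_name}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "(cd ${overlay} && kustomize edit set image ${image}=${image}:${tag})"
    return 0
  fi
  # Run in a subshell so the caller's working directory is left untouched.
  ( cd "$overlay" && kustomize edit set image "${image}=${image}:${tag}" )
}
```

Because only the overlay's `kustomization.yaml` changes, promoting a build to another environment is a one-line diff in the GitOps repository, which keeps the audit trail easy to read.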
3. ArgoCD App-of-Apps Pattern
Designed hierarchical ArgoCD application management where parent apps manage child apps, enabling PostSync hook testing per service.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myservice-dev
spec:
  sources:
    - repoURL: 'git@bitbucket.org:.../argocd-apps.git'
      path: 'applications/myservice/overlays/dev'
    - repoURL: 'git@bitbucket.org:.../argocd-apps.git'
      path: 'infra/test-suites/job-only'
      kustomize:
        patches:
          - target:
              kind: Job
            patch: |
              - op: add
                path: /metadata/annotations/argocd.argoproj.io~1hook
                value: PostSync

4. Comprehensive Test Orchestration
Built a 1100+ line Bash orchestrator (`run_all_tests.sh`) managing Newman API tests, Cucumber BDD tests, S3 uploads, and Teams notifications.
• Deployment Health: kubectl checks for CrashLoopBackOff, ImagePullBackOff before tests
• Test Execution: Runs all Postman collections + Cucumber features with timeouts
• Result Processing: Generates HTML index pages, uploads to S3 with pre-signed URLs
• Notifications: Sends rich Teams webhooks with deployment context and test results
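The deployment health step can be illustrated with a small filter over `kubectl get pods` output. This is a sketch under assumptions rather than an excerpt from `run_all_tests.sh`: the function name `pods_unhealthy` is hypothetical, and it relies on STATUS being the third column of kubectl's default output.

```shell
#!/bin/sh
# Reads `kubectl get pods --no-headers` output on stdin.
# Prints the names of pods stuck in CrashLoopBackOff or ImagePullBackOff and
# returns 0 if any were found, 1 otherwise.
pods_unhealthy() {
  awk '
    $3 == "CrashLoopBackOff" || $3 == "ImagePullBackOff" { print $1; bad = 1 }
    END { exit (bad ? 0 : 1) }
  '
}
```

A test Job could call it as `kubectl get pods -n "$NAMESPACE" --no-headers | pods_unhealthy && exit 1` (with `NAMESPACE` supplied by the Job), so that tests are skipped when the deployment is clearly broken.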
Intelligent Notifications & Smart Routing
Pipeline reporter sends rich Microsoft Teams Adaptive Cards with smart routing across 4 channels—Platform Deployments, Security Alerts, QA notifications, and PR reviews.
• Configuration Updated: GitOps state synced to cluster
• Security Alert (Identity API): 1 High vulnerability in dependencies
• Smoke Tests Passed: All automated tests completed successfully
Smart Features
• Channel Routing: Platform, Security, QA, PR
• Deep Links: ArgoCD, Veracode, Jira, S3
• Auto Jira: Security alerts create stories
• Easter Eggs: Build milestones, Friday warnings
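Channel routing of this kind usually reduces to a lookup from event type to destination. The sketch below is an assumption about how that could be written, not the logic of `pipeline_reporter.sh`: the event names and contexts are taken from the reporter calls in the pipeline, while the function name `route_channel`, the channel keys, and the rule that PR test failures go to the PR channel are invented for illustration.

```shell
#!/bin/sh
# Hypothetical routing table: maps an event type (and optional context) to a
# channel key. A real reporter would then resolve the key to a webhook URL.
route_channel() {
  event=$1
  context=${2:-}
  case "$event" in
    security_alert) echo "security" ;;
    deploy_*)       echo "platform" ;;
    test_failure)
      # Assumed rule: failures on pull requests go to reviewers, the rest to QA.
      if [ "$context" = "pr" ]; then echo "pr"; else echo "qa"; fi ;;
    *)              echo "platform" ;;   # default channel for unknown events
  esac
}
```

Keeping the mapping in one function means a new event type needs a single extra `case` branch rather than changes in every pipeline.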
Lessons Learned
→ Iterative improvement beats big-bang rewrites: Each phase built on the previous, allowing production usage throughout
→ Standardisation is key to scale: Reusable components (base images, scripts) made onboarding new services trivial
→ GitOps eliminates deployment drift: Declarative configs in Git provided audit trail and rollback capability
→ Testing architecture matters: Moving from pipeline polling to PostSync hooks eliminated an entire class of flaky tests
→ Observability from day one: DORA metrics, deployment notifications, and test reporting enabled data-driven improvements