
DORA Metrics & Developer Experience Platform

Building comprehensive business intelligence and developer experience tooling—automated metrics collection, intelligent notifications, and deployment gates

2024-2025
400 Deploys/Month Visibility

The Challenge

Engineering leadership had zero visibility into deployment velocity across 20 microservices and 4 environments. Critical business questions remained unanswered:

  • "How many deployments did we do this month?"
  • "What's our average lead time from commit to production?"
  • "Which Jira tickets are in the QA release right now?"
  • "Where are our deployment bottlenecks?"
  • "Why did that deployment fail?"

Additionally, developers had poor visibility into pipeline execution:

  • Manual checking of pipeline status across multiple services
  • No proactive notifications for deployment events
  • Silent security vulnerabilities discovered weeks later
  • Premature QA deployments missing required Jira metadata

Business Impact: Leadership couldn't demonstrate engineering velocity to stakeholders, and QA teams were constantly surprised by incomplete releases.

The Solution: Multi-Layered Intelligence Platform

I designed and built a comprehensive DevEx platform consisting of three integrated components that transformed engineering operations:

1. DORA Metrics Collector

Python service correlating GitOps, Bitbucket, Jira, and ArgoCD data

2. Pipeline Reporter

Intelligent Teams notifications with rich context and smart routing

3. Deployment Gates

Automated quality checks preventing premature releases

DORA Metrics Collection Architecture

[Architecture diagram]

  • DORA Metrics Collector: Python Flask, K8s Deployment; correlates APIs and exposes /metrics
  • GitOps Repo (git clone): ArgoCD Apps, deployment config
  • Bitbucket API (REST API): commits, pipelines; lead time tracking
  • Jira API (REST API): issues, Fix Versions; MTTR metrics
  • ArgoCD API (REST API): sync status, health; deploy frequency
  • Prometheus (scrape): scrapes /metrics; 15+ DORA metrics
  • Grafana (query): DORA dashboards; lead time, frequency

Flow: DORA Collector (K8s deployment) clones GitOps repo and correlates Bitbucket, Jira, and ArgoCD APIs → Exposes Prometheus metrics → Grafana visualizes deployment intelligence (deployment frequency, lead time, MTTR, change failure rate)
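
As a rough illustration of the arithmetic behind three of these metrics, here is a minimal Python sketch. The function names and the exact definitions (lead time as deploy time minus commit time, frequency as deployments per day) are assumptions for illustration, not the collector's actual code.

```python
from datetime import datetime, timezone


def lead_time_seconds(commit_time, deploy_time):
    """Lead time for one change: seconds between commit and deployment."""
    return (deploy_time - commit_time).total_seconds()


def deployment_frequency(deploy_count, window_days):
    """Average deployments per day over a reporting window."""
    return deploy_count / window_days


def change_failure_rate(total_deploys, failed_deploys):
    """Share of deployments that failed; 0.0 when nothing was deployed."""
    return failed_deploys / total_deploys if total_deploys else 0.0


# Hypothetical values, for illustration only
committed = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
deployed = datetime(2025, 1, 3, 9, 0, tzinfo=timezone.utc)
two_day_lead_time = lead_time_seconds(committed, deployed)
```

Using timezone-aware timestamps throughout avoids subtracting values recorded in different zones by different APIs.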

Component 1: DORA Metrics Collector

A Python service that acts as a centralised correlation engine between disconnected systems.

Technical Architecture

📋 Data Sources

  • ArgoCD Apps Repo - Desired deployment state (Kustomize manifests)
  • ArgoCD API - Actual deployment state, sync status, health
  • Bitbucket API - Commit metadata, author, timestamp
  • Jira API - Ticket enrichment (status, fix versions, sprint)

🔄 Processing Pipeline

1. Clone ArgoCD Apps Repo → Parse Kustomize overlays → Extract image tags per environment

2. Query Bitbucket API → Map image tags to commit SHAs → Extract Jira tickets from commit messages

3. Query Jira API → Enrich tickets with metadata (fix versions, status, assignee)

4. Calculate DORA Metrics → Deployment frequency, lead time, change failure rate

5. Expose Prometheus Metrics → 15+ custom metrics for dashboarding
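
Step 1 of the pipeline can be sketched in Python as follows. This is a simplified, stdlib-only illustration that assumes each `images:` entry lists `name:` before `newTag:`; the function name is hypothetical, and a real implementation would more likely use a YAML parser.

```python
import re

NAME_RE = re.compile(r'\s*-?\s*name:\s*(\S+)')
TAG_RE = re.compile(r'\s*-?\s*newTag:\s*"?([^"\s]+)"?')


def extract_image_tags(kustomization_text):
    """Return {image_name: newTag} for name/newTag pairs in a kustomization.yaml."""
    tags = {}
    current_name = None
    for line in kustomization_text.splitlines():
        name_match = NAME_RE.match(line)
        tag_match = TAG_RE.match(line)
        if name_match:
            # Remember the most recent name; strip optional quotes
            current_name = name_match.group(1).strip("\"'")
        elif tag_match and current_name:
            tags[current_name] = tag_match.group(1)
            current_name = None
    return tags


# Hypothetical overlay content
SAMPLE_OVERLAY = """\
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: service-a
    newTag: "b7ffa920"
  - name: service-b
    newTag: 1a2b3c4d
"""

qa_tags = extract_image_tags(SAMPLE_OVERLAY)
```

Running this once per environment overlay gives the desired image tag per service and environment, which the later steps map back to commits.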

📊 Key Metrics Exposed

  • deployment_desired_state
  • deployment_actual_state
  • deployment_lead_time_seconds
  • deployment_age_seconds
  • ticket_in_environment
  • ticket_fix_version
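
To show what such a metric looks like on the wire, here is a minimal sketch that renders a gauge in the Prometheus text exposition format using only the standard library. The label names (`service`, `environment`) and the help text are assumptions; the real service presumably relies on a Prometheus client library rather than hand-written formatting.

```python
def _escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge; samples is a list of (labels_dict, value) pairs."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{_escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample: service-a took two days to reach QA
exposition = render_gauge(
    "deployment_lead_time_seconds",
    "Commit-to-deploy lead time",
    [({"service": "service-a", "environment": "qa"}, 172800)],
)
```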

Implementation Highlights

  • Retry Logic with Exponential Backoff: Handles API rate limits and transient failures gracefully
  • Git Repository Caching: Clones repos once, pulls updates to minimize Bitbucket load
  • Parallel Processing: Collects metrics for multiple services concurrently (20 services in ~30s)
  • Prometheus Integration: Flask HTTP server exposes /metrics endpoint for Prometheus scraping
  • Health Checks: Collection success/duration metrics for observability of the collector itself
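
The retry and parallel-collection ideas above could look roughly like this. It is a sketch under assumptions: `with_retry`, `collect_all`, and the default values are hypothetical names and numbers, and production code would normally retry only on specific failures (for example HTTP 429 or 5xx) and add jitter to the delay.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n, then retry.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def collect_all(services, collect_one, max_workers=8):
    """Collect metrics for every service concurrently, retrying each call."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda svc: with_retry(lambda: collect_one(svc)), services)
        return dict(zip(services, results))
```

Threads suit this workload because the time is spent waiting on HTTP responses rather than on CPU work.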

Code Samples

pipeline_reporter.sh - Graceful Error Handling

#!/bin/bash

# Never fail the pipeline: do not exit on command errors
set +e

# Enable debug tracing if DEBUG=true
if [ "${DEBUG:-false}" = "true" ]; then
  set -x
  echo "🔍 DEBUG MODE ENABLED"
fi

# Check if webhooks are configured; skip notifications (exit 0) if not
if [ -z "${TEAMS_WEBHOOK_URL_DEFAULT:-}" ]; then
  echo "❌ ERROR: TEAMS_WEBHOOK_URL_DEFAULT not configured"
  echo "Pipeline notifications are disabled until webhooks are configured"
  echo "Continuing pipeline without notifications..."
  exit 0
fi

jira_fix_version_check.sh - Delta Detection Logic

# Get current QA deployment SHA from ArgoCD kustomization
# (log_error is assumed to be a logging helper defined elsewhere in the script)
get_current_qa_deployment() {
    # Clone argocd-apps repo if not already present (shallow clone for speed)
    if [[ ! -d "argocd-apps" ]]; then
        git clone --depth 1 \
          "https://x-token-auth:${ARGOCD_APPS_ACCESS_TOKEN}@bitbucket.org/org/argocd-apps.git" \
          argocd-apps 2>/dev/null || {
            log_error "Failed to clone argocd-apps repository"
            return 1
        }
    fi

    # Find the kustomization file for this repo's QA environment
    local kustomization_file="argocd-apps/applications/${BITBUCKET_REPO_SLUG}/overlays/qa/kustomization.yaml"
    if [[ ! -f "$kustomization_file" ]]; then
        log_error "Kustomization file not found: ${kustomization_file}"
        return 1
    fi

    # Extract the current image tag from kustomization.
    # Declare first, assign second, so `local` does not mask the pipeline's exit status.
    local current_tag
    current_tag=$(grep -A5 "name: ${BITBUCKET_REPO_SLUG}" "$kustomization_file" | \
                  grep 'newTag:' | \
                  head -n 1 | \
                  sed -E 's/.*newTag: *"?([^"]*)"?/\1/')

    echo "$current_tag"
}

Component 2: Intelligent Pipeline Reporter

A 1000+ line Bash script that transforms pipeline events into rich, actionable Teams notifications with smart routing and context.

Core Features

🎨 Rich Adaptive Cards

  • Deployment Notifications: Service, environment, commit, developer, timestamp
  • Security Alerts: Vulnerability counts, severity, Veracode/Jira links
  • Test Results: Pass/fail status, S3 results links, PostSync job integration
  • Feature Branches: PR links, namespace details, kubectl commands
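
For readers unfamiliar with the format, the sketch below builds a deployment notification as an Adaptive Card wrapped in the message envelope that Teams webhooks accept. The reporter itself is a Bash script, so this Python version is only an illustration; the function name, card layout, and schema version are assumptions.

```python
import json


def build_deployment_card(service, environment, commit, developer, timestamp):
    """Build a Teams webhook payload containing one Adaptive Card."""
    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {
                "type": "TextBlock",
                "text": f"Deployment: {service} - {environment}",
                "weight": "Bolder",
                "size": "Medium",
                "wrap": True,
            },
            {
                "type": "FactSet",
                "facts": [
                    {"title": "Service", "value": service},
                    {"title": "Environment", "value": environment},
                    {"title": "Commit", "value": commit},
                    {"title": "Developer", "value": developer},
                    {"title": "Timestamp", "value": timestamp},
                ],
            },
        ],
    }
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "content": card,
            }
        ],
    }


# Hypothetical values; the JSON body would be POSTed to the channel webhook
payload = build_deployment_card(
    "service-a", "DEV", "b7ffa920", "Developer Name", "2025-01-03T09:00:00Z"
)
payload_json = json.dumps(payload)
```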

🎯 Smart Notification Routing

Platform Deployments Channel: Dev, QA, PreProd, Prod deployments

Security Channel: SAST/SCA alerts from Veracode/SourceClear

PR Notifications Channel: Feature branch deployments with PR context

QA Team Channel: QA deployment events for testing coordination
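
The routing rules above amount to a small lookup. A minimal Python sketch follows, assuming hypothetical event type names and channel identifiers, and assuming a QA deployment is posted to both the platform and QA team channels.

```python
def route_event(event_type, environment=""):
    """Return the list of channels that should receive an event."""
    if event_type == "security":
        return ["security"]
    if event_type == "feature_branch":
        return ["pr-notifications"]
    if event_type == "deployment":
        channels = ["platform-deployments"]
        # QA deployments are also surfaced to the QA team for test coordination
        if environment.lower() == "qa":
            channels.append("qa-team")
        return channels
    # Unknown event types are not routed anywhere
    return []
```

Keeping the mapping in one function makes it easy to add a channel without touching the code that formats each message.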

✨ Easter Eggs & DevEx Enhancements

  • Special Build Messages: Build #42, #404, #1337, milestone builds (#1000)
  • Time-Based Messages: "May the Fourth be with this code" (May 4th)
  • Friday Evening Prod Deploys: "Bold." acknowledgment
  • Production Deploy Recognition: Personalized messages per developer

🔗 Actionable Links

Deployments: ArgoCD application URL, Bitbucket pipeline, Test results (S3)

Security Alerts: Jira story creation (pre-filled), Veracode dashboard, Pipeline logs

Feature Branches: PR review link, Pipeline status, kubectl port-forward commands

Example Notification

✅ Configuration Updated: Service A - DEV [Master Branch]
Build: #3881
Environment: DEV
Commit: b7ffa920
👤 Developer Name
PROJ-1234 add dev deployment summary, change to short commit hash
ℹ️ Additional Information
Tests will run automatically after sync completes

Component 3: Automated Deployment Gates

Intelligent checks that prevent premature deployments and enforce quality standards.

Jira Fix Version Check (QA Gate)

A Bash script that enforces Jira Fix Version assignment before QA deployment, preventing incomplete releases from reaching QA.

How It Works

1. Determine Current QA State: Reads the ArgoCD apps repo to find the commit SHA currently deployed in QA

2. Identify New Commits: Uses git rev-list to find the commits between the QA SHA and HEAD

3. Extract Jira Tickets: Matches commit messages against a regex to find all ticket keys (JIRA-XXXX patterns)

4. Validate Fix Versions: Queries the Jira API to check that each ticket has a Fix Version assigned

5. Block or Allow: Fails the pipeline if any ticket is missing a Fix Version and prints an actionable error message
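
Steps 3 and 4 can be sketched in Python as below (the actual gate is a Bash script). The regex, the function names, and the injected `fetch_issue` callable are assumptions for illustration; the sketch expects `fetch_issue` to return Jira issue JSON, where fix versions appear as the list `fields.fixVersions`. The commit messages would come from something like `git rev-list <qa_sha>..HEAD`.

```python
import re

# Project key, hyphen, issue number, e.g. PROJ-1234
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_tickets(commit_messages):
    """Return unique ticket keys in first-seen order."""
    seen = []
    for message in commit_messages:
        for key in TICKET_PATTERN.findall(message):
            if key not in seen:
                seen.append(key)
    return seen


def tickets_missing_fix_version(tickets, fetch_issue):
    """Return the tickets whose issue JSON has an empty fixVersions list."""
    missing = []
    for key in tickets:
        issue = fetch_issue(key)  # hypothetical callable wrapping the Jira API
        if not issue.get("fields", {}).get("fixVersions"):
            missing.append(key)
    return missing


# Hypothetical data standing in for git history and Jira responses
messages = [
    "PROJ-1001 fix login",
    "PROJ-1002: tweak retry; refs PROJ-1001",
    "chore: bump deps",
]
fake_jira = {
    "PROJ-1001": {"fields": {"fixVersions": [{"name": "1.2.0"}]}},
    "PROJ-1002": {"fields": {"fixVersions": []}},
}
new_tickets = extract_tickets(messages)
blocked_by = tickets_missing_fix_version(new_tickets, fake_jira.__getitem__)
```

A pattern this loose also matches strings such as `UTF-8` or `SHA-256`, so restricting matches to known project keys is worth considering.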

Example Output
════════════════════════════════════════════════════
❌ QA DEPLOYMENT BLOCKED
════════════════════════════════════════════════════

The following 2 ticket(s) are missing Fix Version/s:

  🎫 PROJ-1001
     https://company.atlassian.net/browse/PROJ-1001

  🎫 PROJ-1002
     https://company.atlassian.net/browse/PROJ-1002

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 ACTION REQUIRED:
1. Open each ticket above in Jira
2. Set the 'Fix Version/s' field
3. Re-run this QA deployment pipeline

Key Features

  • Delta Detection: Only checks NEW commits, not entire QA history
  • Error Handling: Differentiates API errors from validation failures
  • Bypass Mechanism: Emergency SKIP_JIRA_FIX_VERSION_CHECK=true flag
  • Debug Mode: Verbose logging for troubleshooting integration issues
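
One way to keep the bypass flag, API errors, and validation failures distinct is to return an explicit outcome, as in this Python sketch. The `GateResult` values, `JiraApiError`, and `fetch_fix_versions` are hypothetical names; whether an API error should block or allow the deployment is a policy choice the source does not state, so the sketch only reports it.

```python
from enum import Enum


class GateResult(Enum):
    SKIPPED = "skipped"      # emergency bypass flag was set
    ALLOWED = "allowed"      # every ticket has a Fix Version
    BLOCKED = "blocked"      # at least one ticket lacks a Fix Version
    API_ERROR = "api_error"  # Jira could not be queried


class JiraApiError(Exception):
    """Raised by the (hypothetical) Jira client on transport or auth failures."""


def run_gate(env, tickets, fetch_fix_versions):
    """Return (GateResult, missing_tickets) for a QA deployment."""
    if env.get("SKIP_JIRA_FIX_VERSION_CHECK", "false").lower() == "true":
        return GateResult.SKIPPED, []
    missing = []
    try:
        for key in tickets:
            if not fetch_fix_versions(key):
                missing.append(key)
    except JiraApiError:
        return GateResult.API_ERROR, []
    if missing:
        return GateResult.BLOCKED, missing
    return GateResult.ALLOWED, []
```

The caller can then map each outcome to its own exit code and message, so an outage in Jira is never reported as a missing Fix Version.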

Business Impact

413

Deployments tracked per month across all environments (verified via dashboard)

2-3 days

Average lead time from Dev → QA → PreProd, measured for the first time

100%

Deployment visibility—leadership can now answer "what's deployed where" instantly

4 Channels

Smart notification routing ensuring the right people get the right information

Zero

Incomplete QA releases since Jira Fix Version gate implementation

~80%

Reduction in "what version is in QA?" questions to platform team

Technical Highlights

  • Built production-grade Python service with retry logic, caching, and parallel processing
  • Integrated Bitbucket, Jira, and ArgoCD APIs with Prometheus into a unified intelligence layer
  • Designed 15+ custom Prometheus metrics enabling comprehensive DORA dashboards
  • Implemented sophisticated Bash scripting (1000+ lines) for rich Teams notifications
  • Created automated deployment gates with Jira API integration and delta detection
  • Deployed as Kubernetes service with Prometheus scraping and Grafana visualisation
  • Improved developer experience with actionable notifications and early validation