DORA Metrics & Developer Experience Platform
Building comprehensive business intelligence and developer experience tooling—automated metrics collection, intelligent notifications, and deployment gates
The Challenge
Engineering leadership had zero visibility into deployment velocity across 20 microservices and 4 environments. Critical business questions remained unanswered:
- "How many deployments did we do this month?"
- "What's our average lead time from commit to production?"
- "Which Jira tickets are in the QA release right now?"
- "Where are our deployment bottlenecks?"
- "Why did that deployment fail?"
Additionally, developers had poor visibility into pipeline execution:
- Manual checking of pipeline status across multiple services
- No proactive notifications for deployment events
- Silent security vulnerabilities discovered weeks later
- Premature QA deployments missing required Jira metadata
Business Impact: Leadership couldn't demonstrate engineering velocity to stakeholders, and QA teams were constantly surprised by incomplete releases.
The Solution: Multi-Layered Intelligence Platform
I designed and built a comprehensive DevEx platform consisting of three integrated components that transformed engineering operations:
1. DORA Metrics Collector
Python service correlating GitOps, Bitbucket, Jira, and ArgoCD data
2. Pipeline Reporter
Intelligent Teams notifications with rich context and smart routing
3. Deployment Gates
Automated quality checks preventing premature releases
DORA Metrics Collection Architecture
Flow: DORA Collector (K8s deployment) clones GitOps repo and correlates Bitbucket, Jira, and ArgoCD APIs → Exposes Prometheus metrics → Grafana visualizes deployment intelligence (deployment frequency, lead time, MTTR, change failure rate)
Component 1: DORA Metrics Collector
A Python service that acts as a centralised correlation engine between disconnected systems.
Technical Architecture
📋 Data Sources
- ArgoCD Apps Repo - Desired deployment state (Kustomize manifests)
- ArgoCD API - Actual deployment state, sync status, health
- Bitbucket API - Commit metadata, author, timestamp
- Jira API - Ticket enrichment (status, fix versions, sprint)
🔄 Processing Pipeline
1. Clone ArgoCD Apps Repo → Parse Kustomize overlays → Extract image tags per environment
2. Query Bitbucket API → Map image tags to commit SHAs → Extract Jira tickets from commit messages
3. Query Jira API → Enrich tickets with metadata (fix versions, status, assignee)
4. Calculate DORA Metrics → Deployment frequency, lead time, change failure rate
5. Expose Prometheus Metrics → 15+ custom metrics for dashboarding
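Steps 1 and 4 above can be illustrated with a minimal sketch. It is not the collector's actual code: the function names `extract_image_tag` and `lead_time_seconds` are hypothetical, the regex stands in for a proper YAML parse of the Kustomize overlay, and lead time is assumed to mean the interval from commit timestamp to deployment timestamp.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches a Kustomize `newTag:` line, with or without double quotes.
# Assumption: one image per overlay; a real parser would load the YAML.
NEW_TAG_RE = re.compile(r'^\s*newTag:\s*"?([^"\s]+)"?\s*$', re.MULTILINE)


def extract_image_tag(kustomization_text: str) -> Optional[str]:
    """Return the first newTag value in an overlay, or None if absent."""
    match = NEW_TAG_RE.search(kustomization_text)
    return match.group(1) if match else None


def lead_time_seconds(commit_time: datetime, deployed_time: datetime) -> float:
    """Lead time as the number of seconds between commit and deployment."""
    return (deployed_time - commit_time).total_seconds()


if __name__ == "__main__":
    overlay = 'images:\n  - name: orders\n    newTag: "abc1234"\n'
    print(extract_image_tag(overlay))
    print(lead_time_seconds(
        datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
        datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc),
    ))
```

Using timezone-aware timestamps on both sides avoids mixing local and UTC times when Bitbucket and ArgoCD report in different zones.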
📊 Key Metrics Exposed
- deployment_desired_state
- deployment_actual_state
- deployment_lead_time_seconds
- deployment_age_seconds
- ticket_in_environment
- ticket_fix_version
Implementation Highlights
- Retry Logic with Exponential Backoff: Handles API rate limits and transient failures gracefully
- Git Repository Caching: Clones repos once, then pulls updates to minimize Bitbucket load
- Parallel Processing: Collects metrics for multiple services concurrently (20 services in ~30s)
- Prometheus Integration: Flask HTTP server exposes /metrics endpoint for Prometheus scraping
- Health Checks: Collection success/duration metrics for observability of the collector itself
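The retry behaviour listed above could look roughly like the following sketch. The names `TransientError` and `retry_with_backoff` and the default delays are assumptions for illustration, not the collector's real interface; the idea is only that each failed attempt doubles the wait, up to a cap.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 5xx)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            # Delay grows 1x, 2x, 4x, ... of base_delay, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; production code would normally also add jitter so that parallel workers do not retry in lockstep.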
Code Samples
#!/bin/bash
# Never fail the pipeline
set +e

# Enable debug if DEBUG=true
if [ "${DEBUG:-false}" = "true" ]; then
  set -x
  echo "🔍 DEBUG MODE ENABLED"
fi

# Check if webhooks are configured
if [ -z "$TEAMS_WEBHOOK_URL_DEFAULT" ]; then
  echo "❌ ERROR: TEAMS_WEBHOOK_URL_DEFAULT not configured"
  echo "Pipeline notifications are disabled until webhooks are configured"
  echo "Continuing pipeline without notifications..."
  exit 0
fi

# Get current QA deployment SHA from ArgoCD kustomization
# (log_error is assumed to be defined elsewhere in the full script)
get_current_qa_deployment() {
  # Clone or update argocd-apps repo (shallow clone for speed)
  if [[ ! -d "argocd-apps" ]]; then
    git clone --depth 1 \
      "https://x-token-auth:${ARGOCD_APPS_ACCESS_TOKEN}@bitbucket.org/org/argocd-apps.git" \
      argocd-apps 2>/dev/null || {
      log_error "Failed to clone argocd-apps repository"
      return 1
    }
  fi

  # Find the kustomization file for this repo's QA environment
  local kustomization_file="argocd-apps/applications/${BITBUCKET_REPO_SLUG}/overlays/qa/kustomization.yaml"

  # Extract the current image tag from kustomization
  # (declared separately so `local` does not mask the pipeline's exit status)
  local current_tag
  current_tag=$(grep -A5 "name: ${BITBUCKET_REPO_SLUG}" "$kustomization_file" | \
    grep 'newTag:' | \
    sed -E 's/.*newTag: *"?([^"]*)"?/\1/')
  echo "$current_tag"
}
Component 2: Intelligent Pipeline Reporter
A 1000+ line Bash script that transforms pipeline events into rich, actionable Teams notifications with smart routing and context.
Core Features
🎨 Rich Adaptive Cards
- Deployment Notifications: Service, environment, commit, developer, timestamp
- Security Alerts: Vulnerability counts, severity, Veracode/Jira links
- Test Results: Pass/fail status, S3 results links, PostSync job integration
- Feature Branches: PR links, namespace details, kubectl commands
🎯 Smart Notification Routing
Platform Deployments Channel: Dev, QA, PreProd, Prod deployments
Security Channel: SAST/SCA alerts from Veracode/SourceClear
PR Notifications Channel: Feature branch deployments with PR context
QA Team Channel: QA deployment events for testing coordination
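The routing rules above amount to a small mapping from event type to one or more channels. The sketch below is an illustration in Python rather than the Bash the reporter is written in; the `PipelineEvent` type, the event kinds, and the channel keys are all hypothetical names, and the fallback to a default channel is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineEvent:
    kind: str              # "deployment", "security_alert", or "feature_branch"
    environment: str = ""  # e.g. "dev", "qa", "preprod", "prod"


def route(event: PipelineEvent) -> List[str]:
    """Return the channel keys that should receive this event."""
    if event.kind == "security_alert":
        return ["security"]
    if event.kind == "feature_branch":
        return ["pr-notifications"]
    if event.kind == "deployment":
        channels = ["platform-deployments"]
        # QA deployments are also sent to the QA team for test coordination.
        if event.environment == "qa":
            channels.append("qa-team")
        return channels
    # Assumed fallback for unrecognised events.
    return ["default"]
```

Each channel key would then be resolved to a Teams webhook URL from pipeline variables, so that adding a channel is a configuration change and not a code change.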
✨ Easter Eggs & DevEx Enhancements
- Special Build Messages: Build #42, #404, #1337, milestone builds (#1000)
- Time-Based Messages: "May the Fourth be with this code" (May 4th)
- Friday Evening Prod Deploys: "Bold." acknowledgment
- Production Deploy Recognition: Personalized messages per developer
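As a rough illustration of how such messages might be selected, here is a sketch in Python (the reporter itself is Bash). Only the two quoted messages come from the list above; the function name `special_message`, the precedence order, the 17:00 cut-off for "evening", the rule that every multiple of 1000 is a milestone, and the placeholder wording for special builds are all assumptions.

```python
from datetime import datetime
from typing import Optional

SPECIAL_BUILDS = {42, 404, 1337}  # build numbers named in the list above


def special_message(
    build_number: int, environment: str, now: datetime
) -> Optional[str]:
    """Pick at most one extra message for a notification, or None."""
    if now.month == 5 and now.day == 4:
        return "May the Fourth be with this code"
    # Friday is weekday 4; "evening" is assumed to start at 17:00.
    if environment == "prod" and now.weekday() == 4 and now.hour >= 17:
        return "Bold."
    is_milestone = build_number > 0 and build_number % 1000 == 0
    if build_number in SPECIAL_BUILDS or is_milestone:
        # Placeholder wording; the real messages are not given in the source.
        return f"Special build #{build_number}"
    return None
```

Returning `None` for the common case lets the caller omit the extra line entirely, so ordinary notifications stay unchanged.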
🔗 Actionable Links
Deployments: ArgoCD application URL, Bitbucket pipeline, Test results (S3)
Security Alerts: Jira story creation (pre-filled), Veracode dashboard, Pipeline logs
Feature Branches: PR review link, Pipeline status, kubectl port-forward commands
Example Notification
Component 3: Automated Deployment Gates
Intelligent checks that prevent premature deployments and enforce quality standards.
Jira Fix Version Check (QA Gate)
A Bash script that enforces Jira Fix Version assignment before QA deployment, preventing incomplete releases from reaching QA.
How It Works
1. Determine Current QA State: Queries ArgoCD apps repo to find currently deployed commit SHA in QA
2. Identify New Commits: Uses git rev-list to find commits between QA and HEAD
3. Extract Jira Tickets: Regex matching on commit messages to find all JIRA-XXXX patterns
4. Validate Fix Versions: Queries Jira API to check each ticket has Fix Version assigned
5. Block or Allow: Fails the pipeline if any ticket is missing a Fix Version and prints an actionable error message
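Steps 3 to 5 can be sketched as follows. This is an illustration in Python, not the Bash gate itself: `extract_tickets` and `tickets_missing_fix_version` are hypothetical names, the key pattern is a simplified guess at Jira's format, and the Jira lookup is passed in as a function so the sketch makes no claims about the real API calls.

```python
import re
from typing import Callable, Iterable, List

# Simplified Jira key pattern: uppercase project key, hyphen, digits.
# Assumption: strings such as "UTF-8" would also match and need filtering.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_tickets(commit_messages: Iterable[str]) -> List[str]:
    """Collect the unique ticket keys mentioned in commit messages, sorted."""
    found = set()
    for message in commit_messages:
        found.update(TICKET_RE.findall(message))
    return sorted(found)


def tickets_missing_fix_version(
    tickets: Iterable[str],
    get_fix_versions: Callable[[str], List[str]],
) -> List[str]:
    """Return the tickets whose Fix Version list is empty."""
    return [key for key in tickets if not get_fix_versions(key)]


if __name__ == "__main__":
    messages = ["PROJ-1001: add retry", "Merge PROJ-1002 and PROJ-1001"]
    versions = {"PROJ-1001": ["2024.05"], "PROJ-1002": []}
    missing = tickets_missing_fix_version(
        extract_tickets(messages), lambda key: versions[key]
    )
    print(missing)
```

A non-empty result is what would cause the gate to fail the pipeline and print the list of offending tickets.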
════════════════════════════════════════════════════
❌ QA DEPLOYMENT BLOCKED
════════════════════════════════════════════════════
The following 2 ticket(s) are missing Fix Version/s:
🎫 PROJ-1001
https://company.atlassian.net/browse/PROJ-1001
🎫 PROJ-1002
https://company.atlassian.net/browse/PROJ-1002
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 ACTION REQUIRED:
1. Open each ticket above in Jira
2. Set the 'Fix Version/s' field
3. Re-run this QA deployment pipeline
Key Features
- Delta Detection: Only checks NEW commits, not entire QA history
- Error Handling: Differentiates API errors from validation failures
- Bypass Mechanism: Emergency SKIP_JIRA_FIX_VERSION_CHECK=true flag
- Debug Mode: Verbose logging for troubleshooting integration issues
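One way to keep the bypass flag, API errors, and validation failures distinct is to map each outcome to its own exit code. The sketch below is a Python illustration under stated assumptions: only the `SKIP_JIRA_FIX_VERSION_CHECK` variable comes from the text above, while `JiraApiError`, `evaluate_gate`, and the exit codes 1 and 2 are hypothetical.

```python
from typing import Callable, List, Mapping, Tuple


class JiraApiError(Exception):
    """Hypothetical error for a failed or unreachable Jira API call."""


def evaluate_gate(
    find_missing: Callable[[], List[str]],
    env: Mapping[str, str],
) -> Tuple[int, str]:
    """Return (exit_code, message) for the QA gate."""
    # The emergency bypass skips the check entirely.
    if env.get("SKIP_JIRA_FIX_VERSION_CHECK") == "true":
        return 0, "Fix Version check skipped by bypass flag"
    try:
        missing = find_missing()
    except JiraApiError as exc:
        # Assumed code 2: the check could not run, which differs from a
        # ticket failing validation.
        return 2, f"Jira API error: {exc}"
    if missing:
        # Assumed code 1: validation failure that blocks the deployment.
        return 1, "Missing Fix Version: " + ", ".join(missing)
    return 0, "All tickets have a Fix Version"
```

In a pipeline, the caller would pass `os.environ` as `env` and exit with the returned code, so an outage in Jira is reported differently from a ticket that genuinely needs attention.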
Business Impact
Deployments tracked per month across all environments (verified via dashboard)
Average lead time from Dev → QA → PreProd measured for first time
Deployment visibility—leadership can now answer "what's deployed where" instantly
Smart notification routing ensuring right people get right information
Incomplete QA releases since Jira Fix Version gate implementation
Reduction in "what version is in QA?" questions to platform team
Technical Highlights
- Built production-grade Python service with retry logic, caching, and parallel processing
- Integrated 4 separate APIs (Bitbucket, Jira, ArgoCD, Prometheus) into unified intelligence layer
- Designed 15+ custom Prometheus metrics enabling comprehensive DORA dashboards
- Implemented sophisticated Bash scripting (1000+ lines) for rich Teams notifications
- Created automated deployment gates with Jira API integration and delta detection
- Deployed as Kubernetes service with Prometheus scraping and Grafana visualisation
- Improved developer experience with actionable notifications and early validation