🤖 Live Demo Three Rivers Bank · Pittsburgh, PA

Agentic DevOps

A full-stack Credit Cards demo app built, deployed, and operated entirely by AI agents — from first line of code to autonomous incident response.

The software industry has embraced AI-assisted development — but we often forget the Ops side. This repo demonstrates Agentic workflows across the full SDLC: from writing code with GitHub Copilot, to deploying with Copilot CLI, to autonomous incident response with Azure SRE Agent.

With Azure SRE Agent, you can fully automate incident detection, investigation, and remediation. By the time you page your on-call SRE, the agent will already have a root cause analysis and a Copilot Coding Agent will have implemented a fix — ready for human review and merge.

Built entirely with GitHub Copilot

Three Rivers Bank was conceived, designed, coded, tested, and deployed without a single line written by hand. Three complementary Copilot tools handled every phase of development.

IDE Assistant

The foundation. Copilot's inline completions and chat suggestions powered every file — from Spring Boot entities and React components to Terraform modules and GitHub Actions workflows.

  • Scaffolded all 5 business credit card JPA entities with relationships
  • Generated the Resilience4j circuit breaker configuration
  • Wrote all 10 JUnit tests (100% pass rate)
  • Produced the Material-UI Three Rivers Bank theme (Navy #003366 / Teal #008080)
  • Created all 5 Playwright E2E test specs across 3 viewport sizes
🖥️
Terminal Intelligence

The operational backbone. Used throughout the entire project lifecycle to drive infrastructure, troubleshoot deployments, and orchestrate the Azure SRE Agent setup — all from the terminal.

🤖
Autonomous Developer

The closer. When Azure SRE Agent detects an incident and files a GitHub Issue with full root cause analysis, Copilot Coding Agent picks it up autonomously — reads the issue, authors the fix, and opens a PR.

  • Reads SRE Agent issues assigned to @copilot
  • Performs independent code analysis to validate the root cause
  • Implements targeted, minimal fixes without human guidance
  • Creates a well-described PR referencing the incident issue
  • Human review + merge = app restored ✅

Three Rivers Bank Credit Card Portal

A real-world full-stack Credit Cards demo app with Spring Boot backend, React frontend, and Azure Container Apps hosting — complete with CI/CD, health checks, and BIAN API v13 integration.

User: 🌐 Browser
Frontend: ⚛️ React 18 + Vite · 🗺️ React Router v6 · 🔄 TanStack Query · 🎨 Material-UI
Backend: ☕ Spring Boot 3.2 · 🗄️ H2 In-Memory DB · 🔌 Feign BIAN Client · ⚡ Resilience4j
Infra: 📦 Azure Container Apps · 🐳 Azure Container Registry · 📊 Log Analytics
CI/CD: 🔧 GitHub Actions CI · 🚀 GitHub Actions CD · 🏗️ Terraform + azd
Observability: 🤖 Azure SRE Agent · 🔔 Azure Monitor Alerts · 📈 KQL Log Queries

Azure SRE Agent Never Sleeps

Azure SRE Agent is an AI agent that monitors your application 24/7, correlates logs and metrics with source code, performs root cause analysis, and proactively routes fixes to the right agent — all without human intervention.

👁️
Proactive Health Monitoring
Three scheduled tasks run at fixed intervals: health checks every 30 minutes, configuration drift detection every 6 hours, and a daily reliability report at 8am UTC. No alert is required; the agent checks proactively.
🚨
Intelligent Alert Routing
Three Azure Monitor alert rules fire into the SRE Agent: HTTP 5xx spike (severity 2), container restarts / OOM kills (severity 1), and high response time (severity 3). Each routes to the incident-handler subagent.
🔍
Code-Aware Root Cause Analysis
Connected to the GitHub repository via OAuth, the code-analyzer subagent queries logs with KQL, reads container metrics, then cross-references the source code to pinpoint the exact file and line that caused the incident.
📋
Automated Issue Creation
After RCA is complete, the incident-handler subagent creates a structured GitHub Issue with incident summary, affected metrics, error traces, root cause, and a fix recommendation — then assigns it directly to @copilot.
⚙️
Subagent Architecture
Two specialized subagents with distinct responsibilities: incident-handler (triage, issue creation, escalation) and code-analyzer (deep code inspection, patch suggestions, PR review).
📚
Knowledge-Grounded Decisions
Two knowledge base documents ground every investigation: the HTTP error runbook (incident response procedures) and the app architecture reference (service topology, dependencies, expected behavior).
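
The three alert rules described under Intelligent Alert Routing can be pictured as a small lookup table. The sketch below is illustrative only: the real rules live in Azure Monitor and the SRE Agent's response plan, and the rule keys and the `AlertRule` type are hypothetical names. The severities and the target subagent come from the text above; in Azure Monitor, a lower severity number means a more severe alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: int   # Azure Monitor scale: lower number = more severe
    subagent: str   # subagent that receives the alert


# Rule keys are hypothetical; severities and routing follow the description above.
ALERT_RULES = {
    "http-5xx-spike": AlertRule("HTTP 5xx spike", 2, "incident-handler"),
    "container-restarts": AlertRule("Container restarts / OOM kills", 1, "incident-handler"),
    "high-response-time": AlertRule("High response time", 3, "incident-handler"),
}


def triage_order(fired_keys):
    """Return the fired rules sorted most severe first."""
    return sorted((ALERT_RULES[key] for key in fired_keys), key=lambda rule: rule.severity)


order = triage_order(["high-response-time", "container-restarts", "http-5xx-spike"])
for rule in order:
    print(f"sev {rule.severity}: {rule.name} -> {rule.subagent}")
```

If several rules fire at once, sorting by severity puts the container restart alert ahead of the 5xx and latency alerts.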
sre.azure.com — incident-handler log
08:31:02 [INFO] health-check task started
08:31:04 [INFO] querying backend: GET /actuator/health
08:31:09 [WARN] response time: 7,842ms (threshold: 3,000ms)
08:31:12 [WARN] error rate: 34% (last 10min)
08:31:13 [ALERT] escalating → code-analyzer subagent
08:31:16 [INFO] KQL: querying ContainerAppConsoleLogs...
08:31:21 [INFO] found: Thread.sleep in getAllCreditCards()
08:31:22 [INFO] reading: CreditCardService.java (GitHub)
08:31:28 [ALERT] ROOT CAUSE: artificial sleep(0-9000ms) line 47
08:31:30 [INFO] creating GitHub issue #42...
08:31:31 [FIX] issue assigned to @copilot ✓
08:31:32 [FIX] label: sre-agent-detected applied ✓
08:31:33 [INFO] awaiting Copilot fix PR...
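
The escalation decision in the log above comes down to comparing measurements against thresholds. Here is a minimal sketch of that comparison, not the agent's actual logic. The 3,000ms response-time threshold is taken from the log; the error-rate threshold and the function name are assumptions made for illustration.

```python
RESPONSE_TIME_THRESHOLD_MS = 3_000  # shown in the log above
ERROR_RATE_THRESHOLD = 0.05         # hypothetical: the source does not state this value


def evaluate_health(response_time_ms, error_rate):
    """Return one warning string per breached threshold (empty list = healthy)."""
    findings = []
    if response_time_ms > RESPONSE_TIME_THRESHOLD_MS:
        findings.append(
            f"response time: {response_time_ms:,}ms "
            f"(threshold: {RESPONSE_TIME_THRESHOLD_MS:,}ms)"
        )
    if error_rate > ERROR_RATE_THRESHOLD:
        findings.append(f"error rate: {error_rate:.0%}")
    return findings


# Values from the sample log: 7,842ms response time and a 34% error rate.
findings = evaluate_health(7842, 0.34)
should_escalate = bool(findings)
print(findings, should_escalate)
```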

Scheduled Proactive Tasks

These tasks run on a schedule, so no alert is required to start an investigation.

| Task | Schedule | What it checks | On issue found | Status |
|---|---|---|---|---|
| three-rivers-health-check | Every 30 min | Backend/frontend health, error rates, response time, container restarts | Escalates to code-analyzer + incident-handler | Active |
| three-rivers-config-drift | Every 6 hours | Env vars, container resource limits, image versions vs. expected | Creates GitHub issue via incident-handler | Active |
| daily-reliability-report | Daily 8am UTC | 24h metrics summary, 7-day degradation trends, PR correlation | Reliability recommendations posted to GitHub | Active |
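
To make the three cadences concrete, the sketch below computes the next run time for each task from a given moment. It is a rough illustration, not how the SRE Agent schedules work internally. It assumes interval tasks are aligned to midnight UTC, which the source does not state, and the sample date is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def next_interval_run(now, every):
    """Next run on a grid of `every` starting at midnight UTC (alignment is an assumption)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    periods_elapsed = (now - midnight) // every  # timedelta // timedelta yields an int
    return midnight + (periods_elapsed + 1) * every


def next_daily_run(now, hour):
    """Next occurrence of `hour`:00 UTC strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(days=1)


# Arbitrary sample date; the time of day matches the first line of the sample log.
now = datetime(2025, 1, 15, 8, 31, 2, tzinfo=timezone.utc)

schedule = {
    "three-rivers-health-check": next_interval_run(now, timedelta(minutes=30)),
    "three-rivers-config-drift": next_interval_run(now, timedelta(hours=6)),
    "daily-reliability-report": next_daily_run(now, hour=8),
}
for task, run_at in schedule.items():
    print(task, run_at.isoformat())
```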

From Defect to Fix: (Almost) Zero Human Intervention

Watch how a breaking change introduced in production triggers a fully automated pipeline: chaos → detection → RCA → fix → recovery. Humans only review the PR.

Step 1: 🔴 Breaking Change Reaches the Demo App
A GitHub Agentic Workflow (chaos-engineering.lock.yml) autonomously selects a chaos scenario, modifies the target file to introduce a realistic bug, and opens a PR with a plausible commit message. Once merged, CI/CD deploys the broken code to Azure Container Apps.
Step 2: 🟡 CI/CD Deploys the Defect
The GitHub Actions CD workflow triggers on merge to main. azd up runs Terraform to apply infrastructure changes and deploy updated container images to Azure Container Apps. The defect is now live. Load keeps arriving, and errors begin to accumulate.
⚙️ GitHub Actions CD
Step 3: 🚨 Azure Monitor Alert Fires
Within minutes, Azure Monitor detects the anomaly and fires an alert. For a slow-response scenario, the High Response Time alert triggers (severity 3). For an OOM scenario, the Container Restart alert fires (severity 1). The alert routes directly to the SRE Agent's response plan.
🤖 Azure Monitor → SRE Agent
Step 4: 🔍 SRE Agent Investigates
The incident-handler subagent wakes up and delegates deep analysis to code-analyzer, which queries ContainerAppConsoleLogs via KQL, reads container metrics, and then cross-references the GitHub repository to find the exact commit and code change that caused the degradation. Average time to root cause: < 2 minutes.
🤖 SRE Agent: code-analyzer subagent
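
The log query in this step might look something like the one built below. This is a hedged sketch, not the agent's actual query. The table name comes from the text; the column names (`TimeGenerated`, `ContainerAppName`, `Log`), the search terms, and the app name are assumptions. Some workspaces expose this data as `ContainerAppConsoleLogs_CL` with `_s`-suffixed columns instead.

```python
import re


def build_error_query(app_name, window_minutes=10, limit=50):
    """Build a KQL query string for recent error lines from one container app."""
    # Reject anything that could break out of the quoted KQL string literal.
    if not re.fullmatch(r"[A-Za-z0-9-]+", app_name):
        raise ValueError(f"unexpected container app name: {app_name!r}")
    return "\n".join([
        "ContainerAppConsoleLogs",
        f"| where TimeGenerated > ago({window_minutes}m)",
        f'| where ContainerAppName == "{app_name}"',   # column name is an assumption
        '| where Log has_any ("ERROR", "Exception")',  # search terms are illustrative
        "| project TimeGenerated, Log",
        "| order by TimeGenerated desc",
        f"| take {limit}",
    ])


query = build_error_query("three-rivers-backend")  # hypothetical app name
print(query)
```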
Step 5: 📋 GitHub Issue Created with Full RCA
The agent creates a structured GitHub Issue containing: incident timeline, affected metrics (response times, error rates), error log excerpts, root cause (exact file + line), and a concrete fix recommendation. The issue is labeled sre-agent-detected and assigned to @copilot.
🤖 SRE Agent: incident-handler subagent
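
The structure of such an issue can be sketched as a small rendering function. The section names follow the list in this step; the `IncidentReport` type, the Markdown layout, and the returned dictionary shape are hypothetical and are not the agent's real output format or a GitHub API payload. The sample values are taken from the log shown earlier, except the recommendation text, which is illustrative.

```python
from dataclasses import dataclass


@dataclass
class IncidentReport:
    summary: str
    timeline: list
    metrics: dict
    log_excerpt: str
    root_cause: str
    recommendation: str


def render_issue(report):
    """Render a report into an issue title, Markdown body, label, and assignee."""
    sections = [
        ("Incident timeline", "\n".join(f"- {event}" for event in report.timeline)),
        ("Affected metrics", "\n".join(f"- {k}: {v}" for k, v in report.metrics.items())),
        ("Error log excerpt", report.log_excerpt),
        ("Root cause", report.root_cause),
        ("Fix recommendation", report.recommendation),
    ]
    body = "\n\n".join(f"## {title}\n{text}" for title, text in sections)
    return {
        "title": report.summary,
        "body": body,
        "labels": ["sre-agent-detected"],
        "assignee": "@copilot",
    }


report = IncidentReport(
    summary="High response time on backend",
    timeline=["08:31:09 response time 7,842ms", "08:31:12 error rate 34% (last 10min)"],
    metrics={"response time": "7,842ms (threshold: 3,000ms)", "error rate": "34%"},
    log_excerpt="found: Thread.sleep in getAllCreditCards()",
    root_cause="artificial sleep(0-9000ms) in CreditCardService.java, line 47",
    recommendation="Remove the artificial sleep (illustrative wording)",
)
issue = render_issue(report)
print(issue["body"])
```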
Step 6: 🤖 Copilot Coding Agent Creates Fix PR
Copilot Coding Agent picks up the assigned issue, reads the RCA, performs its own code inspection, and implements the minimal fix. It opens a PR that references the incident issue and is ready for human review. No one had to write the fix; Copilot authored it autonomously based on the SRE report.
✨ GitHub Copilot Coding Agent
Step 7: 👁️ Human Reviews & Approves
An engineer reviews the Copilot-authored PR. The PR links back to the SRE Agent incident issue, making context immediately clear. CI runs, tests pass, and the engineer merges. This is the only human step in the entire incident response pipeline.
👤 Human Review
Step 8: ✅ Production Restored
CD deploys the fix. The SRE Agent's next health check confirms all metrics are normal: response times drop, error rates return to zero, container restarts stop. The agent closes the incident and logs a recovery summary. The load generator's output returns to all-green. 🟢
⚙️ GitHub Actions CD → SRE Agent confirms

The Chaos Engine: AI That Breaks Things

GitHub Agentic Workflows (gh-aw) compile natural-language .md prompt files into locked workflows. The chaos engineering workflow uses this to autonomously inject app-breaking changes via PR, simulating how real defects slip through to a deployed environment.

How it works

  1. 📝 Write Prompt
  2. ⚙️ Compile: gh aw compile
  3. 🔒 Lock File: .lock.yml (signed)
  4. ▶️ Trigger: gh workflow run
  5. 🤖 Agent Executes: sandboxed + firewalled
  6. 🔀 Creates PR: chaos/* branch

---
description: >
  Chaos engineering workflow — introduces breaking changes
  for Azure SRE Agent demo. Creates a PR with a realistic fault.
on:
  workflow_dispatch: {}
tools:
  github:
    toolsets: [default]
safe-outputs:
  create-pull-request:
    max: 1
---
 
# Chaos Engineering — SRE Agent Demo
 
# 1. Search for open issues with chaos-engineering label
# 2. Extract scenario from issue title
# 3. Apply the breaking change to the target file
# 4. Create a PR on chaos/{scenario}-{date} branch
# with chaos-engineering label
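
Steps 2 and 4 of the prompt above (extracting the scenario from the issue title and naming the branch) can be sketched as follows. In the real workflow the agent does this from the natural-language prompt, so this is only an illustration. The `chaos: <scenario>` title format is inferred from the example issue title used in the demo commands, and the ISO date format in the branch name is an assumption.

```python
import re
from datetime import date

# Title format inferred from the demo's example: "chaos: backend-slow-response".
TITLE_PATTERN = re.compile(r"^chaos:\s*(?P<scenario>[a-z0-9-]+)\s*$")


def scenario_from_title(title):
    """Return the scenario slug, or None if the title is not a chaos request."""
    match = TITLE_PATTERN.match(title)
    return match.group("scenario") if match else None


def branch_name(scenario, day):
    """Build a chaos/{scenario}-{date} branch name (ISO date format is an assumption)."""
    return f"chaos/{scenario}-{day.isoformat()}"


scenario = scenario_from_title("chaos: backend-slow-response")
branch = branch_name(scenario, date(2025, 1, 15))  # arbitrary sample date
print(scenario, branch)
```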
🛡️ Security Model: Each agentic workflow run executes inside a sandboxed container with an egress firewall. Only api.github.com and api.githubcopilot.com are permitted. All other outbound traffic is blocked. Safe outputs (like create-pull-request) are gated with a maximum of 1 PR per run.
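
The two guardrails in the security model, an egress allowlist and a cap on safe outputs, can be illustrated in a few lines. This is a conceptual sketch only: the real enforcement happens in the gh-aw sandbox and firewall, not in application code, and the `SafeOutputs` class is a hypothetical name. The two hostnames and the limit of one PR per run come from the text above.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"api.github.com", "api.githubcopilot.com"})


def is_egress_allowed(url):
    """Allow a request only if its host exactly matches an allowlisted hostname."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


class SafeOutputs:
    """Hypothetical gate that caps how many pull requests one run may create."""

    def __init__(self, max_pull_requests=1):
        self.max_pull_requests = max_pull_requests
        self.created = []

    def create_pull_request(self, title):
        if len(self.created) >= self.max_pull_requests:
            raise PermissionError("pull request limit reached for this run")
        self.created.append(title)


outputs = SafeOutputs(max_pull_requests=1)
outputs.create_pull_request("chaos: backend-slow-response")
print(is_egress_allowed("https://api.github.com/repos"))
print(is_egress_allowed("https://example.com/"))
```

Matching on the exact hostname, rather than a substring, keeps lookalike hosts such as api.github.com.example.org out of the allowlist.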

Run the Demo in 3 Steps

terminal
# 1. Deploy the app to Azure
az login && azd auth login
azd up
 
# 2. Deploy the SRE Agent
cd sre && azd up
bash scripts/post-provision.sh
 
# 3. Trigger chaos and watch the loop
gh issue create --title "chaos: backend-slow-response" \
  --label "chaos-engineering"
gh workflow run chaos-engineering.lock.yml