🤖 Live Demo Three Rivers Bank · Pittsburgh, PA

Agentic DevOps

A full-stack Credit Cards demo app built, deployed, and operated entirely by AI agents — from first line of code to autonomous incident response.


The software industry has embraced AI-assisted development — but we often forget the Ops side. This repo demonstrates Agentic workflows across the full SDLC: from writing code with GitHub Copilot, to deploying with Copilot CLI, to autonomous incident response with Azure SRE Agent.

With Azure SRE Agent, you can fully automate incident detection, investigation, and remediation. By the time you page your on-call SRE, the agent will already have a root cause analysis and a Copilot Coding Agent will have implemented a fix — ready for human review and merge.

Development

Built entirely with GitHub Copilot

Three Rivers Bank was conceived, designed, coded, tested, and deployed without a single line written by hand. Three complementary Copilot tools handled every phase of development.

✨

GitHub Copilot

IDE Assistant

The foundation. Copilot's inline completions and chat suggestions powered every file — from Spring Boot entities and React components to Terraform modules and GitHub Actions workflows.

Scaffolded all 5 business credit card JPA entities with relationships
Generated the Resilience4j circuit breaker configuration
Wrote all 10 JUnit tests (100% pass rate)
Produced the Material-UI Three Rivers Bank theme (Navy #003366 / Teal #008080)
Created all 5 Playwright E2E test specs across 3 viewport sizes

🖥️

GitHub Copilot CLI

Terminal Intelligence

The operational backbone. Used throughout the entire project lifecycle to drive infrastructure, troubleshoot deployments, and orchestrate the Azure SRE Agent setup — all from the terminal.

Designed Terraform modules for Azure Container Apps, ACR, and Log Analytics
Authored the azd project configuration and preprovision hooks
Built the SRE Agent Bicep infrastructure (Microsoft.App/agents)
Designed the chaos engineering agentic workflow — 13 scenarios that inject realistic breaking changes via PR
Created this GitHub Pages site

🤖

Copilot Coding Agent

Autonomous Developer

The closer. When Azure SRE Agent detects an incident and files a GitHub Issue with full root cause analysis, Copilot Coding Agent picks it up autonomously — reads the issue, authors the fix, and opens a PR.

Reads SRE Agent issues assigned to @copilot
Performs independent code analysis to validate the root cause
Implements targeted, minimal fixes without human guidance
Creates a well-described PR referencing the incident issue
Human review + merge = app restored ✅

The Application

Three Rivers Bank Credit Card Portal

A realistic full-stack Credit Cards demo app with a Spring Boot backend, a React frontend, and Azure Container Apps hosting, complete with CI/CD, health checks, and BIAN API v13 integration.

User: 🌐 Browser

↓

Frontend: ⚛️ React 18 + Vite · 🗺️ React Router v6 · 🔄 TanStack Query · 🎨 Material-UI

Backend: ☕ Spring Boot 3.2 · 🗄️ H2 In-Memory DB · 🔌 Feign BIAN Client · ⚡ Resilience4j

Infra: 📦 Azure Container Apps · 🐳 Azure Container Registry · 📊 Log Analytics

CI/CD: 🔧 GitHub Actions CI · 🚀 GitHub Actions CD · 🏗️ Terraform + azd

Observability: 🤖 Azure SRE Agent · 🔔 Azure Monitor Alerts · 📈 KQL Log Queries

Operations

Azure SRE Agent Never Sleeps

Azure SRE Agent is an AI agent that monitors your application 24/7, correlates logs and metrics with source code, performs root cause analysis, and routes the fix to the right agent. No human is involved until the fix PR is ready for review.

👁️

Proactive Health Monitoring

Three scheduled tasks run around the clock: health checks every 30 minutes, configuration drift detection every 6 hours, and a daily reliability report at 8am UTC. The agent does not wait for an alert before investigating.

🚨

Intelligent Alert Routing

Three Azure Monitor alert rules fire into the SRE Agent: HTTP 5xx spike (severity 2), container restarts / OOM kills (severity 1), and high response time (severity 3). Each routes to the incident-handler subagent.
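The routing described above can be pictured as a small lookup table. The sketch below is illustrative only: the rule keys (`http_5xx_spike`, `container_restarts`, `high_response_time`) and the `route_alert` and `by_urgency` helpers are hypothetical names, not the SRE Agent's actual configuration format. Only the severities and the `incident-handler` target come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    description: str
    severity: int   # Azure Monitor convention: lower number = more severe
    subagent: str   # subagent that receives the alert


# Hypothetical rule keys; severities and target subagent are taken from the text.
ALERT_RULES = {
    "http_5xx_spike": AlertRule("HTTP 5xx spike", 2, "incident-handler"),
    "container_restarts": AlertRule("Container restarts / OOM kills", 1, "incident-handler"),
    "high_response_time": AlertRule("High response time", 3, "incident-handler"),
}


def route_alert(rule_key: str) -> AlertRule:
    """Return the rule for a fired alert; raises KeyError for an unknown rule."""
    return ALERT_RULES[rule_key]


def by_urgency() -> list[str]:
    """Rule keys ordered from most to least severe."""
    return sorted(ALERT_RULES, key=lambda key: ALERT_RULES[key].severity)
```

Sorting by severity puts the container restart rule first, which matches the page's framing of OOM kills as the most urgent of the three alerts.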

🔍

Code-Aware Root Cause Analysis

Connected to the GitHub repository via OAuth, the code-analyzer subagent queries logs with KQL, reads container metrics, then cross-references the source code to pinpoint the exact file and line that caused the incident.

📋

Automated Issue Creation

After RCA is complete, the incident-handler subagent creates a structured GitHub Issue with incident summary, affected metrics, error traces, root cause, and a fix recommendation — then assigns it directly to @copilot.

⚙️

Subagent Architecture

Two specialized subagents with distinct responsibilities: incident-handler (triage, issue creation, escalation) and code-analyzer (deep code inspection, patch suggestions, PR review).

📚

Knowledge-Grounded Decisions

Two knowledge base documents ground every investigation: the HTTP error runbook (incident response procedures) and the app architecture reference (service topology, dependencies, expected behavior).

sre.azure.com — incident-handler log

08:31:02 [INFO] health-check task started
08:31:04 [INFO] querying backend: GET /actuator/health
08:31:09 [WARN] response time: 7,842ms (threshold: 3,000ms)
08:31:12 [WARN] error rate: 34% (last 10min)
08:31:13 [ALERT] escalating → code-analyzer subagent
08:31:16 [INFO] KQL: querying ContainerAppConsoleLogs...
08:31:21 [INFO] found: Thread.sleep in getAllCreditCards()
08:31:22 [INFO] reading: CreditCardService.java (GitHub)
08:31:28 [ALERT] ROOT CAUSE: artificial sleep(0-9000ms) line 47
08:31:30 [INFO] creating GitHub issue #42...
08:31:31 [FIX] issue assigned to @copilot ✓
08:31:32 [FIX] label: sre-agent-detected applied ✓
08:31:33 [INFO] awaiting Copilot fix PR...
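The two warnings in the log above amount to simple threshold checks. The following is a minimal sketch of that logic, not the agent's real implementation. The 3,000 ms response-time threshold comes from the log; the 5% error-rate limit and the function names `evaluate_health` and `should_escalate` are assumptions made for illustration.

```python
RESPONSE_TIME_THRESHOLD_MS = 3_000  # from the log above
ERROR_RATE_THRESHOLD = 0.05         # assumed value; the source does not state one


def evaluate_health(response_time_ms: float, error_rate: float) -> list[str]:
    """Return warning messages; an empty list means the check passed."""
    warnings = []
    if response_time_ms > RESPONSE_TIME_THRESHOLD_MS:
        warnings.append(
            f"response time: {response_time_ms:,.0f}ms "
            f"(threshold: {RESPONSE_TIME_THRESHOLD_MS:,}ms)"
        )
    if error_rate > ERROR_RATE_THRESHOLD:
        warnings.append(f"error rate: {error_rate:.0%}")
    return warnings


def should_escalate(warnings: list[str]) -> bool:
    """Escalate to deeper analysis as soon as any warning is present."""
    return bool(warnings)
```

With the values from the log (7,842 ms and 34%), both checks fail and the sketch would escalate, mirroring the hand-off to code-analyzer at 08:31:13.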

Scheduled Proactive Tasks

Running on a fixed schedule; no alert is required to trigger an investigation.

Task	Schedule	What it checks	On issue found	Status
three-rivers-health-check	Every 30 min	Backend/frontend health, error rates, response time, container restarts	Escalates to `code-analyzer` + `incident-handler`	Active
three-rivers-config-drift	Every 6 hours	Env vars, container resource limits, image versions vs. expected	Creates GitHub issue via `incident-handler`	Active
daily-reliability-report	Daily 8am UTC	24h metrics summary, 7-day degradation trends, PR correlation	Reliability recommendations posted to GitHub	Active
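To make the cadence in the table concrete, here is a small sketch that computes when each task would next run. The intervals and the 8am UTC report time come from the table; the data layout and the helper names `next_interval_run` and `next_daily_report` are hypothetical and say nothing about how the SRE Agent actually stores its schedules.

```python
from datetime import datetime, timedelta, timezone

# Intervals taken from the table above; the layout is a hypothetical illustration.
INTERVAL_TASKS = {
    "three-rivers-health-check": timedelta(minutes=30),
    "three-rivers-config-drift": timedelta(hours=6),
}
DAILY_REPORT_HOUR_UTC = 8  # daily-reliability-report runs at 8am UTC


def next_interval_run(task: str, last_run: datetime) -> datetime:
    """Next run of an interval-based task, given when it last ran."""
    return last_run + INTERVAL_TASKS[task]


def next_daily_report(now: datetime) -> datetime:
    """Next 08:00 UTC strictly after `now` (which must be timezone-aware)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(
        hour=DAILY_REPORT_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, a health check that last ran at 08:31 UTC would be due again at 09:01 UTC, and a report requested at that moment would be scheduled for 08:00 UTC the following day.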

The Heart of Agentic DevOps

From Defect to Fix: (Almost) Zero Human Intervention

Watch how a breaking change deployed to production triggers an automated pipeline: chaos → detection → RCA → fix → recovery. The only human step is reviewing and merging the fix PR.

🔴 Breaking Change Reaches the Demo App

A GitHub Agentic Workflow (chaos-engineering.lock.yml) autonomously selects a chaos scenario, modifies the target file to introduce a realistic bug, and opens a PR with a plausible commit message. Once merged, CI/CD deploys the broken code to Azure Container Apps.

🌪️ GitHub Agentic Workflow

1/3 Chaos Scenario Selected

2/3 Agentic Workflow Triggered

3/3 Chaos PR Created

🟡 CI/CD Deploys the Defect

The GitHub Actions CD workflow triggers on merge to main. azd up runs Terraform to apply infrastructure changes and deploy updated container images to Azure Container Apps. The defect is now live. Load continues arriving — errors begin accumulating.

⚙️ GitHub Actions CD

🚨 Azure Monitor Alert Fires

Within minutes, Azure Monitor detects the anomaly and fires an alert. For a slow-response scenario, the High Response Time alert triggers (severity 3). For an OOM scenario, the Container Restart alert fires (severity 1). The alert routes directly to the SRE Agent's response plan.

🤖 Azure Monitor → SRE Agent

🔍 SRE Agent Investigates

The incident-handler subagent wakes up and delegates deep analysis to code-analyzer, which queries ContainerAppConsoleLogs via KQL, reads container metrics, then cross-references the GitHub repository to find the exact commit and code change that caused the degradation. Average time to root cause: < 2 minutes.

🤖 SRE Agent — code-analyzer subagent

📋 GitHub Issue Created with Full RCA

The agent creates a structured GitHub Issue containing: incident timeline, affected metrics (response times, error rates), error log excerpts, root cause (exact file + line), and a concrete fix recommendation. The issue is labeled sre-agent-detected and assigned to @copilot.
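The issue structure described above can be sketched as a plain data transformation. This is an assumption-laden illustration: the `IncidentReport` class, the `to_issue` helper, the section headings, and the dictionary keys are all hypothetical, and the result is not a verified GitHub API payload. Only the list of sections, the `sre-agent-detected` label, and the `@copilot` assignee come from the text.

```python
from dataclasses import dataclass


@dataclass
class IncidentReport:
    title: str
    timeline: str
    metrics: str
    log_excerpt: str
    root_cause: str
    recommendation: str


def to_issue(report: IncidentReport) -> dict:
    """Render a report as an issue-like dictionary (hypothetical shape)."""
    sections = [
        ("Incident timeline", report.timeline),
        ("Affected metrics", report.metrics),
        ("Error log excerpts", report.log_excerpt),
        ("Root cause", report.root_cause),
        ("Fix recommendation", report.recommendation),
    ]
    body = "\n\n".join(f"## {heading}\n{text}" for heading, text in sections)
    return {
        "title": report.title,
        "body": body,
        "labels": ["sre-agent-detected"],
        # How the assignment to @copilot is performed is not specified in the source.
        "assignee": "@copilot",
    }
```

Keeping the sections in a fixed order means the coding agent that picks up the issue can always find the root cause and the recommendation in the same place.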

🤖 SRE Agent — incident-handler subagent

🤖 Copilot Coding Agent Creates Fix PR

Copilot Coding Agent picks up the assigned issue, reads the RCA, performs its own code inspection, and implements the minimal fix. It opens a PR referencing the incident issue — ready for human review. No one had to write the fix; Copilot authored it autonomously based on the SRE report.

✨ GitHub Copilot Coding Agent

👁️ Human Reviews & Approves

An engineer reviews the Copilot-authored PR. The PR links back to the SRE Agent incident issue, making context immediately clear. CI runs, tests pass, and the engineer merges. This is the only human step in the entire incident response pipeline.

👤 Human Review

✅ Production Restored

CD deploys the fix. The SRE Agent's next health check confirms all metrics are normal: response times drop, error rates return to zero, container restarts stop. The agent closes the incident and logs a recovery summary. The load generator's output returns to all-green. 🟢

⚙️ GitHub Actions CD → SRE Agent confirms

GitHub Agentic Workflows

The Chaos Engine: AI That Breaks Things

GitHub Agentic Workflows (gh-aw) compile natural-language .md prompt files into locked workflows. The chaos engineering workflow uses this to autonomously inject app-breaking changes via PR, simulating how real defects slip through review into a deployed environment.

How it works

📝 Write Prompt: chaos-engineering.md
→ ⚙️ Compile: gh aw compile
→ 🔒 Lock File: .lock.yml (signed)
→ ▶️ Trigger: gh workflow run
→ 🤖 Agent Executes: sandboxed + firewalled
→ 🔀 Creates PR: chaos/* branch

.github/workflows/chaos-engineering.md
---
description: >
  Chaos engineering workflow — introduces breaking changes
  for Azure SRE Agent demo. Creates a PR with a realistic fault.
on:
  workflow_dispatch: {}
tools:
  github:
    toolsets: [default]
safe-outputs:
  create-pull-request:
    max: 1
---
 
# Chaos Engineering — SRE Agent Demo
 
1. Search for open issues with the chaos-engineering label
2. Extract the scenario from the issue title
3. Apply the breaking change to the target file
4. Create a PR on a chaos/{scenario}-{date} branch
   with the chaos-engineering label
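
The branch pattern in the last step, chaos/{scenario}-{date}, can be illustrated with a short helper. This is a sketch under assumptions: the prompt does not say how the scenario is turned into a slug or which date format is used, so the lowercase hyphenated slug, the ISO date, and the name `chaos_branch_name` are all hypothetical choices.

```python
import re
from datetime import date


def chaos_branch_name(scenario: str, on: date) -> str:
    """Build a chaos/{scenario}-{date} branch name (slug and date format assumed)."""
    # Collapse anything that is not a lowercase letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", scenario.lower()).strip("-")
    return f"chaos/{slug}-{on.isoformat()}"
```

Under these assumptions, a scenario titled "Slow Response" run on 15 January 2025 would land on `chaos/slow-response-2025-01-15`.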

🛡️ Security Model: Each agentic workflow run executes inside a sandboxed container with an egress firewall. Only api.github.com and api.githubcopilot.com are permitted. All other outbound traffic is blocked. Safe outputs (like create-pull-request) are gated with a maximum of 1 PR per run.
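
The two guardrails in the security model, an egress allowlist and a cap on safe outputs, can be sketched as simple checks. This only illustrates the policy as stated on this page. It is not how the gh-aw sandbox or firewall is implemented, and `is_egress_allowed` and `can_create_pull_request` are hypothetical helper names.

```python
from urllib.parse import urlsplit

# Hosts and limit are taken from the security model described above.
ALLOWED_HOSTS = frozenset({"api.github.com", "api.githubcopilot.com"})
MAX_PULL_REQUESTS_PER_RUN = 1


def is_egress_allowed(url: str) -> bool:
    """True only when the URL's host exactly matches an allowlisted host."""
    host = urlsplit(url).hostname  # None when the URL has no scheme or host
    return host is not None and host in ALLOWED_HOSTS


def can_create_pull_request(created_so_far: int) -> bool:
    """True while the run is still under its pull request limit."""
    return created_so_far < MAX_PULL_REQUESTS_PER_RUN
```

Matching the host exactly, rather than by suffix or substring, keeps look-alike domains such as `api.github.com.example.com` outside the allowlist.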
