Health Monitoring Setup

Overview

Risk Legion includes built-in health monitoring endpoints and can integrate with external monitoring tools for comprehensive observability.

Built-in Health Checks

Health Endpoint

curl https://api.risklegion.com/health

Response:

{
  "status": "healthy",
  "timestamp": "2026-01-16T10:30:00Z",
  "version": "1.0.0",
  "components": {
    "api": "healthy",
    "database": "healthy",
    "redis": "healthy"
  }
}
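A monitor or runbook script usually needs more than "did the request succeed": it needs to know which component is failing. Here is a minimal stdlib sketch that fetches `/health` and lists problems. Two points are assumptions. First, any component value other than `"healthy"` is treated as a failure, because the response above only shows `"healthy"`. Second, `check_health` and `unhealthy_components` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "https://api.risklegion.com/health"


def unhealthy_components(payload: dict) -> list:
    """Names of components whose reported value is anything other than "healthy"."""
    # Assumption: only the literal string "healthy" means OK.
    components = payload.get("components", {})
    return sorted(name for name, state in components.items() if state != "healthy")


def check_health(url: str = HEALTH_URL, timeout: float = 10.0) -> list:
    """Fetch the health endpoint and return a list of problems (empty list = healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:  # non-2xx response (subclass of OSError, so first)
        return [f"http_status={err.code}"]
    except OSError as err:  # DNS failure, refused connection, timeout
        return [f"unreachable: {err}"]
    except ValueError:  # body was not valid JSON
        return ["invalid JSON body"]
    problems = unhealthy_components(payload)
    if payload.get("status") != "healthy":
        problems.insert(0, f"overall={payload.get('status')}")
    return problems
```

A cron job or on-call script can call `check_health()` and exit non-zero (or page) whenever the returned list is non-empty.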

Docker Health Check

Configured in the Dockerfile (the check runs curl, so curl must be installed in the image):

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
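Slim Python base images often ship without curl. In that case the check above fails every time and the container is marked unhealthy. One alternative, assuming Python is available in the image, is a small stdlib probe. The file name `healthcheck.py` and the function `probe` are hypothetical.

```python
# healthcheck.py (hypothetical file name): Docker-style exit code for the /health endpoint.
import urllib.request

HEALTH_URL = "http://localhost:8000/health"


def probe(url: str = HEALTH_URL, timeout: float = 10.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:
        # Covers refused connections, timeouts and HTTP errors (HTTPError is an OSError).
        return 1
```

To use it, place the file in the container's working directory and swap the curl line for `CMD python -c "import sys, healthcheck; sys.exit(healthcheck.probe())"`, keeping the same `--interval`, `--timeout`, `--start-period` and `--retries` flags.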

Uptime Monitoring

Using UptimeRobot

Create an account at uptimerobot.com
Add monitors:

Monitor	URL	Interval
API Health	`https://api.risklegion.com/health`	5 min
Frontend	`https://app.risklegion.com`	5 min

Configure alerts (email, Slack, etc.)

Using Better Uptime

Create an account at betteruptime.com
Add heartbeat monitors
Configure status page (optional)
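A heartbeat monitor reverses the direction of the check. The service issues a unique URL, your job requests that URL after each successful run, and an alert fires if no request arrives within the expected interval. The following is a minimal sketch of the job side. The heartbeat URL comes from the monitoring service, and the names `send_heartbeat`, `run_with_heartbeat` and the `HEARTBEAT_URL` environment variable are hypothetical.

```python
import urllib.request
from typing import Callable


def send_heartbeat(url: str, timeout: float = 10.0) -> bool:
    """Request the heartbeat URL; return True on a 2xx response and never raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


def run_with_heartbeat(job: Callable[[], None], ping: Callable[[], bool]) -> bool:
    """Run the job and ping only if it finished without raising.

    A failing job propagates its exception and sends no heartbeat, so the
    monitor alerts once the expected interval (plus any grace period) passes.
    """
    job()
    return ping()
```

Usage might look like `run_with_heartbeat(nightly_job, lambda: send_heartbeat(os.environ["HEARTBEAT_URL"]))`. Pinging only after success means a crashed job and a job that never started produce the same alert.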

Error Tracking

Sentry Integration

Backend Setup

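# app/main.py
import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration

# `settings` is the app's configuration object; it must expose SENTRY_DSN and ENVIRONMENT.
# Initialise Sentry before the FastAPI app is created so that requests are instrumented.
sentry_sdk.init(
    dsn=settings.SENTRY_DSN,
    environment=settings.ENVIRONMENT,
    integrations=[FastApiIntegration()],
    traces_sample_rate=0.1,  # send roughly 10% of transactions for performance monitoring
)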
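With `traces_sample_rate=0.1`, Sentry also samples the Docker health check (every 30 seconds) and any monitor or Prometheus polling. Sentry's Python SDK accepts a `traces_sampler` callable in place of a fixed rate. The callable receives a sampling-context dict and returns a rate between 0.0 and 1.0. The sketch below drops polling endpoints. It assumes the FastAPI/ASGI integration exposes the request under the `asgi_scope` key, which you should verify against the SDK docs for your version. `NOISY_PATHS` is a hypothetical name.

```python
# Paths polled by Docker, uptime monitors or Prometheus; tracing them only adds noise.
NOISY_PATHS = frozenset({"/health", "/metrics"})
DEFAULT_TRACE_RATE = 0.1


def traces_sampler(sampling_context: dict) -> float:
    """Return the probability (0.0 to 1.0) that this transaction is traced."""
    # Assumption: the ASGI integration passes the raw ASGI scope as "asgi_scope".
    scope = sampling_context.get("asgi_scope") or {}
    if scope.get("path") in NOISY_PATHS:
        return 0.0
    # Follow an upstream decision from distributed tracing when one exists.
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return DEFAULT_TRACE_RATE
```

It would be passed as `sentry_sdk.init(..., traces_sampler=traces_sampler)` instead of `traces_sample_rate`.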

Frontend Setup

// src/main.tsx
import * as Sentry from '@sentry/react';

Sentry.init({
  dsn: import.meta.env.VITE_SENTRY_DSN,
  environment: import.meta.env.VITE_ENVIRONMENT,
  tracesSampleRate: 0.1,
});

Alert Configuration

In the Sentry dashboard:

Go to Alerts
Create alert rules for:
- Error spike detection
- New issue alerts
- Performance regression

Application Metrics

Prometheus Integration

Backend Metrics

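# app/middleware/metrics.py
from fastapi import APIRouter, Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

# Request counter, labelled by method, endpoint and status code
REQUEST_COUNT = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Request latency histogram (seconds), labelled by method and endpoint
REQUEST_LATENCY = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['method', 'endpoint']
)

# Mounted from app/main.py with app.include_router(router)
router = APIRouter()

@router.get("/metrics")
async def metrics():
    # Serve the default registry in the Prometheus text exposition format
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)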
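`REQUEST_COUNT` and `REQUEST_LATENCY` only change when something updates them on every request. Here is a minimal sketch of a pure ASGI middleware that times each HTTP request and reports its method, path, status and duration to a callback. `MetricsMiddleware` and the `record` callback are hypothetical names. Using a callback keeps the class independent of `prometheus_client`.

```python
import time
from typing import Any, Awaitable, Callable, MutableMapping

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]
# record(method, path, status_code, duration_seconds)
Recorder = Callable[[str, str, int, float], None]


class MetricsMiddleware:
    """Pure ASGI middleware that times each HTTP request and reports it to `record`."""

    def __init__(self, app: ASGIApp, record: Recorder) -> None:
        self.app = app
        self.record = record

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass through untouched
            await self.app(scope, receive, send)
            return

        status_code = 500  # reported if the app raises before sending a response

        async def send_wrapper(message: Message) -> None:
            nonlocal status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            await send(message)

        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            self.record(scope["method"], scope["path"], status_code, time.perf_counter() - start)
```

With FastAPI this could be registered as `app.add_middleware(MetricsMiddleware, record=record)`. In that setup `record` would call `REQUEST_COUNT.labels(method, path, str(status)).inc()` and `REQUEST_LATENCY.labels(method, path).observe(seconds)`. Labelling by raw path creates one time series per distinct URL, so paths containing IDs should be normalised to a route template before they become labels.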

Prometheus Configuration

# prometheus.yml
scrape_configs:
  - job_name: 'risk-legion-api'
    static_configs:
      - targets: ['api.risklegion.com:8000']  # container port; must be reachable from the Prometheus server
    metrics_path: /metrics
    scrape_interval: 15s

Grafana Dashboards

Create dashboards for:

Request rate and latency
Error rate
Database query performance
Redis cache hit rate
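Latency panels, and a P95 alert, are typically built with PromQL's `histogram_quantile()` over the `http_request_duration_seconds` buckets. An example query is `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`. The Python sketch below mirrors that function's linear interpolation within a bucket so dashboard numbers can be sanity-checked by hand. It assumes non-negative bucket bounds, as latencies are, and skips Prometheus's other edge-case handling.

```python
import math
from typing import Sequence, Tuple


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by upper_bound
    and ending with the +Inf bucket, i.e. the `le` series of a Prometheus histogram.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # number of observations at or below the quantile
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile lies in the +Inf bucket: report the highest finite bound
                return lower_bound
            # Interpolate linearly inside this bucket (lower_count < rank <= count)
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, take cumulative counts of 50, 90, 98 and 100 at bounds of 0.1 s, 0.5 s, 1.0 s and 2.0 s. The P95 rank is 95, which lands in the 1.0 s bucket, so the estimate is 0.5 + (1.0 - 0.5) × (95 - 90) / (98 - 90) = 0.8125 s. The result is only as precise as the bucket boundaries around the threshold you care about.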

Log Aggregation

Structured Logging

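import structlog

structlog.configure(
    processors=[
        structlog.processors.add_log_level,  # adds the "level" key
        structlog.processors.TimeStamper(fmt="iso", utc=True),  # adds "timestamp" in UTC
        structlog.processors.JSONRenderer()  # renders each event as one JSON line
    ]
)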

logger = structlog.get_logger()
logger.info("request_processed", path="/api/v1/bras", duration_ms=45)

Log Output

{
  "timestamp": "2026-01-16T10:30:00.123Z",
  "level": "info",
  "event": "request_processed",
  "path": "/api/v1/bras",
  "duration_ms": 45
}
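Libraries that log through the standard `logging` module, such as uvicorn, bypass structlog's processors and emit plain text. That breaks one-JSON-object-per-line parsing downstream. structlog's own `structlog.stdlib.ProcessorFormatter` is the more complete fix. As a stdlib-only sketch, a `logging.Formatter` can produce the same keys. `JsonFormatter`, `configure_stdlib_json_logging` and the `fields` extra attribute are hypothetical names.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render stdlib log records as one JSON object per line, using the keys shown above."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # UTC ISO-8601 with millisecond precision and a trailing "Z"
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def configure_stdlib_json_logging(level: int = logging.INFO) -> None:
    """Replace the root logger's handlers with a single JSON handler writing to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```

A call such as `logging.getLogger("app").info("request_processed", extra={"fields": {"path": "/api/v1/bras", "duration_ms": 45}})` then produces a line with the same shape as the structlog output.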

Log Forwarding

Forward logs to:

AWS CloudWatch: Via the CloudWatch agent on EC2 or Docker's awslogs log driver
Datadog: Via agent or API
ELK Stack: Via Filebeat

Alerting

Alert Types

Alert	Trigger	Severity
API Down	Health check fails 3 consecutive times	Critical
High Error Rate	>5% 5xx errors for 5 min	High
High Latency	P95 >2s for 5 min	Warning
Database Issues	Connection failures	Critical
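The High Error Rate rule compares the increase in 5xx responses with the increase in all responses over the window. In PromQL that is roughly `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05`. The arithmetic is sketched here over two hypothetical snapshots of the `http_requests_total` counter, keyed by status code. Unlike PromQL's `rate()`, this sketch does not handle counter resets such as process restarts.

```python
from typing import Mapping


def error_rate(before: Mapping[str, float], after: Mapping[str, float]) -> float:
    """Fraction of requests in the window that returned a 5xx status.

    `before` and `after` map a status-code label to the cumulative
    http_requests_total value at the start and end of the window.
    Assumes the counter did not reset inside the window.
    """
    deltas = {code: after[code] - before.get(code, 0.0) for code in after}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    errors = sum(v for code, v in deltas.items() if code.startswith("5"))
    return errors / total


def should_alert(rate: float, threshold: float = 0.05) -> bool:
    """True when the error rate is strictly above the threshold (5% by default)."""
    return rate > threshold
```

Suppose the window goes from `{"200": 1000, "500": 10}` to `{"200": 1180, "500": 22, "503": 8}`. That is 20 errors out of 200 requests, a 10% error rate, so the alert fires.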

Notification Channels

Configure alerts via:

Email
Slack
PagerDuty
SMS (Twilio)

Example Slack Alert

{
  "text": "🚨 Risk Legion Alert",
  "blocks": [
    {
      "type": "section",
      "text": {
        "type": "mrkdwn",
        "text": "*Alert:* High Error Rate\n*Environment:* Production\n*Details:* Error rate exceeded 5% threshold"
      }
    }
  ]
}
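Slack incoming webhooks accept a JSON POST whose body uses the same `text`/`blocks` structure as the example above. Here is a minimal stdlib sketch. The webhook URL is a per-channel secret issued by Slack, so read it from configuration (for example a hypothetical `SLACK_WEBHOOK_URL` variable). `build_slack_alert` and `post_to_slack` are hypothetical helper names.

```python
import json
import urllib.request


def build_slack_alert(alert: str, environment: str, details: str) -> dict:
    """Build a Slack message payload with the same shape as the example alert."""
    text = f"*Alert:* {alert}\n*Environment:* {environment}\n*Details:* {details}"
    return {
        "text": "🚨 Risk Legion Alert",  # fallback text shown in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": text}},
        ],
    }


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```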

Status Page

Using Atlassian Statuspage

Create a page at statuspage.io
Add components:
- API
- Web Application
- Database
- Authentication
Configure automation through the Statuspage API
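Automation usually means updating a component's status from your monitors. The Statuspage REST API does this with `PATCH /v1/pages/{page_id}/components/{component_id}`, authenticated with an `Authorization: OAuth <api_key>` header and a body of `{"component": {"status": ...}}`. The sketch below builds that request from a `/health` component value without sending it. The mapping table, including the `"degraded"` value, and the way health keys correspond to Statuspage components are assumptions. Page and component IDs are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.statuspage.io/v1"
# Hypothetical mapping from /health component values to Statuspage component statuses.
STATUS_MAP = {"healthy": "operational", "degraded": "degraded_performance"}


def statuspage_status(health_value: str) -> str:
    """Translate a /health component value; anything unrecognised counts as an outage."""
    return STATUS_MAP.get(health_value, "major_outage")


def build_component_update(page_id: str, component_id: str, api_key: str,
                           health_value: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that sets one component's status."""
    body = {"component": {"status": statuspage_status(health_value)}}
    return urllib.request.Request(
        f"{API_BASE}/pages/{page_id}/components/{component_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"OAuth {api_key}", "Content-Type": "application/json"},
        method="PATCH",
    )
```

Send the request with `urllib.request.urlopen(request, timeout=10)`. Keep the API key in a secret store rather than in source.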

Self-Hosted Option

Use Upptime for a GitHub-powered status page.

Runbooks

API Unresponsive

Check EC2 instance status
SSH to instance
Check Docker container: docker ps
Check container logs: docker logs risk-legion-api
Restart if needed: docker restart risk-legion-api

Database Connection Issues

Check Supabase status
Verify DATABASE_URL is correct
Check connection pool status
Restart application to reset connections
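Before restarting, it helps to tell an unreachable or mistyped `DATABASE_URL` apart from an exhausted connection pool. Here is a stdlib sketch that parses the URL and attempts a plain TCP connection to the host and port it names. It does not authenticate or run a query, so success only proves network reachability. Falling back to 5432, PostgreSQL's default port, when the URL omits a port is an assumption. `database_endpoint` and `tcp_reachable` are hypothetical names.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def database_endpoint(database_url: str, default_port: int = 5432) -> Tuple[str, int]:
    """Extract (host, port) from a postgres:// or postgresql:// URL."""
    parts = urlsplit(database_url)
    if not parts.hostname:
        raise ValueError("DATABASE_URL has no host component")
    return parts.hostname, parts.port or default_port


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From inside the container, `tcp_reachable(*database_endpoint(os.environ["DATABASE_URL"]))` returning False points at networking, DNS or a wrong host. If it returns True but queries still fail, look at credentials or pool limits instead.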

High Latency

Check slow query logs
Review recent deployments
Check resource usage: docker stats
Scale resources if needed

Checklist

Basic Monitoring

Health endpoint configured
Uptime monitor active
Alert notifications set up

Error Tracking

Sentry configured
Alert rules created
On-call rotation defined

Metrics

Prometheus/metrics endpoint
Grafana dashboards
Performance baselines set

Logging

Structured logging enabled
Log aggregation configured
Log retention policy set
