
Observability

Metrics collection and visualization for Semantic Router using Prometheus and Grafana.


1. Metrics & Endpoints

| Component | Endpoint | Notes |
|---|---|---|
| Router metrics | :9190/metrics | Prometheus format (flag: --metrics-port) |
| Router health | :8080/health | HTTP readiness/liveness |
| Envoy metrics (optional) | :19000/stats/prometheus | If Envoy is enabled |

Configuration location: tools/observability/
Dashboard: tools/observability/llm-router-dashboard.json
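
To check what the router metrics endpoint actually exposes, a small script can fetch and parse the Prometheus text format. The following is a minimal sketch, not part of the project: the helper names `parse_metrics` and `fetch_router_metrics` are hypothetical, the default URL assumes the router runs locally on the port listed above, and optional sample timestamps are not handled.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition into {series: value}, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # blank lines and HELP/TYPE comments
            continue
        # The value is the last space-separated field (timestamps are not handled).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_router_metrics(url="http://localhost:9190/metrics"):
    """Fetch and parse router metrics; the default URL is an assumption (local mode)."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_router_metrics()` while the router is running should return a dictionary keyed by series name (including labels), which can then be filtered for names starting with `llm_`.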


2. Local Mode (Router on Host)

Run the router natively on the host and the observability stack in Docker.

Quick Start

# Start router
make run-router

# Start observability
make obs-local

Access: Prometheus at http://localhost:9090

Verify targets:

# Check that Prometheus is scraping localhost:9190
# (`open` is the macOS launcher; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9090/targets
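
The same check can be scripted against the Prometheus HTTP API, which reports scrape targets at `/api/v1/targets`. This is a rough sketch assuming Prometheus is reachable at localhost:9090 as in local mode; the function names `summarize_targets` and `fetch_targets` are hypothetical.

```python
import json
from urllib.request import urlopen


def summarize_targets(payload):
    """Map each active target's scrape URL to its health ("up", "down", "unknown")."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return {t["scrapeUrl"]: t["health"] for t in targets}


def fetch_targets(prometheus="http://localhost:9090"):
    """Query the targets API; the base URL is an assumption for local mode."""
    with urlopen(f"{prometheus}/api/v1/targets", timeout=5) as resp:
        return summarize_targets(json.load(resp))
```

A healthy local setup would be expected to list the router's scrape URL with health `up`.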

Stop:

make stop-observability

Configuration

All configuration files live in tools/observability/:

  • prometheus.yaml - Scrapes localhost:9190 when ROUTER_TARGET=localhost:9190
  • grafana-datasource.yaml - Points to localhost:9090
  • grafana-dashboard.yaml - Dashboard provisioning
  • llm-router-dashboard.json - Dashboard definition

Troubleshooting

| Issue | Fix |
|---|---|
| Target DOWN | Start router: make run-router |
| No metrics | Generate traffic, check :9190/metrics |
| Port conflict | Change port or stop conflicting service |
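
For the port-conflict case, it can help to see which of the expected ports already has a listener before starting the stack. A minimal sketch, assuming the default ports from this page (9190 for router metrics, 9090 for Prometheus); `port_in_use` is a hypothetical helper, and it only detects listeners reachable on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from this page; adjust if you changed --metrics-port.
    for name, port in (("router metrics", 9190), ("prometheus", 9090)):
        state = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

Before the router starts, port 9190 reported as in use suggests a conflicting service. After it starts, the same result simply means the router is listening.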

3. Docker Compose Mode

All services run in Docker containers.

Quick Start

# Start full stack (includes observability)
docker compose up --build

# Or with testing profile
docker compose --profile testing up --build

Access:

Expected targets:

  • semantic-router:9190
  • envoy-proxy:19000 (optional)

Configuration

Docker Compose mode uses the same configs as local mode (tools/observability/), with these differences:

  • ROUTER_TARGET=semantic-router:9190
  • PROMETHEUS_URL=prometheus:9090
  • Uses semantic-network bridge network
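
The difference between the two modes comes down to which hostnames the shared configs resolve to. The sketch below only illustrates that mapping. The `OBSERVABILITY_ENV` table and `env_lines` helper are hypothetical, and the local-mode `PROMETHEUS_URL` value is inferred from the Grafana datasource pointing at localhost:9090 rather than stated as a variable in the source.

```python
# Values per mode as described on this page; the local PROMETHEUS_URL is an inference.
OBSERVABILITY_ENV = {
    "local": {
        "ROUTER_TARGET": "localhost:9190",
        "PROMETHEUS_URL": "localhost:9090",
    },
    "compose": {
        "ROUTER_TARGET": "semantic-router:9190",
        "PROMETHEUS_URL": "prometheus:9090",
    },
}


def env_lines(mode):
    """Render the settings for a mode as KEY=value lines."""
    try:
        settings = OBSERVABILITY_ENV[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return [f"{key}={value}" for key, value in settings.items()]
```

In Compose mode the hostnames are service names, which resolve only inside the semantic-network bridge network.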

4. Kubernetes Mode

Production-ready Prometheus + Grafana for K8s clusters.

Namespace: vllm-semantic-router-system

Components

| Component | Purpose | Location |
|---|---|---|
| Prometheus | Scrapes router metrics, 15d retention | deploy/kubernetes/observability/prometheus/ |
| Grafana | Dashboard visualization | deploy/kubernetes/observability/grafana/ |
| Ingress | Optional external access | deploy/kubernetes/observability/ingress.yaml |

Deploy

# Apply manifests
kubectl apply -k deploy/kubernetes/observability/

# Verify
kubectl get pods -n vllm-semantic-router-system

Access

Port-forward:

kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system

Ingress: customize ingress.yaml with your domain and TLS settings.

Key Configuration

Prometheus uses Kubernetes service discovery:

scrape_configs:
  - job_name: semantic-router
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [vllm-semantic-router-system]

Grafana credentials (change in production):

kubectl create secret generic grafana-admin \
  --namespace vllm-semantic-router-system \
  --from-literal=admin-user=admin \
  --from-literal=admin-password='your-password'

5. Key Metrics

| Metric | Type | Description |
|---|---|---|
| llm_category_classifications_count | counter | Category classifications |
| llm_model_completion_tokens_total | counter | Tokens per model |
| llm_model_routing_modifications_total | counter | Model routing changes |
| llm_model_completion_latency_seconds | histogram | Completion latency |

Example queries:

rate(llm_model_completion_tokens_total[5m])
histogram_quantile(0.95, rate(llm_model_completion_latency_seconds_bucket[5m]))
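
To make the second query less opaque, the sketch below reproduces the general idea behind `histogram_quantile`: find the bucket containing the requested rank, then interpolate linearly inside it. It is an illustrative approximation of the documented behaviour, not Prometheus source code. The function takes plain cumulative `(upper_bound, count)` pairs, whereas the query above feeds per-second bucket rates into the same estimate.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, including +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bounds.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

Because the estimate is interpolated within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.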

6. Troubleshooting

| Issue | Check | Fix |
|---|---|---|
| Target DOWN | Prometheus /targets | Verify router is running and exposing :9190/metrics |
| No metrics | Generate traffic | Send requests through router |
| Dashboard empty | Grafana datasource | Check Prometheus URL configuration |