# Grafana Observability Stack
Grafana, Loki, and Promtail run on the home server at 10.1.1.3 and provide centralized log aggregation, metrics, and dashboards for all AnglinAI projects. This document covers how the stack works, how to use the dashboards, how to add a new project, and how to commit configuration changes.
## Quick Reference
| Thing | Value |
|---|---|
| Grafana URL | http://10.1.1.3:3000 |
| Loki API | http://127.0.0.1:3100 (server-local only) |
| Config repo | LarryAnglin/accessible-org-chart → config/ |
| Server path | ~/accessible-org-chart/ on [email protected] |
| Restart command | cd ~/accessible-org-chart && docker compose restart promtail grafana |
## Architecture
```
Docker containers on 10.1.1.3
  [org-chart-api]  [pdf-converter-api-node-1]  [...]
        │ stdout
        ▼
   [Promtail]  ← scrapes Docker socket
        │ push JSON log streams
        ▼
     [Loki]    ← stores + indexes logs
        │ query
        ▼
   [Grafana]   ← renders dashboards
        │
    Port 3000  ← you access this
```

Promtail discovers containers via the Docker socket and filters by the `logging` Docker label. It parses JSON from stdout, extracts Loki stream labels, and pushes log entries to Loki.
Loki stores log lines indexed by stream labels (like `service`, `level`). It does not store metrics; all numbers are derived from log data at query time using LogQL.
Grafana queries Loki with LogQL expressions and renders the results as dashboards.
## Two Promtail Pipelines
There are two scrape configurations, distinguished by the Docker logging label on each container.
### `logging=loki` → Structured message pipeline (org-chart)
Used by: accessible-org-chart-api-1
The pipeline extracts level, service, and timestamp as Loki stream labels, then replaces the log line with the message field only. This means only the human-readable message is stored in Loki; other metadata fields (like `path`, `durationMs`, `userId`) are discarded.
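To make the effect concrete, here is a rough Python sketch (not the actual Promtail code) of what this pipeline does to one stdout line; the `timestamp` field becomes the entry's timestamp rather than part of the stored line:

```python
import json

def loki_pipeline(raw_line: str) -> tuple[dict, str]:
    """Simulate the logging=loki pipeline: extract stream labels,
    keep only the human-readable message as the stored log line."""
    record = json.loads(raw_line)
    labels = {
        "service": record["service"],
        "level": record["level"],
    }
    # Everything except `message` (path, durationMs, userId, ...) is discarded.
    return labels, record["message"]

labels, line = loki_pipeline(
    '{"timestamp": "2026-03-01T12:00:00.000Z", "level": "info", '
    '"service": "org-chart-api", "message": "Request completed", "durationMs": 88}'
)
print(labels)  # {'service': 'org-chart-api', 'level': 'info'}
print(line)    # Request completed
```

This is why latency and other numeric fields are not queryable for containers on this pipeline: by the time the line reaches Loki, only the message text is left.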
Grafana queries against this pipeline use `message=` string filters:
```logql
{service="org-chart-api", level="error"}
count_over_time({service="org-chart-api"} | json | message="Request completed" [1m])
```

### `logging=loki-json` → Full JSON pipeline (pdf-converter)
Used by: accessible-pdf-converter-api-node-1, accessible-pdf-converter-api-node-2
The pipeline extracts level, service, and timestamp as Loki stream labels, but preserves the full JSON log line. All fields are queryable in Grafana using | json in LogQL.
Grafana queries can filter and unwrap any field:
```logql
{service="pdf-converter-api"} | json | event=`request_completed` | status >= 400
quantile_over_time(0.95, {service="pdf-converter-api"} | json | unwrap durationMs [5m])
sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [1h])
```

## What Each Project Emits
### accessible-org-chart
The API uses @org-chart/logger with StdoutTransport. Every request goes through the requestLogger middleware.
| Data | In Grafana? | Notes |
|---|---|---|
| Request rate | ✅ | `message="Request completed"` |
| Error rate | ✅ | `level=~"error\|fatal"` |
| Request latency | ❌ | `durationMs` is in message metadata… but see ⚠️ below |
| AI costs (extraction) | ✅ | Dedicated `event="cost_recorded"` events in extract.ts |
| Health checks | ✅ | `/health` endpoint returns `{ status: "healthy" }` |
Cost data is emitted as dedicated event="cost_recorded" log entries (flat JSON, not nested in metadata) so it is directly queryable in Grafana. See the AI Extraction Cost Per Hour and Cost by Model panels in the Org Chart API Overview dashboard.
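As an illustrative sketch (in Python, though the project itself is TypeScript), emitting such a flat cost event might look like this; `log_cost_event` is a hypothetical helper name, and the field names follow the `cost_recorded` schema documented later in this file:

```python
import json
import sys
from datetime import datetime, timezone

def log_cost_event(model: str, operation_type: str,
                   input_tokens: int, output_tokens: int,
                   estimated_cost_usd: float, user_id: str) -> None:
    """Emit one flat cost_recorded JSON line to stdout (hypothetical helper).

    Fields are top-level, not nested under metadata, so Grafana can
    `unwrap estimatedCostUsd` directly.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "info",
        "service": "org-chart-api",
        "event": "cost_recorded",
        "model": model,
        "operationType": operation_type,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "estimatedCostUsd": estimated_cost_usd,
        "userId": user_id,
    }
    sys.stdout.write(json.dumps(entry) + "\n")
```

Keeping the numeric fields at the top level is the key design point: nested fields would survive the loki-json pipeline but make `| json | unwrap ...` queries clumsier.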
### accessible-pdf-converter
The Node server uses a custom structured JSON middleware in server.ts. All logs are full JSON, using the loki-json pipeline.
| Data | In Grafana? | Notes |
|---|---|---|
| Request rate | ✅ | `event="request_completed"` |
| Error rate | ✅ | `level=~"error\|warn"` + `event="request_completed"` |
| Request latency | ✅ | `unwrap durationMs` |
| AI costs | ✅ | `event="cost_recorded"` with `estimatedCostUsd`, `model`, `operationType` |
| Health checks | ❌ | Intentionally excluded from logs to reduce noise |
## Dashboards
### Accessing Grafana
Open http://10.1.1.3:3000 in a browser. Ask Larry for the admin credentials.
### Available Dashboards
| Dashboard | Service | Panels |
|---|---|---|
| API Overview | org-chart-api | Request rate, error rate, latency, active users, recent errors |
| PDF Converter – Operations | pdf-converter-api | Request rate, errors, latency, AI cost by model/operation, token usage, recent errors |
### Time Range
Use the time picker in the top-right corner. Dashboards default to the last 6 hours. Common selections:
- Last 1h – live operational view
- Last 24h – daily summary
- Last 7d – weekly trends
Grafana uses `$__range` for stat panels (total over the whole selected range) and `$__interval` for timeseries panels (rate per bucket).
### Reading the PDF Converter Dashboard
- Top row (stats) – totals for the selected time range: requests, errors, AI cost, avg response time
- Request Rate by Endpoint – which API paths are being hit most
- Error Rate Over Time – 4xx (warn) and 5xx (error) over time
- Request Latency – p50/p95/p99 in milliseconds; conversions typically take 5–60s
- AI Cost Per Hour – stacked bars by AI model; tallest bars = most expensive period
- Cost by Model – pie chart of total spend broken down by model (Claude, Gemini, etc.)
- Recent Errors – live log panel showing the last error/warning log lines
## Writing LogQL Queries
LogQL is Loki's query language. The basic pattern is:
```logql
{stream_selector} | parser | filter | metric
```

Stream selectors – what to match (uses indexed Loki labels):
```logql
{service="pdf-converter-api"}
{service="pdf-converter-api", level="error"}
{service=~"pdf-converter-api|org-chart-api"}
```

Parser – how to interpret the log line:
```logql
| json    # parse full JSON log line (loki-json pipeline containers)
```

Filters – narrow down results after parsing:
```logql
| event=`request_completed`
| event=`cost_recorded`
| level="error"
| path="/api/convert"
```

Metrics – aggregate into numbers:
```logql
count_over_time(...[5m])                                 # count log lines per 5m window
sum_over_time(... | unwrap estimatedCostUsd [5m])        # sum a numeric field
quantile_over_time(0.95, ... | unwrap durationMs [5m])   # p95 of a numeric field
avg_over_time(... | unwrap durationMs [5m])              # average of a numeric field
```

Useful example queries:
```logql
# All errors in the last hour
{service="pdf-converter-api", level="error"} | json

# Total AI spend today
sum(sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [$__range]))

# Cost by model this week
sum by (model) (sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [$__range]))

# 95th percentile latency per minute (excluding health checks)
quantile_over_time(0.95, {service="pdf-converter-api"} | json | event=`request_completed` | unwrap durationMs [$__interval])

# Request rate by path
sum by (path) (count_over_time({service="pdf-converter-api"} | json | event=`request_completed` [$__interval]))
```

## Editing Dashboards
### Option 1: Edit in the Grafana UI, then export
- Open the dashboard in Grafana at http://10.1.1.3:3000
- Click the pencil icon (top right) to enter edit mode
- Add, move, or edit panels
- When done, click Dashboard settings (gear icon) → JSON Model
- Copy the full JSON
- Replace the contents of the appropriate file in `config/dashboards/`:
  - `api-overview.json` – org-chart dashboard
  - `pdf-converter.json` – PDF converter dashboard
- Commit and push (see below)
### Option 2: Edit the JSON file directly, let Grafana auto-reload
- Edit the JSON file in `accessible-org-chart/config/dashboards/`
- Grafana polls the dashboard directory every 30 seconds and picks up changes automatically
- Refresh Grafana in your browser to see the changes
- Commit and push
### Grafana Dashboard JSON key fields
```
{
  "title": "Dashboard Name",
  "uid": "unique-id-here",                    ← used in URLs and API – don't change arbitrarily
  "refresh": "30s",                           ← auto-refresh interval
  "time": { "from": "now-6h", "to": "now" },  ← default time range
  "panels": [ ... ]                           ← array of panel objects
}
```

Each panel needs:

- `"type"` – `"timeseries"`, `"stat"`, `"logs"`, `"piechart"`, `"table"`
- `"datasource"` – `{ "type": "loki", "uid": "loki" }`
- `"targets"` – array of LogQL queries
- `"gridPos"` – `{ "x": 0, "y": 0, "w": 12, "h": 8 }` (grid is 24 wide)
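These rules lend themselves to a quick pre-commit sanity check. A minimal sketch (not an official Grafana validator) that flags the most common mistakes in a dashboard JSON model:

```python
REQUIRED_PANEL_KEYS = {"type", "datasource", "targets", "gridPos"}
GRID_WIDTH = 24  # Grafana's dashboard grid is 24 units wide

def check_dashboard(doc: dict) -> list[str]:
    """Return a list of problems found in a dashboard JSON model."""
    problems = []
    for key in ("title", "uid", "panels"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    for i, panel in enumerate(doc.get("panels", [])):
        missing = REQUIRED_PANEL_KEYS - panel.keys()
        if missing:
            problems.append(f"panel {i}: missing {sorted(missing)}")
        pos = panel.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > GRID_WIDTH:
            problems.append(f"panel {i}: gridPos overflows the 24-unit grid")
    return problems

doc = {"title": "Demo", "uid": "demo-1",
       "panels": [{"type": "stat", "datasource": {"type": "loki", "uid": "loki"},
                   "targets": [], "gridPos": {"x": 20, "y": 0, "w": 12, "h": 8}}]}
print(check_dashboard(doc))  # ['panel 0: gridPos overflows the 24-unit grid']
```

It does not check `uid` uniqueness across files; that would require loading every file in `config/dashboards/`.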
## Committing Configuration Changes
All Grafana/Loki/Promtail config lives in the accessible-org-chart repo under config/. Changes must be committed and pushed to stay backed up.
### From your Mac (preferred)
```bash
cd ~/Projects/accessible-org-chart
# Edit config/dashboards/*.json or config/promtail.yml
git add config/
git commit -m "Update Grafana dashboard: add cost panels"
git push
```

Then sync to the server:
```bash
# Restart if promtail.yml changed:
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart promtail"
```

Grafana auto-reloads dashboards – no restart needed for dashboard-only changes.
### From the server directly (if needed)
```bash
cd ~/accessible-org-chart
# ... make changes ...
git add config/
git commit -m "..."
git pull --rebase   # sync any remote changes first
git push
```

## Adding a New Project to Grafana
### Step 1 – Choose a pipeline
| Use `logging=loki` | Use `logging=loki-json` |
|---|---|
| App uses @org-chart/logger or similar | App emits raw structured JSON from console.log |
| Only need request count + errors | Need to query cost, latency, or other numeric fields |
### Step 2 – Add Docker labels to the container
In the project's `docker-compose.yml`:
```yaml
services:
  my-api:
    labels:
      - "logging=loki-json"   # or "loki" for the message-only pipeline
      - "service=my-service-api"
```

### Step 3 – Emit structured JSON logs
For the loki-json pipeline, the app must emit JSON lines to stdout. Required fields:
```json
{
  "timestamp": "2026-03-01T12:00:00.000Z",
  "level": "info",
  "service": "my-service-api",
  "event": "request_completed",
  "method": "POST",
  "path": "/api/resource",
  "status": 200,
  "durationMs": 143
}
```

For AI cost events:
```json
{
  "timestamp": "2026-03-01T12:00:00.000Z",
  "level": "info",
  "service": "my-service-api",
  "event": "cost_recorded",
  "model": "claude-sonnet-4-20250514",
  "operationType": "conversion",
  "inputTokens": 1234,
  "outputTokens": 567,
  "estimatedCostUsd": 0.0123,
  "userId": "user-uuid"
}
```

See accessible-pdf-converter/workers/api/src/server.ts for the request logging middleware and workers/api/src/services/cost-ledger.ts for cost event logging – copy this pattern directly.
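For a non-TypeScript service, the same pattern can be sketched in Python; `log_request` is an illustrative name, not from the actual codebase, and the 4xx→warn / 5xx→error mapping matches how the dashboards count errors:

```python
import json
import sys
import time

def log_request(method: str, path: str, status: int, duration_ms: int,
                service: str = "my-service-api") -> None:
    """Write one request_completed JSON line to stdout for the loki-json pipeline."""
    if status >= 500:
        level = "error"   # 5xx counts as an error on the Error Rate panel
    elif status >= 400:
        level = "warn"    # 4xx counts as a warning
    else:
        level = "info"
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime()),
        "level": level,
        "service": service,
        "event": "request_completed",
        "method": method,
        "path": path,
        "status": status,
        "durationMs": duration_ms,
    }
    sys.stdout.write(json.dumps(entry) + "\n")

log_request("POST", "/api/resource", 200, 143)
```

One line per request, flat JSON, to stdout: that is all the loki-json pipeline needs.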
### Step 4 – Create a Grafana dashboard
Copy config/dashboards/pdf-converter.json as a starting point. Change:
- `"title"` – dashboard name
- `"uid"` – must be unique across all dashboards
- All `service="pdf-converter-api"` references → `service="my-service-api"`
Commit the new file and pull on the server.
### Step 5 – Restart containers and Promtail
```bash
cd ~/my-project && docker compose up -d my-api && cd ~/accessible-org-chart && docker compose restart promtail
```

Verify logs reach Loki within 30 seconds:

```bash
curl -s 'http://127.0.0.1:3100/loki/api/v1/label/service/values' | python3 -m json.tool
```

## Infrastructure Notes
### Config file locations
| File | Purpose |
|---|---|
| `config/promtail.yml` | Promtail scrape configs and parsing pipelines |
| `config/loki.yml` | Loki storage and retention (30-day default) |
| `config/dashboards/*.json` | Grafana dashboard definitions (auto-provisioned) |
| `config/grafana-provisioning/datasources/datasources.yml` | Loki datasource |
| `config/grafana-provisioning/dashboards/dashboards.yml` | Dashboard auto-load config |
| `config/grafana-provisioning/alerting/alerts.yml` | Grafana alert rules |
### Restarting services
```bash
# Restart everything
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart"

# Restart only Promtail (after promtail.yml changes)
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart promtail"

# Grafana reloads dashboards automatically every 30s – no restart needed for dashboard changes
```

### Checking if data is flowing
```bash
# List all services that have sent logs to Loki
curl -s 'http://127.0.0.1:3100/loki/api/v1/label/service/values' | python3 -m json.tool

# Tail live logs from a container and check JSON format
docker logs accessible-pdf-converter-api-node-1 --tail 20

# Query Loki directly (last hour; timestamps are in nanoseconds)
START=$(python3 -c 'import time; print(int((time.time()-3600)*1e9))')
END=$(python3 -c 'import time; print(int(time.time()*1e9))')
curl -s "http://127.0.0.1:3100/loki/api/v1/query_range?query=%7Bservice%3D%22pdf-converter-api%22%7D&limit=5&start=${START}&end=${END}" | python3 -m json.tool
```
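The same direct query can be built from Python with only the standard library; this sketch just constructs the query_range URL (Loki expects Unix timestamps in nanoseconds), with the actual request left commented out because Loki is reachable only from the server:

```python
import json
import time
import urllib.parse
import urllib.request

def loki_query_range_url(logql: str, last_seconds: int = 3600, limit: int = 5,
                         base: str = "http://127.0.0.1:3100") -> str:
    """Build a query_range URL covering the last `last_seconds` seconds."""
    now_ns = int(time.time() * 1e9)
    params = urllib.parse.urlencode({
        "query": logql,
        "limit": limit,
        "start": now_ns - last_seconds * 10**9,   # nanoseconds, not seconds
        "end": now_ns,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

url = loki_query_range_url('{service="pdf-converter-api"}')
print(url)
# On the server (Loki is bound to 127.0.0.1 there):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```

`urlencode` takes care of percent-encoding the LogQL selector, which is the fiddly part of the curl version above.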