
Grafana Observability Stack

Grafana, Loki, and Promtail run on the home server at 10.1.1.3 and provide centralized log aggregation, metrics, and dashboards for all AnglinAI projects. This document covers how the stack works, how to use the dashboards, how to add a new project, and how to commit configuration changes.


Quick Reference

| Thing | Value |
| --- | --- |
| Grafana URL | http://10.1.1.3:3000 |
| Loki API | http://127.0.0.1:3100 (server-local only) |
| Config repo | LarryAnglin/accessible-org-chart → config/ |
| Server path | ~/accessible-org-chart/ on [email protected] |
| Restart command | `cd ~/accessible-org-chart && docker compose restart promtail grafana` |

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Docker containers on 10.1.1.3 β”‚
β”‚ β”‚
β”‚ [org-chart-api] [pdf-converter-api-node-1] [...] β”‚
β”‚ β”‚ stdout β”‚ stdout β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚ β”‚
β”‚ [Promtail] ← scrapes Docker socket β”‚
β”‚ β”‚ push JSON log streams β”‚
β”‚ [Loki] ← stores + indexes logs β”‚
β”‚ β”‚ query β”‚
β”‚ [Grafana] ← renders dashboards β”‚
β”‚ β”‚ β”‚
β”‚ Port 3000 ← you access this β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Promtail discovers containers via the Docker socket and filters by the logging Docker label. It parses JSON from stdout, extracts Loki stream labels, and pushes log entries to Loki.

Loki stores log lines indexed by stream labels (like service, level). It does not store metrics β€” all numbers are derived from log data at query time using LogQL.

Grafana queries Loki with LogQL expressions and renders the results as dashboards.


Two Promtail Pipelines

There are two scrape configurations, distinguished by the Docker logging label on each container.
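
The shape of these scrape configs in config/promtail.yml is roughly the following. This is a sketch, not a copy of the real file — job names and extracted fields are illustrative, so check the file itself before editing:

```yaml
scrape_configs:
  # logging=loki: keep only the human-readable message
  - job_name: docker-message
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        filters:
          - name: label
            values: ["logging=loki"]
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
            message: message
      - labels:
          level:
          service:
      - output:
          source: message   # replace the stored line with the message field

  # logging=loki-json: keep the full JSON line
  - job_name: docker-json
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        filters:
          - name: label
            values: ["logging=loki-json"]
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
      - labels:
          level:
          service:
```

The only structural difference is the `output` stage: the first pipeline rewrites the stored line down to the message, the second omits that stage so the whole JSON line survives.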

logging=loki β€” Structured message pipeline (org-chart)

Used by: accessible-org-chart-api-1

The pipeline extracts level, service, and timestamp as Loki stream labels, then replaces the log line with the message field only. This means only the human-readable message is stored in Loki β€” other metadata fields (like path, durationMs, userId) are discarded.

Grafana queries against this pipeline filter on the stored message text. Because the JSON fields are gone by the time the line reaches Loki, use line filters rather than | json:

{service="org-chart-api", level="error"}
count_over_time({service="org-chart-api"} |= "Request completed" [1m])

logging=loki-json β€” Full JSON pipeline (pdf-converter)

Used by: accessible-pdf-converter-api-node-1, accessible-pdf-converter-api-node-2

The pipeline extracts level, service, and timestamp as Loki stream labels, but preserves the full JSON log line. All fields are queryable in Grafana using | json in LogQL.

Grafana queries can filter and unwrap any field:

{service="pdf-converter-api"} | json | event=`request_completed` | status >= 400
quantile_over_time(0.95, {service="pdf-converter-api"} | json | unwrap durationMs [5m])
sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [1h])

What Each Project Emits

accessible-org-chart

The API uses @org-chart/logger with StdoutTransport. Every request goes through the requestLogger middleware.

| Data | In Grafana? | Notes |
| --- | --- | --- |
| Request rate | ✓ | Lines matching "Request completed" |
| Error rate | ✓ | `level` of `error` or `fatal` |
| Request latency | ⚠ | `durationMs` lives in message metadata, which the `logging=loki` pipeline discards |
| AI costs (extraction) | ✓ | Dedicated `event="cost_recorded"` events in extract.ts |
| Health checks | ✓ | /health endpoint returns { status: "healthy" } |

Cost data is emitted as dedicated event="cost_recorded" log entries (flat JSON, not nested in metadata) so it is directly queryable in Grafana. See the AI Extraction Cost Per Hour and Cost by Model panels in the Org Chart API Overview dashboard.
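
A sketch of that pattern (hypothetical helper, not the actual @org-chart/logger API — the real service is Node, this is illustrative Python): the cost event is emitted flat so no field is buried under a metadata object the pipeline would discard.

```python
import json
from datetime import datetime, timezone

def log_cost_recorded(model: str, operation_type: str, input_tokens: int,
                      output_tokens: int, estimated_cost_usd: float) -> str:
    """Print a flat cost_recorded event (no nested metadata), keeping every
    field reachable in LogQL, e.g. `| json | unwrap estimatedCostUsd`."""
    line = json.dumps({
        "timestamp": datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": "info",
        "service": "org-chart-api",
        "event": "cost_recorded",
        "model": model,
        "operationType": operation_type,
        "inputTokens": input_tokens,
        "outputTokens": output_tokens,
        "estimatedCostUsd": estimated_cost_usd,
    })
    print(line)
    return line
```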

accessible-pdf-converter

The Node server uses a custom structured JSON middleware in server.ts. All logs are full JSON, using the loki-json pipeline.

| Data | In Grafana? | Notes |
| --- | --- | --- |
| Request rate | ✓ | `event="request_completed"` |
| Error rate | ✓ | `level` of `error` or `warn` on `request_completed` events |
| Request latency | ✓ | `unwrap durationMs` |
| AI costs | ✓ | `event="cost_recorded"` with `estimatedCostUsd`, `model`, `operationType` |
| Health checks | — | Intentionally excluded from logs to reduce noise |

Dashboards

Accessing Grafana

Open http://10.1.1.3:3000 in a browser. Ask Larry for the admin credentials.

Available Dashboards

| Dashboard | Service | Panels |
| --- | --- | --- |
| API Overview | org-chart-api | Request rate, error rate, latency, active users, recent errors |
| PDF Converter — Operations | pdf-converter-api | Request rate, errors, latency, AI cost by model/operation, token usage, recent errors |

Time Range

Use the time picker in the top-right corner. Dashboards default to the last 6 hours. Common selections:

  • Last 1h β€” live operational view
  • Last 24h β€” daily summary
  • Last 7d β€” weekly trends

Grafana uses $__range for stat panels (total over the whole selected range) and $__interval for timeseries panels (rate per bucket).

Reading the PDF Converter Dashboard

  • Top row (stats) β€” totals for the selected time range: requests, errors, AI cost, avg response time
  • Request Rate by Endpoint β€” which API paths are being hit most
  • Error Rate Over Time β€” 4xx (warn) and 5xx (error) over time
  • Request Latency β€” p50/p95/p99 in milliseconds; conversions typically take 5–60s
  • AI Cost Per Hour β€” stacked bars by AI model; tallest bars = most expensive period
  • Cost by Model β€” pie chart of total spend broken down by model (Claude, Gemini, etc.)
  • Recent Errors β€” live log panel showing the last error/warning log lines

Writing LogQL Queries

LogQL is Loki’s query language. The basic pattern is:

{stream_selector} | parser | filter | metric

Stream selectors β€” what to match (uses indexed Loki labels):

{service="pdf-converter-api"}
{service="pdf-converter-api", level="error"}
{service=~"pdf-converter-api|org-chart-api"}

Parser β€” how to interpret the log line:

| json # parse full JSON log line (loki-json pipeline containers)

Filters β€” narrow down results after parsing:

| event=`request_completed`
| event=`cost_recorded`
| level="error"
| path="/api/convert"

Metrics β€” aggregate into numbers:

count_over_time(...[5m]) # count log lines per 5m window
sum_over_time(... | unwrap estimatedCostUsd [5m]) # sum a numeric field
quantile_over_time(0.95, ... | unwrap durationMs [5m]) # p95 of a numeric field
avg_over_time(... | unwrap durationMs [5m]) # average of a numeric field

Useful example queries:

# All errors in the last hour
{service="pdf-converter-api", level="error"} | json
# Total AI spend over the selected time range
sum(sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [$__range]))
# Cost by model over the selected time range
sum by (model) (sum_over_time({service="pdf-converter-api"} | json | event=`cost_recorded` | unwrap estimatedCostUsd [$__range]))
# 95th percentile latency per $__interval bucket
quantile_over_time(0.95, {service="pdf-converter-api"} | json | event=`request_completed` | unwrap durationMs [$__interval])
# Request rate by path
sum by (path) (count_over_time({service="pdf-converter-api"} | json | event=`request_completed` [$__interval]))

Editing Dashboards

Option 1: Edit in the Grafana UI, then export

  1. Open the dashboard in Grafana at http://10.1.1.3:3000
  2. Click the pencil icon (top right) to enter edit mode
  3. Add, move, or edit panels
  4. When done, click Dashboard settings (gear icon) β†’ JSON Model
  5. Copy the full JSON
  6. Replace the contents of the appropriate file in config/dashboards/:
    • api-overview.json β€” org-chart dashboard
    • pdf-converter.json β€” PDF converter dashboard
  7. Commit and push (see below)

Option 2: Edit the JSON file directly, let Grafana auto-reload

  1. Edit the JSON file in accessible-org-chart/config/dashboards/
  2. Grafana polls the dashboard directory every 30 seconds and picks up changes automatically
  3. Refresh Grafana in your browser to see the changes
  4. Commit and push

Grafana Dashboard JSON key fields

{
  "title": "Dashboard Name",
  "uid": "unique-id-here",                    ← used in URLs and API — don't change arbitrarily
  "refresh": "30s",                           ← auto-refresh interval
  "time": { "from": "now-6h", "to": "now" },  ← default time range
  "panels": [ ... ]                           ← array of panel objects
}

Each panel needs:

  • "type" β€” "timeseries", "stat", "logs", "piechart", "table"
  • "datasource" β€” { "type": "loki", "uid": "loki" }
  • "targets" β€” array of LogQL queries
  • "gridPos" β€” { "x": 0, "y": 0, "w": 12, "h": 8 } (grid is 24 wide)

Committing Configuration Changes

All Grafana/Loki/Promtail config lives in the accessible-org-chart repo under config/. Changes must be committed and pushed to stay backed up.

From your Mac (preferred)

cd ~/Projects/accessible-org-chart
# Edit config/dashboards/*.json or config/promtail.yml
git add config/
git commit -m "Update Grafana dashboard: add cost panels"
git push

Then sync to the server:

ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && git pull"
# Restart if promtail.yml changed:
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart promtail"

Grafana auto-reloads dashboards β€” no restart needed for dashboard-only changes.

From the server directly (if needed)

ssh -i ~/.ssh/nightly-audit [email protected]
cd ~/accessible-org-chart
# ... make changes ...
git add config/
git commit -m "..."
git pull --rebase # sync any remote changes first
git push

Adding a New Project to Grafana

Step 1 β€” Choose a pipeline

| Use logging=loki | Use logging=loki-json |
| --- | --- |
| App uses @org-chart/logger or similar | App emits raw structured JSON from console.log |
| Only need request count + errors | Need to query cost, latency, or other numeric fields |

Step 2 β€” Add Docker labels to the container

In the project’s docker-compose.yml:

services:
  my-api:
    labels:
      - "logging=loki-json"   # or "loki" for the message-only pipeline
      - "service=my-service-api"

Step 3 β€” Emit structured JSON logs

For the loki-json pipeline, the app must emit JSON lines to stdout. Required fields:

{
  "timestamp": "2026-03-01T12:00:00.000Z",
  "level": "info",
  "service": "my-service-api",
  "event": "request_completed",
  "method": "POST",
  "path": "/api/resource",
  "status": 200,
  "durationMs": 143
}

For AI cost events:

{
  "timestamp": "2026-03-01T12:00:00.000Z",
  "level": "info",
  "service": "my-service-api",
  "event": "cost_recorded",
  "model": "claude-sonnet-4-20250514",
  "operationType": "conversion",
  "inputTokens": 1234,
  "outputTokens": 567,
  "estimatedCostUsd": 0.0123,
  "userId": "user-uuid"
}
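
A minimal emitter that satisfies this contract could look like the following sketch (the real services are Node; this helper, its name, and its defaults are made up here for illustration):

```python
import json
import sys
from datetime import datetime, timezone

def log_event(event: str, *, level: str = "info",
              service: str = "my-service-api", **fields) -> dict:
    """Write one JSON log line to stdout in the shape the
    loki-json pipeline expects, and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z"),
        "level": level,
        "service": service,
        "event": event,
        **fields,   # e.g. method, path, status, durationMs
    }
    sys.stdout.write(json.dumps(entry) + "\n")
    return entry

# Example: the request_completed event shown above
log_event("request_completed", method="POST", path="/api/resource",
          status=200, durationMs=143)
```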

See accessible-pdf-converter/workers/api/src/server.ts for the request logging middleware and workers/api/src/services/cost-ledger.ts for cost event logging β€” copy this pattern directly.

Step 4 β€” Create a Grafana dashboard

Copy config/dashboards/pdf-converter.json as a starting point. Change:

  • "title" β€” dashboard name
  • "uid" β€” must be unique across all dashboards
  • All service="pdf-converter-api" references β†’ service="my-service-api"

Commit the new file and pull on the server.

Step 5 β€” Restart containers and Promtail

ssh -i ~/.ssh/nightly-audit [email protected] "
cd ~/my-project && docker compose up -d my-api &&
cd ~/accessible-org-chart && docker compose restart promtail
"

Verify logs reach Loki within 30 seconds:

ssh -i ~/.ssh/nightly-audit [email protected] \
"curl -s 'http://127.0.0.1:3100/loki/api/v1/label/service/values' | python3 -m json.tool"

Infrastructure Notes

Config file locations

| File | Purpose |
| --- | --- |
| config/promtail.yml | Promtail scrape configs and parsing pipelines |
| config/loki.yml | Loki storage and retention (30-day default) |
| config/dashboards/*.json | Grafana dashboard definitions (auto-provisioned) |
| config/grafana-provisioning/datasources/datasources.yml | Loki datasource |
| config/grafana-provisioning/dashboards/dashboards.yml | Dashboard auto-load config |
| config/grafana-provisioning/alerting/alerts.yml | Grafana alert rules |

Restarting services

# Restart everything
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart"
# Restart only Promtail (after promtail.yml changes)
ssh -i ~/.ssh/nightly-audit [email protected] "cd ~/accessible-org-chart && docker compose restart promtail"
# Grafana reloads dashboards automatically every 30s β€” no restart needed for dashboard changes

Checking if data is flowing

# List all services that have sent logs to Loki
ssh -i ~/.ssh/nightly-audit [email protected] \
"curl -s 'http://127.0.0.1:3100/loki/api/v1/label/service/values' | python3 -m json.tool"
# Tail live logs from a container and check JSON format
ssh -i ~/.ssh/nightly-audit [email protected] \
"docker logs accessible-pdf-converter-api-node-1 --tail 20"
# Query Loki directly
ssh -i ~/.ssh/nightly-audit [email protected] "
START=\$(python3 -c 'import time; print(int((time.time()-3600)*1e9))') &&
END=\$(python3 -c 'import time; print(int(time.time()*1e9))') &&
curl -s \"http://127.0.0.1:3100/loki/api/v1/query_range?query=%7Bservice%3D%22pdf-converter-api%22%7D&limit=5&start=\${START}&end=\${END}\" | python3 -m json.tool
"