# Logfire Querying Guide

## Basic Concepts

Logfire uses PostgreSQL-flavored SQL to query trace data. All data is stored in the `records` table.

## Key Fields

- `deployment_environment` - environment name (local/staging/production) - **ALWAYS filter by this when investigating issues**
- `is_exception` - boolean field to filter spans that contain exceptions
- `kind` - either 'span' or 'event'
- `trace_id` - unique identifier for a trace (group of related spans)
- `span_id` - unique identifier for this specific span
- `span_name` - name of the span (e.g., "GET /tracks/")
- `message` - human-readable message
- `attributes` - JSONB field containing span metadata, exception details, etc.
- `start_timestamp` - when the span started
- `duration` - span duration (in seconds, multiply by 1000 for ms)
- `otel_status_code` - OpenTelemetry status (OK, ERROR, UNSET)
- `otel_status_message` - **IMPORTANT: Error messages appear here, not in exception.message**

## Querying for Exceptions

**Find recent exception spans:**
```sql
SELECT
  message,
  start_timestamp,
  otel_status_message,
  attributes->>'exception.type' as exc_type
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 10
```

**Get exception details with context:**
```sql
SELECT
  message,
  otel_status_message as error_summary,
  attributes->>'exception.type' as exc_type,
  attributes->>'exception.message' as exc_msg,
  attributes->>'exception.stacktrace' as stacktrace,
  start_timestamp
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 5
```

**Find all spans in a trace:**
```sql
SELECT message, kind, start_timestamp, duration
FROM records
WHERE trace_id = '<trace-id-here>'
ORDER BY start_timestamp
```

## HTTP Request Queries

**Recent HTTP requests with status codes:**
```sql
SELECT
  span_name,
  start_timestamp,
  duration * 1000 as duration_ms,
  (attributes->>'http.status_code')::int as status_code,
  attributes->>'http.route' as route,
  otel_status_code
FROM records
WHERE kind = 'span' AND span_name LIKE 'GET%'
ORDER BY start_timestamp DESC
LIMIT 20
```

**Find slow or failed HTTP requests:**
```sql
SELECT
  span_name,
  start_timestamp,
  duration * 1000 as duration_ms,
  (attributes->>'http.status_code')::int as status_code,
  otel_status_message
FROM records
WHERE kind = 'span'
  AND span_name LIKE 'GET%'
  AND (duration > 1.0 OR otel_status_code = 'ERROR')
ORDER BY duration DESC
LIMIT 20
```

**Understanding 307 Redirects:**

When querying for audio streaming requests, you'll see HTTP 307 (Temporary Redirect) responses:

```sql
SELECT
  span_name,
  message,
  (attributes->>'http.status_code')::int as status_code,
  attributes->>'http.url' as url
FROM records
WHERE span_name = 'GET /audio/{file_id}'
  AND (attributes->>'http.status_code')::int = 307
ORDER BY start_timestamp DESC
```

**Why 307 is expected:**
- The `/audio/{file_id}` endpoint redirects to Cloudflare R2 CDN URLs when using R2 storage
- 307 guarantees the request method is preserved across the redirect (302 permits clients to change it)
- This offloads bandwidth to R2's CDN instead of proxying through the app
- See `src/backend/api/audio.py` for implementation

## Database Query Spans

**Find slow database queries:**
```sql
SELECT
  span_name,
  start_timestamp,
  duration * 1000 as duration_ms,
  attributes->>'db.statement' as query,
  trace_id
FROM records
WHERE span_name LIKE 'SELECT%'
  AND duration > 0.1
ORDER BY duration DESC
LIMIT 10
```

**Database query patterns:**
```sql
-- group queries by type
SELECT
  CASE
    WHEN span_name LIKE 'SELECT%' THEN 'SELECT'
    WHEN span_name LIKE 'INSERT%' THEN 'INSERT'
    WHEN span_name LIKE 'UPDATE%' THEN 'UPDATE'
    WHEN span_name LIKE 'DELETE%' THEN 'DELETE'
    ELSE 'OTHER'
  END as query_type,
  COUNT(*) as count,
  AVG(duration * 1000) as avg_duration_ms,
  MAX(duration * 1000) as max_duration_ms
FROM records
WHERE span_name LIKE '%FROM%' OR span_name LIKE '%INTO%'
GROUP BY query_type
ORDER BY count DESC
```

## Background Task and Storage Queries

**Search by message content:**
```sql
SELECT
  span_name,
  message,
  start_timestamp,
  attributes
FROM records
WHERE message LIKE '%R2%' OR message LIKE '%upload%'
ORDER BY start_timestamp DESC
LIMIT 10
```

**Get full trace for a background task:**
```sql
-- First, find the trace_id for your background task
SELECT trace_id, message, start_timestamp
FROM records
WHERE span_name = 'process upload background'
ORDER BY start_timestamp DESC
LIMIT 1;

-- Then get all spans in that trace
SELECT
  span_name,
  message,
  start_timestamp,
  duration * 1000 as duration_ms
FROM records
WHERE trace_id = '<trace-id-from-above>'
ORDER BY start_timestamp ASC;
```

**Extract nested attributes from JSONB:**
```sql
-- Get bucket and key from R2 upload logs
SELECT
  message,
  attributes->>'bucket' as bucket,
  attributes->>'key' as key,
  attributes->>'file_id' as file_id,
  start_timestamp
FROM records
WHERE message = 'uploading to R2'
ORDER BY start_timestamp DESC
LIMIT 5;
```

**Find spans within a time range:**
```sql
SELECT
  span_name,
  message,
  start_timestamp,
  duration * 1000 as duration_ms
FROM records
WHERE start_timestamp > '2025-11-11T04:56:50Z'
  AND start_timestamp < '2025-11-11T04:57:10Z'
  AND (span_name LIKE '%R2%' OR message LIKE '%save%')
ORDER BY start_timestamp ASC;
```

**Common mistake: Not all log levels create spans**

Calls such as `logfire.info()` create log events, not spans.
To find them:
- Search by the `message` field, not `span_name`
- Use LIKE with wildcards: `message LIKE '%preparing%'`
- Filter by `kind = 'event'` if you only want logs (not spans)

Example:
```sql
-- WRONG: This won't find logfire.info() calls
SELECT * FROM records WHERE span_name = 'preparing to save audio file';

-- RIGHT: Search by message instead
SELECT * FROM records WHERE message LIKE '%preparing%';
```

**Aggregate errors by type:**
```sql
SELECT
  attributes->>'exception.type' as error_type,
  COUNT(*) as occurrences,
  MAX(start_timestamp) as last_seen,
  COUNT(DISTINCT trace_id) as unique_traces
FROM records
WHERE is_exception = true
  AND start_timestamp > NOW() - INTERVAL '24 hours'
GROUP BY error_type
ORDER BY occurrences DESC
```

**Find errors by endpoint:**
```sql
SELECT
  attributes->>'http.route' as endpoint,
  COUNT(*) as error_count,
  COUNT(DISTINCT attributes->>'exception.type') as unique_error_types
FROM records
WHERE otel_status_code = 'ERROR'
  AND attributes->>'http.route' IS NOT NULL
GROUP BY endpoint
ORDER BY error_count DESC
```

## Known Issues

### `/tracks/` 500 Error on First Load

**Trace ID:** `019a46fe0b20c24432f5a7536d8561a6`
**Timestamp:** 2025-11-02T23:54:05.472754Z
**Status:** 500

**Symptoms:**
- First request to `GET /tracks/` fails with a 500 error
- Subsequent requests succeed with 200 status
- Database connection and SELECT query both execute successfully in the trace

**Root Cause:**
SSL connection pooling issue with the Neon database.
The error appears in `otel_status_message`:
```
consuming input failed: SSL connection has been closed unexpectedly
```

**Analysis:**
- This is a Neon PostgreSQL connection pool issue where SSL connections are being dropped
- The first request attempts to use a stale/closed SSL connection from the pool
- Subsequent requests work because the pool recovers and establishes a fresh connection
- The error is captured in the `otel_status_code: "ERROR"` and `otel_status_message` fields

**Potential Fixes:**
1. Configure SQLAlchemy connection pool settings for Neon:
   - Set `pool_pre_ping=True` to verify connections before use
   - Adjust `pool_recycle` to match Neon's connection timeout
2. Review Neon-specific SSL connection settings
3. Add retry logic for initial database connections
4. Consider connection pool size tuning

## Pre-built Dashboards

### Database Performance Dashboard

Comprehensive database query analysis showing:
- query performance by type and table
- latency percentiles (p50, p95, p99)
- error rates and counts
- query volume and impact

See `logfire-database-dashboard.sql` for the full query, with alternative views for:
- slowest individual queries
- hourly query volume timeline
- transaction and connection pool metrics

## Resources

- [Logfire SQL Explorer Documentation](https://logfire.pydantic.dev/docs/guides/web-ui/explore/)
- [Logfire Concepts](https://logfire.pydantic.dev/docs/concepts/)
- Logfire UI: https://logfire.pydantic.dev/zzstoatzz/plyr (project name configured in `logfire.configure`)
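
## Appendix: Pool Settings Sketch

The SQLAlchemy pool settings suggested under "Potential Fixes" for the Neon SSL issue can be sketched as below. This is a minimal sketch, not the project's actual configuration: the `sqlite://` DSN is a stand-in so the snippet runs without a database, and the `pool_recycle` value of 300 seconds is an assumption that should be set below Neon's real idle connection timeout.

```python
from sqlalchemy import create_engine, text

# Sketch of fix #1 above -- assumed values, not the project's actual config.
# Swap the sqlite:// stand-in for the real Neon DSN, e.g. a
# postgresql+psycopg2://... URL, when applying this.
engine = create_engine(
    "sqlite://",          # stand-in DSN so the sketch runs without Neon
    pool_pre_ping=True,   # ping pooled connections before reuse, so stale
                          # (closed-SSL) connections are silently replaced
    pool_recycle=300,     # recycle connections after 5 minutes (assumption:
                          # keep this below Neon's idle connection timeout)
)

# pool_pre_ping turns the "first request hits a dead connection" failure
# into a transparent reconnect instead of a 500 error.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```

With `pool_pre_ping` enabled, the pool issues a lightweight test statement before handing out each connection, which directly addresses the "SSL connection has been closed unexpectedly" failure mode described above.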