# Logfire Querying Guide

## Basic Concepts

Logfire uses PostgreSQL-flavored SQL to query trace data. All data is stored in the `records` table.

## Key Fields

- `deployment_environment` - environment name (local/staging/production) - **ALWAYS filter by this when investigating issues**
- `is_exception` - boolean field to filter spans that contain exceptions
- `kind` - either 'span' or 'event'
- `trace_id` - unique identifier for a trace (a group of related spans)
- `span_id` - unique identifier for this specific span
- `span_name` - name of the span (e.g., "GET /tracks/")
- `message` - human-readable message
- `attributes` - JSONB field containing span metadata, exception details, etc.
- `start_timestamp` - when the span started
- `duration` - span duration in seconds (multiply by 1000 for ms)
- `otel_status_code` - OpenTelemetry status (OK, ERROR, UNSET)
- `otel_status_message` - **IMPORTANT: error messages appear here, not in `exception.message`**

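Since the field list above stresses always filtering by `deployment_environment`, a typical first query when investigating a reported issue combines that filter with the status fields (environment names are the ones listed above):

```sql
-- recent errors in production only
SELECT
    message,
    span_name,
    otel_status_message,
    start_timestamp
FROM records
WHERE deployment_environment = 'production'
  AND otel_status_code = 'ERROR'
ORDER BY start_timestamp DESC
LIMIT 10
```
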
## Querying for Exceptions

**Find recent exception spans:**
```sql
SELECT
    message,
    start_timestamp,
    otel_status_message,
    attributes->>'exception.type' as exc_type
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 10
```

**Get exception details with context:**
```sql
SELECT
    message,
    otel_status_message as error_summary,
    attributes->>'exception.type' as exc_type,
    attributes->>'exception.message' as exc_msg,
    attributes->>'exception.stacktrace' as stacktrace,
    start_timestamp
FROM records
WHERE is_exception = true
ORDER BY start_timestamp DESC
LIMIT 5
```

**Find all spans in a trace:**
```sql
SELECT message, kind, start_timestamp, duration
FROM records
WHERE trace_id = '<trace-id-here>'
ORDER BY start_timestamp
```

## HTTP Request Queries

**Recent HTTP requests with status codes:**
```sql
SELECT
    span_name,
    start_timestamp,
    duration * 1000 as duration_ms,
    (attributes->>'http.status_code')::int as status_code,
    attributes->>'http.route' as route,
    otel_status_code
FROM records
WHERE kind = 'span' AND span_name LIKE 'GET%'
ORDER BY start_timestamp DESC
LIMIT 20
```

**Find slow or failed HTTP requests:**
```sql
SELECT
    span_name,
    start_timestamp,
    duration * 1000 as duration_ms,
    (attributes->>'http.status_code')::int as status_code,
    otel_status_message
FROM records
WHERE kind = 'span'
  AND span_name LIKE 'GET%'
  AND (duration > 1.0 OR otel_status_code = 'ERROR')
ORDER BY duration DESC
LIMIT 20
```

**Understanding 307 Redirects:**

When querying for audio streaming requests, you'll see HTTP 307 (Temporary Redirect) responses:

```sql
SELECT
    span_name,
    message,
    (attributes->>'http.status_code')::int as status_code,
    attributes->>'http.url' as url
FROM records
WHERE span_name = 'GET /audio/{file_id}'
  AND (attributes->>'http.status_code')::int = 307
ORDER BY start_timestamp DESC
```

**Why 307 is expected:**
- The `/audio/{file_id}` endpoint redirects to Cloudflare R2 CDN URLs when using R2 storage
- 307 guarantees the request method is preserved on redirect, whereas user agents may rewrite a 302 to a GET
- This offloads bandwidth to R2's CDN instead of proxying audio through the app
- See `src/backend/api/audio.py` for the implementation

## Database Query Spans

**Find slow database queries:**
```sql
SELECT
    span_name,
    start_timestamp,
    duration * 1000 as duration_ms,
    attributes->>'db.statement' as query,
    trace_id
FROM records
WHERE span_name LIKE 'SELECT%'
  AND duration > 0.1
ORDER BY duration DESC
LIMIT 10
```

**Database query patterns:**
```sql
-- group queries by type
SELECT
    CASE
        WHEN span_name LIKE 'SELECT%' THEN 'SELECT'
        WHEN span_name LIKE 'INSERT%' THEN 'INSERT'
        WHEN span_name LIKE 'UPDATE%' THEN 'UPDATE'
        WHEN span_name LIKE 'DELETE%' THEN 'DELETE'
        ELSE 'OTHER'
    END as query_type,
    COUNT(*) as count,
    AVG(duration * 1000) as avg_duration_ms,
    MAX(duration * 1000) as max_duration_ms
FROM records
WHERE span_name LIKE '%FROM%' OR span_name LIKE '%INTO%'
GROUP BY query_type
ORDER BY count DESC
```

## Background Task and Storage Queries

**Search by message content:**
```sql
SELECT
    span_name,
    message,
    start_timestamp,
    attributes
FROM records
WHERE message LIKE '%R2%' OR message LIKE '%upload%'
ORDER BY start_timestamp DESC
LIMIT 10
```

**Get the full trace for a background task:**
```sql
-- First, find the trace_id for your background task
SELECT trace_id, message, start_timestamp
FROM records
WHERE span_name = 'process upload background'
ORDER BY start_timestamp DESC
LIMIT 1;

-- Then get all spans in that trace
SELECT
    span_name,
    message,
    start_timestamp,
    duration * 1000 as duration_ms
FROM records
WHERE trace_id = '<trace-id-from-above>'
ORDER BY start_timestamp ASC;
```

**Extract nested attributes from JSONB:**
```sql
-- Get bucket and key from R2 upload logs
SELECT
    message,
    attributes->>'bucket' as bucket,
    attributes->>'key' as key,
    attributes->>'file_id' as file_id,
    start_timestamp
FROM records
WHERE message = 'uploading to R2'
ORDER BY start_timestamp DESC
LIMIT 5;
```

**Find spans within a time range:**
```sql
SELECT
    span_name,
    message,
    start_timestamp,
    duration * 1000 as duration_ms
FROM records
WHERE start_timestamp > '2025-11-11T04:56:50Z'
  AND start_timestamp < '2025-11-11T04:57:10Z'
  AND (span_name LIKE '%R2%' OR message LIKE '%save%')
ORDER BY start_timestamp ASC;
```

**Common mistake: not all log levels create spans**

Calls like `logfire.info()` create log events, not spans. To find them:
- Search by the `message` field, not `span_name`
- Use LIKE with wildcards: `message LIKE '%preparing%'`
- Filter by `kind = 'event'` if you only want logs (not spans)

Example:
```sql
-- WRONG: This won't find logfire.info() calls
SELECT * FROM records WHERE span_name = 'preparing to save audio file';

-- RIGHT: Search by message instead
SELECT * FROM records WHERE message LIKE '%preparing%';
```

**Aggregate errors by type:**
```sql
SELECT
    attributes->>'exception.type' as error_type,
    COUNT(*) as occurrences,
    MAX(start_timestamp) as last_seen,
    COUNT(DISTINCT trace_id) as unique_traces
FROM records
WHERE is_exception = true
  AND start_timestamp > NOW() - INTERVAL '24 hours'
GROUP BY error_type
ORDER BY occurrences DESC
```

**Find errors by endpoint:**
```sql
SELECT
    attributes->>'http.route' as endpoint,
    COUNT(*) as error_count,
    COUNT(DISTINCT attributes->>'exception.type') as unique_error_types
FROM records
WHERE otel_status_code = 'ERROR'
  AND attributes->>'http.route' IS NOT NULL
GROUP BY endpoint
ORDER BY error_count DESC
```

## Known Issues

### `/tracks/` 500 Error on First Load

**Trace ID:** `019a46fe0b20c24432f5a7536d8561a6`
**Timestamp:** 2025-11-02T23:54:05.472754Z
**Status:** 500

**Symptoms:**
- First request to `GET /tracks/` fails with a 500 error
- Subsequent requests succeed with a 200 status
- The database connection and SELECT query both execute successfully within the trace

**Root Cause:**
An SSL connection pooling issue with the Neon database. The error appears in `otel_status_message`:
```
consuming input failed: SSL connection has been closed unexpectedly
```

**Analysis:**
- This is a Neon PostgreSQL connection pool issue where SSL connections are being dropped
- The first request attempts to use a stale/closed SSL connection from the pool
- Subsequent requests work because the pool recovers and establishes a fresh connection
- The error is captured in the `otel_status_code: "ERROR"` and `otel_status_message` fields

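To check whether this issue is still recurring, the `otel_status_message` pattern above can be searched directly:

```sql
-- occurrences of the Neon SSL pooling error
SELECT start_timestamp, trace_id, otel_status_message
FROM records
WHERE otel_status_code = 'ERROR'
  AND otel_status_message LIKE '%SSL connection has been closed%'
ORDER BY start_timestamp DESC
LIMIT 10
```
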
**Potential Fixes:**
1. Configure SQLAlchemy connection pool settings for Neon:
   - Set `pool_pre_ping=True` to verify connections before use
   - Adjust `pool_recycle` to match Neon's connection timeout
2. Review Neon-specific SSL connection settings
3. Add retry logic for initial database connections
4. Consider connection pool size tuning

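Fix 1 can be sketched with SQLAlchemy as follows. This is a minimal sketch, not the project's actual engine setup: the `pool_recycle` value of 300 is a placeholder to tune against Neon's idle timeout, and `sqlite://` stands in for the real Neon `postgresql://` URL.

```python
from sqlalchemy import create_engine, text

# pool_pre_ping issues a lightweight ping before handing out a pooled
# connection, so a stale SSL connection is transparently replaced instead
# of failing the first request. pool_recycle retires connections older
# than N seconds; set it below Neon's idle timeout.
engine = create_engine(
    "sqlite://",  # placeholder: the real Neon postgresql:// URL goes here
    pool_pre_ping=True,
    pool_recycle=300,
)

# connections checked out from the pool are now verified before use
with engine.connect() as conn:
    assert conn.execute(text("select 1")).scalar() == 1
```
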
## Pre-built Dashboards

### Database Performance Dashboard

Comprehensive database query analysis showing:
- Query performance by type and table
- Latency percentiles (p50, p95, p99)
- Error rates and counts
- Query volume and impact

See `logfire-database-dashboard.sql` for the full query, with alternative views for:
- Slowest individual queries
- Hourly query volume timeline
- Transaction and connection pool metrics

## Resources

- [Logfire SQL Explorer Documentation](https://logfire.pydantic.dev/docs/guides/web-ui/explore/)
- [Logfire Concepts](https://logfire.pydantic.dev/docs/concepts/)
- Logfire UI: https://logfire.pydantic.dev/zzstoatzz/plyr (project name configured in `logfire.configure()`)