QuickDID is a high-performance AT Protocol identity resolution service written in Rust. It provides handle-to-DID resolution with Redis-backed caching and queue processing.

documentation: guide to using telegraf and timescaledb for metrics

+3
.gitignore
···
 # Docker
 docker-compose.override.yml
+
+# Metrics dev
+/metrics-stack
+426
docs/telegraf-timescaledb-metrics-guide.md
# Telegraf and TimescaleDB Metrics Collection Guide

This guide demonstrates how to set up a metrics collection pipeline using Telegraf to collect StatsD metrics and store them in PostgreSQL with TimescaleDB using Docker Compose.

## Overview

This setup creates a metrics pipeline that:
- Collects StatsD metrics via Telegraf on UDP port 8125
- Stores metrics in PostgreSQL with TimescaleDB extensions
- Automatically creates hypertables for time-series optimization
- Provides a complete Docker Compose configuration for easy deployment

## Prerequisites

- Docker and Docker Compose installed
- Basic understanding of StatsD metrics format
- Familiarity with PostgreSQL/TimescaleDB concepts

## Project Structure

Create the following directory structure:

```
metrics-stack/
├── docker-compose.yml
├── telegraf/
│   └── telegraf.conf
├── init-scripts/          # optional, see step 4
│   └── 01-init.sql
└── .env
```

## Configuration Files

### 1. Environment Variables (.env)

Create a `.env` file to store sensitive configuration:

```env
# PostgreSQL/TimescaleDB Configuration
POSTGRES_DB=metrics
POSTGRES_USER=postgres
POSTGRES_PASSWORD=secretpassword

# Telegraf Database User
TELEGRAF_DB_USER=postgres
TELEGRAF_DB_PASSWORD=secretpassword

# TimescaleDB Settings
TIMESCALE_TELEMETRY=off
```

### 2. Telegraf Configuration (telegraf/telegraf.conf)

Create the Telegraf configuration file:

```toml
# Global Telegraf Agent Configuration
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  hostname = "telegraf-agent"
  omit_hostname = false

# StatsD Input Plugin
[[inputs.statsd]]
  service_address = ":8125" # Listen on UDP port 8125 for StatsD metrics
  protocol = "udp"
  delete_gauges = true
  delete_counters = true
  delete_sets = true
  delete_timings = true
  percentiles = [50, 90, 95, 99]
  metric_separator = "."
  allowed_pending_messages = 10000
  datadog_extensions = true
  datadog_distributions = true

# PostgreSQL (TimescaleDB) Output Plugin
[[outputs.postgresql]]
  connection = "host=timescaledb user=${TELEGRAF_DB_USER} password=${TELEGRAF_DB_PASSWORD} dbname=${POSTGRES_DB} sslmode=disable"
  schema = "public"
  create_templates = [
    '''CREATE TABLE IF NOT EXISTS {{.table}} ({{.columns}})''',
    '''SELECT create_hypertable({{.table|quoteLiteral}}, 'time', if_not_exists => TRUE)''',
  ]
  tags_as_jsonb = true
```

With this configuration, Telegraf creates one table per metric name in the `public` schema (for example `"quickdid.http.request.count"`), with tags stored in a `tags` JSONB column.

### 3. Docker Compose Configuration (docker-compose.yml)

Create the Docker Compose file:

```yaml
version: '3.8'

services:
  timescaledb:
    image: timescale/timescaledb-ha:pg17
    container_name: timescaledb
    restart: unless-stopped
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      TIMESCALE_TELEMETRY: ${TIMESCALE_TELEMETRY}
    ports:
      - "5442:5432"
    volumes:
      - timescale_data:/home/postgres/pgdata/data
      - ./init-scripts:/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - metrics_network

  telegraf:
    image: telegraf
    container_name: telegraf
    restart: unless-stopped
    environment:
      TELEGRAF_DB_USER: ${TELEGRAF_DB_USER}
      TELEGRAF_DB_PASSWORD: ${TELEGRAF_DB_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - "8125:8125/udp" # StatsD UDP port
    volumes:
      - ./telegraf/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    depends_on:
      timescaledb:
        condition: service_healthy
    networks:
      - metrics_network
    command: ["telegraf", "--config", "/etc/telegraf/telegraf.conf"]

  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --appendfsync everysec
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - metrics_network

networks:
  metrics_network:
    driver: bridge

volumes:
  timescale_data:
  redis_data:
```

### 4. Database Initialization Script (optional)

Create `init-scripts/01-init.sql` to enable the TimescaleDB extension when the database is first initialized:

```sql
-- Enable TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;
```

## Usage

### Starting the Stack

1. Navigate to your project directory:
   ```bash
   cd metrics-stack
   ```

2. Start the services:
   ```bash
   docker-compose up -d
   ```

3. Check the logs to ensure everything is running:
   ```bash
   docker-compose logs -f
   ```

### Sending Metrics

Send test metrics to Telegraf using the StatsD protocol:

```bash
# Send a counter metric
echo "quickdid.http.request.count:1|c|#method:GET,path:/resolve,status:200" | nc -u -w0 localhost 8125

# Send a gauge metric
echo "quickdid.resolver.rate_limit.available_permits:10|g" | nc -u -w0 localhost 8125

# Send a timing metric
echo "quickdid.http.request.duration_ms:42|ms|#method:POST,path:/api,status:201" | nc -u -w0 localhost 8125

# Send a histogram
echo "quickdid.resolver.resolution_time:100|h|#resolver:redis" | nc -u -w0 localhost 8125
```

### Querying Metrics

Connect to TimescaleDB, then run the SQL statements at the `psql` prompt:

```bash
# Connect to the database
docker exec -it timescaledb psql -U postgres -d metrics

-- The statements below are SQL, entered at the psql prompt.
-- Query recent metrics
SELECT
  time,
  tags,
  value
FROM "quickdid.http.request.count"
WHERE time > NOW() - INTERVAL '1 hour'
ORDER BY time DESC
LIMIT 10;

-- Query specific metric with tags
SELECT
  time_bucket('1 minute', time) AS minute,
  tags->>'method' AS method,
  tags->>'path' AS path,
  tags->>'status' AS status,
  AVG(mean::numeric) AS avg_duration
FROM "quickdid.http.request.duration_ms"
WHERE
  time > NOW() - INTERVAL '1 hour'
GROUP BY minute, method, path, status
ORDER BY minute DESC;
```

## Advanced Configuration

### Continuous Aggregates

Create continuous aggregates for better query performance. The example reads from the per-metric table that the Telegraf configuration above creates:

```sql
-- Create a continuous aggregate for HTTP request counts
CREATE MATERIALIZED VIEW http_metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS hour,
  tags->>'method' AS method,
  tags->>'path' AS path,
  tags->>'status' AS status,
  -- Each row holds the counter total for one Telegraf flush interval
  SUM(value::numeric) AS request_count,
  AVG(value::numeric) AS avg_value,
  MAX(value::numeric) AS max_value,
  MIN(value::numeric) AS min_value
FROM "quickdid.http.request.count"
GROUP BY hour, method, path, status
WITH NO DATA;

-- Add refresh policy
SELECT add_continuous_aggregate_policy('http_metrics_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```

## Monitoring and Maintenance

### Health Checks

Check the health of your TimescaleDB hypertables:

```sql
-- View hypertable information
SELECT * FROM timescaledb_information.hypertables;

-- Check chunk information
SELECT
  hypertable_name,
  chunk_name,
  range_start,
  range_end,
  is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'quickdid.http.request.duration_ms'
ORDER BY range_start DESC
LIMIT 10;

-- Check compression status (meaningful only after compression is enabled on the hypertable)
SELECT
  total_chunks,
  number_compressed_chunks,
  before_compression_total_bytes,
  after_compression_total_bytes
FROM hypertable_compression_stats('"quickdid.http.request.duration_ms"');
```

### Troubleshooting

1. **Telegraf can't connect to TimescaleDB:**
   - Check that TimescaleDB is healthy: `docker-compose ps`
   - Verify credentials in `.env` file
   - Check network connectivity between containers

2. **Metrics not appearing in database:**
   - Check Telegraf logs: `docker-compose logs telegraf`
   - Verify StatsD metrics are being received: `docker-compose logs telegraf | grep statsd`
   - Check table creation: Connect to database and run `\dt public.*`

3. **Performance issues:**
   - Check chunk sizes and adjust chunk_time_interval if needed
   - Verify compression is working
   - Consider creating appropriate indexes on frequently queried columns

## Integration with QuickDID

To integrate with QuickDID, configure it to send metrics to the Telegraf StatsD endpoint:

```bash
# Set environment variables for QuickDID
export METRICS_ADAPTER=statsd
export METRICS_STATSD_HOST=localhost:8125
export METRICS_PREFIX=quickdid.
export METRICS_TAGS=env:production,service:quickdid

# Start QuickDID
cargo run
```

QuickDID will automatically send metrics to Telegraf, which will store them in TimescaleDB for analysis.
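
The metrics QuickDID emits reach Telegraf as plain-text UDP datagrams in the same DogStatsD format as the `nc` examples in this guide. The following is a minimal sketch of that wire format, assuming Python with only the standard library; `format_dogstatsd` and `send_metric` are hypothetical helper names used for illustration and are not part of QuickDID or Telegraf.

```python
import socket


def format_dogstatsd(name, value, metric_type, tags=None):
    """Build one DogStatsD datagram: name:value|type[|#key:val,key:val]."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tags are appended as a comma-separated key:value list after "|#".
        tag_text = ",".join(f"{key}:{val}" for key, val in tags.items())
        line += f"|#{tag_text}"
    return line


def send_metric(line, host="localhost", port=8125):
    """Send one datagram over UDP. This is fire-and-forget: UDP gives no
    delivery confirmation, so a stopped Telegraf container drops metrics
    silently."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Calling `send_metric(format_dogstatsd("quickdid.http.request.count", 1, "c", {"method": "GET"}))` against the running stack should produce a row in the `"quickdid.http.request.count"` table after the next Telegraf flush interval.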

## Backup and Restore

### Backup

Create a backup of your metrics data:

```bash
# Backup entire database
docker exec timescaledb pg_dump -U postgres metrics > metrics_backup.sql

# Export a specific time range from one metric table
# (pg_dump cannot filter rows, so use COPY with a query instead)
docker exec timescaledb psql -U postgres -d metrics \
  -c "COPY (SELECT * FROM \"quickdid.http.request.count\" WHERE time >= '2024-01-01' AND time < '2024-02-01') TO STDOUT WITH CSV HEADER" \
  > metrics_january.csv
```

### Restore

Restore from backup:

```bash
# Restore database
docker exec -i timescaledb psql -U postgres metrics < metrics_backup.sql
```

## Security Considerations

1. **Use strong passwords:** Update the default passwords in `.env`
2. **Enable SSL:** Configure `sslmode=require` in production
3. **Network isolation:** Use Docker networks to isolate services
4. **Access control:** Limit database user permissions to minimum required
5. **Regular updates:** Keep Docker images updated for security patches

## Performance Tuning

### TimescaleDB Configuration

Add these settings to optimize performance:

```yaml
# In docker-compose.yml under timescaledb service
command:
  - postgres
  - -c
  - shared_buffers=1GB
  - -c
  - effective_cache_size=3GB
  - -c
  - maintenance_work_mem=512MB
  - -c
  - work_mem=32MB
  - -c
  - timescaledb.max_background_workers=8
  - -c
  - max_parallel_workers_per_gather=2
  - -c
  - max_parallel_workers=8
```

### Telegraf Buffer Settings

Adjust buffer settings based on your metric volume:

```toml
[agent]
  metric_batch_size = 5000       # Increase for high volume
  metric_buffer_limit = 100000   # Increase buffer size
  flush_interval = "5s"          # Decrease for more frequent writes
```

## Conclusion

This setup provides a metrics collection and storage pipeline using:
- **Telegraf** for flexible metric collection via StatsD
- **TimescaleDB** for efficient time-series data storage
- **Docker Compose** for easy deployment and management

Before using it in production, apply the security and tuning recommendations above. TimescaleDB's compression, retention policies, and continuous aggregates help maintain query performance at scale.