# ConnectRPC API Specification

## Overview

This document specifies the implementation of a ConnectRPC API for the OpenStatus server. ConnectRPC will be used for **new features only**, while the existing REST API remains in place for current functionality.

## Architecture Decisions

### Transport & Protocol
- **Protocol**: Connect protocol only (HTTP/1.1 compatible)
- **Streaming**: Unary calls only (request-response, no streaming)
- **Mounting**: Same port as REST, mounted at the `/rpc/*` path prefix on the existing Hono app

### Schema Management
- **Approach**: Schema-first with `.proto` files
- **Tooling**: Buf (buf.yaml, buf.gen.yaml)
- **Location**: `packages/proto` (shared package for monorepo consumption)
- **Package naming**: `openstatus.<domain>.v1` (e.g., `openstatus.monitor.v1`)

### Code Generation Targets
- TypeScript (`@bufbuild/protobuf` + `@connectrpc/connect`)
- Go (for potential backend service consumers; deferred, see "Go Code Generation")

---

## Authentication & Authorization

### Supported Methods
Both authentication methods resolve to the same workspace context:

1. **API Key** (existing system)
   - Header: `x-openstatus-key`
   - Formats: `os_[32-char-hex]` (custom) or Unkey keys
   - Super admin: `sa_` prefix
2. **Bearer token** (JWT)

### Workspace Context
- Workspace ID is inferred from the authenticated credentials
- **Super-admin override**: Tokens with the `sa_` prefix can specify a target workspace via the `x-workspace-id` metadata header

---

## Error Handling

### Error Model
Use ConnectRPC error codes with `google.rpc.ErrorInfo` for structured details:

```protobuf
// Error codes used: NOT_FOUND, INVALID_ARGUMENT, PERMISSION_DENIED,
// UNAUTHENTICATED, RESOURCE_EXHAUSTED, INTERNAL, UNAVAILABLE

// Include ErrorInfo details:
// - domain: "openstatus.com"
// - reason: Machine-readable error reason (e.g., "MONITOR_NOT_FOUND")
// - metadata: Additional context (requestId, resourceId, etc.)
```

### Mapping to Existing Errors
Reuse the `OpenStatusApiError` codes and map them to their ConnectRPC equivalents in the error interceptor.

---

## First Service: Monitor Management

### Service Definition

```protobuf
syntax = "proto3";

package openstatus.monitor.v1;

import "buf/validate/validate.proto";
import "google/protobuf/timestamp.proto";

// MonitorService provides CRUD and operational commands for monitors.
service MonitorService {
  // CreateMonitor creates a new monitor in the workspace.
  rpc CreateMonitor(CreateMonitorRequest) returns (CreateMonitorResponse);

  // GetMonitor retrieves a single monitor by ID.
  rpc GetMonitor(GetMonitorRequest) returns (GetMonitorResponse);

  // ListMonitors returns a paginated list of monitors.
  rpc ListMonitors(ListMonitorsRequest) returns (ListMonitorsResponse);

  // DeleteMonitor removes a monitor.
  rpc DeleteMonitor(DeleteMonitorRequest) returns (DeleteMonitorResponse);

  // TriggerMonitor initiates an immediate check.
  rpc TriggerMonitor(TriggerMonitorRequest) returns (TriggerMonitorResponse);
}
```

### Monitor Type Modeling

Separate message types for each monitor kind:

```protobuf
// HttpMonitor configuration for HTTP/HTTPS endpoint monitoring.
message HttpMonitor {
  // The URL to monitor (required).
  string url = 1 [(buf.validate.field).string.uri = true];

  // HTTP method to use.
  HttpMethod method = 2;

  // Request headers to include.
  map<string, string> headers = 3;

  // Request body for POST/PUT/PATCH.
  optional string body = 4;

  // Timeout in milliseconds (default: 30000).
  int32 timeout_ms = 5 [(buf.validate.field).int32 = {gte: 1000, lte: 60000}];

  // Assertions to validate the response.
  repeated HttpAssertion assertions = 6;
}

// TcpMonitor configuration for TCP connection monitoring.
message TcpMonitor {
  // Host to connect to (required).
  string host = 1 [(buf.validate.field).string.min_len = 1];

  // Port number (required).
  int32 port = 2 [(buf.validate.field).int32 = {gte: 1, lte: 65535}];

  // Timeout in milliseconds.
  int32 timeout_ms = 3;
}

// DnsMonitor configuration for DNS record monitoring.
message DnsMonitor {
  // Domain name to query (required).
  string domain = 1 [(buf.validate.field).string.hostname = true];

  // DNS record type to check.
  DnsRecordType record_type = 2;

  // Expected values for the record.
  repeated string expected_values = 3;
}
```

### Pagination

Offset-based pagination for list operations (`page_token` carries the numeric offset):

```protobuf
message ListMonitorsRequest {
  // Maximum number of monitors to return (default: 50, max: 100).
  int32 page_size = 1 [(buf.validate.field).int32 = {gte: 1, lte: 100}];

  // Token from previous response for pagination.
  optional string page_token = 2;

  // Filter by monitor status.
  optional MonitorStatus status_filter = 3;

  // Filter by monitor type.
  optional MonitorType type_filter = 4;
}

message ListMonitorsResponse {
  // The monitors in this page.
  repeated Monitor monitors = 1;

  // Token for retrieving the next page, empty if no more results.
  string next_page_token = 2;

  // Total count of monitors matching the filter.
  int32 total_count = 3;
}
```

---

## Validation

### Approach
Use **protovalidate** (Buf ecosystem) for request validation:

- Validation rules are defined in proto annotations
- Validation runs before the handler via an interceptor
- Failures return `INVALID_ARGUMENT` with field-level details

### Example Annotations
```protobuf
message CreateMonitorRequest {
  string name = 1 [(buf.validate.field).string = {min_len: 1, max_len: 256}];
  string description = 2 [(buf.validate.field).string.max_len = 1024];
  int32 periodicity = 3 [(buf.validate.field).int32 = {in: [60, 300, 600, 1800, 3600]}];
  repeated string regions = 4 [(buf.validate.field).repeated = {min_items: 1, max_items: 35}];
}
```

---

## Code Organization

### Shared Service Layer

Both REST and RPC handlers call the same business logic:

```
packages/proto/                    # Shared proto definitions
├── buf.yaml                       # Buf configuration
├── buf.gen.yaml                   # Code generation config
├── openstatus/
│   └── monitor/
│       └── v1/
│           ├── monitor.proto      # Message definitions
│           └── service.proto      # Service definition
└── gen/                           # Generated code
    ├── ts/                        # TypeScript output
    └── go/                        # Go output

apps/server/src/
├── services/                      # Shared business logic (NEW)
│   └── monitor/
│       ├── create.ts
│       ├── get.ts
│       ├── list.ts
│       ├── update.ts
│       ├── delete.ts
│       └── operations.ts          # trigger, pause, resume
├── routes/
│   └── v1/                        # REST handlers (existing)
│       └── monitors/
└── rpc/                           # ConnectRPC handlers (NEW)
    ├── index.ts                   # Mount point
    ├── interceptors/
    │   ├── auth.ts                # Auth interceptor
    │   ├── logging.ts             # Request logging
    │   └── error.ts               # Error mapping
    └── handlers/
        └── monitor.ts             # MonitorService implementation
```

### Handler Pattern

```typescript
// apps/server/src/rpc/handlers/monitor.ts
import type { ConnectRouter } from "@connectrpc/connect";
import { MonitorService } from "@openstatus/proto/gen/ts/openstatus/monitor/v1/service_connect";
import * as monitorService from "../../services/monitor";
// Assumption: the auth interceptor module exports the context key it populates.
import { workspaceKey } from "../interceptors/auth";

export default (router: ConnectRouter) =>
  router.service(MonitorService, {
    async createMonitor(req, ctx) {
      // The workspace is placed in the context values by the auth interceptor.
      const workspace = ctx.values.get(workspaceKey);
      return await monitorService.create(workspace, req);
    },
    // ... other methods
  });
```

---

## Interceptors

### Authentication Interceptor
```typescript
// Extracts and validates auth from headers
// Sets workspace context for downstream handlers
// Supports both API key and Bearer token
```

### Logging Interceptor
```typescript
// Integrates with existing LogTape setup
// Logs: method, duration, status, workspace, requestId
```

### Error Interceptor
```typescript
// Maps internal errors to ConnectRPC codes
// Attaches ErrorInfo details
// Reports to Sentry (filtered for client errors)
```

---

## Observability

### Logging
- Integrate with existing LogTape JSON logging
- Log fields: `rpc.method`, `rpc.status_code`, `duration_ms`, `workspace_id`, `request_id`

### Error Tracking
- Sentry integration via interceptor
- Filter out client errors (INVALID_ARGUMENT, NOT_FOUND, etc.) so they are not reported
- Include request context in error reports

---

## Rate Limiting

Use existing infrastructure:
- Hono middleware / upstream proxy handles rate limiting
- No RPC-specific rate limiting interceptors needed

---

## Testing Strategy

### Unit Tests
- Test handlers directly with a mocked service layer
- Test interceptors in isolation
- Test proto validation rules

### Integration Tests
- Spin up a real server instance
- Use the generated TypeScript client to make RPC calls
- Test the full request lifecycle, including auth

### Test File Structure
```
apps/server/src/rpc/
├── __tests__/
│   ├── handlers/
│   │   └── monitor.test.ts            # Handler unit tests
│   ├── interceptors/
│   │   └── auth.test.ts               # Interceptor tests
│   └── integration/
│       └── monitor.integration.ts     # Full flow tests
```

---

## Additional Considerations

### Health Check Endpoint
Add a simple `HealthService` for load balancer probes under `/rpc`:

```protobuf
service HealthService {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
}

message HealthCheckRequest {}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
  }
  ServingStatus status = 1;
}
```

### Request ID Propagation
- Generate `x-request-id` in the logging interceptor if it is not present in the request headers
- Propagate the request ID to all downstream services and log entries
- Include the request ID in error responses for debugging

### Go Code Generation
- Defer Go codegen until there are concrete Go service consumers
- Reduces maintenance burden and build complexity initially
- Can be enabled later by adding a Go target to `buf.gen.yaml`

### Proto Dependency Pinning
- Use `buf.lock` to pin versions of:
  - `buf.build/bufbuild/protovalidate`
  - `buf.build/googleapis/googleapis` (for `google.rpc` types such as `ErrorInfo`; the `google.protobuf` well-known types ship with Buf)
- Run `buf dep update` (`buf mod update` on older Buf CLI versions) to generate/update the lock file

---

## Configuration Details

### CORS Handling
- The `/rpc` endpoint should inherit the existing CORS configuration from the Hono app
- If different CORS rules are needed, configure them via Hono middleware before mounting the RPC routes
- The Connect protocol uses standard HTTP POST for unary calls; browser clients additionally need the CORS configuration to allow Connect request headers such as `Connect-Protocol-Version` and `Connect-Timeout-Ms`

### Content-Type Support
Enable both JSON and binary formats for flexibility:
- `application/json` - Human-readable, easier debugging, slightly larger payloads
- `application/proto` - Binary format, smaller payloads, better performance
- The server selects the codec from the request's `Content-Type` header, which the client sets according to its configured format

### Deadline/Timeout Propagation
- Client-specified timeouts via the `connect-timeout-ms` header
- Server interceptor should:
  - Read the timeout from request metadata
  - Create a context with a deadline
  - Cancel operations if the deadline is exceeded
  - Return the `DEADLINE_EXCEEDED` error code on timeout

---

## Dependencies

### New Packages (packages/proto)
```json
{
  "devDependencies": {
    "@bufbuild/buf": "latest",
    "@bufbuild/protoc-gen-es": "latest",
    "@connectrpc/protoc-gen-connect-es": "latest"
  },
  "dependencies": {
    "@bufbuild/protobuf": "^2.0.0",
    "@bufbuild/protobuf-conformance": "^2.0.0"
  }
}
```

### Server App Additions
```json
{
  "dependencies": {
    "@connectrpc/connect": "^2.0.0",
    "@connectrpc/connect-node": "^2.0.0",
    "@bufbuild/protovalidate": "^0.3.0",
    "@openstatus/proto": "workspace:*"
  }
}
```

---

## Migration & Rollout

### Phase 1: Foundation
1. Create `packages/proto` with Buf setup
2. Define monitor service proto
3. Generate TypeScript code (Go codegen is deferred)
4. Add protovalidate annotations

### Phase 2: Server Integration
1. Add ConnectRPC dependencies to server
2. Implement interceptors (auth, logging, error)
3. Mount RPC routes at `/rpc` on Hono app
4. Extract shared service layer from REST handlers

### Phase 3: Handler Implementation
1. Implement MonitorService handlers
2. Write unit tests
3. Write integration tests
4. Internal testing

### Phase 4: Release
1. Documentation
2. Client SDK examples
3. Gradual rollout via feature flag (optional)

---

## Open Questions (Resolved)

| Question | Decision |
|----------|----------|
| REST replacement or parallel? | New features only |
| Transport protocol | Connect protocol only |
| Streaming | Unary only |
| Schema approach | Schema-first (.proto) |
| Auth mechanism | Both API key + JWT |
| Proto location | Shared package |
| Tooling | Buf |
| Error details | With ErrorInfo |
| Code sharing | Shared service layer |
| Client targets | TypeScript + Go |
| Validation | protovalidate |
| Type modeling | Separate messages |
| Port strategy | Same port, /rpc prefix |
| Pagination | Offset-based |
| Rate limiting | Existing infrastructure |
| Operations style | Separate methods |
| Observability | Sentry + LogTape |
| Testing | Unit + Integration |
| Health check | Yes, HealthService |
| Request ID | Generated + propagated |
| Go codegen | Deferred |
| CORS | Inherit from Hono |
| Content-Type | JSON + Binary |
| Timeouts | connect-timeout-ms header |

---

## References

- [ConnectRPC Documentation](https://connectrpc.com/docs)
- [Buf Documentation](https://buf.build/docs)
- [protovalidate](https://github.com/bufbuild/protovalidate)
- [Google Error Model](https://cloud.google.com/apis/design/errors)

## Future work

- Implement additional services and procedures:

      // PauseMonitor suspends monitoring.
      rpc PauseMonitor(PauseMonitorRequest) returns (PauseMonitorResponse);

      // ResumeMonitor resumes a paused monitor.
      rpc ResumeMonitor(ResumeMonitorRequest) returns (ResumeMonitorResponse);

      // UpdateMonitor modifies an existing monitor.
      rpc UpdateMonitor(UpdateMonitorRequest) returns (UpdateMonitorResponse);
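
As a closing illustration of the offset-based pagination specified for `ListMonitors`, the token handling could look roughly like the sketch below. It is not taken from the OpenStatus codebase: `parsePageToken`, `paginate`, and the `Page` type are hypothetical names, the list is held in memory instead of being queried from a database, and an unset `page_size` (the proto3 zero value) is assumed to fall back to the documented default of 50.

```typescript
// Hypothetical helpers illustrating offset-based page tokens; not part of the spec.
interface Page<T> {
  items: T[];
  nextPageToken: string; // empty string when there are no more results
  totalCount: number;
}

const DEFAULT_PAGE_SIZE = 50; // documented default for page_size
const MAX_PAGE_SIZE = 100; // documented maximum for page_size

// Decodes a page token into a numeric offset. A missing or empty token means "start at 0".
function parsePageToken(token: string | undefined): number {
  if (token === undefined || token === "") return 0;
  if (!/^\d+$/.test(token)) {
    throw new Error(`invalid page_token: ${token}`);
  }
  return Number.parseInt(token, 10);
}

// Returns one page of `all`, plus the token that points at the next page.
function paginate<T>(all: readonly T[], pageSize: number, pageToken?: string): Page<T> {
  // Assumption: 0 means "unset" and maps to the default; larger values are capped.
  const size = pageSize === 0 ? DEFAULT_PAGE_SIZE : Math.min(pageSize, MAX_PAGE_SIZE);
  const offset = parsePageToken(pageToken);
  const items = all.slice(offset, offset + size);
  const next = offset + items.length;
  return {
    items,
    nextPageToken: next < all.length ? String(next) : "",
    totalCount: all.length,
  };
}
```

A caller would pass `next_page_token` from one response as `page_token` on the next request and stop once the token comes back empty. In a real handler, the offset and size would more likely be translated into the `OFFSET` and `LIMIT` of the underlying query.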