# ConnectRPC API Specification ## Overview This document specifies the implementation of a ConnectRPC API for OpenStatus server. ConnectRPC will be used for **new features only** while the existing REST API remains for current functionality. ## Architecture Decisions ### Transport & Protocol - **Protocol**: Connect protocol only (HTTP/1.1 compatible) - **Streaming**: Unary calls only (request-response, no streaming) - **Mounting**: Same port as REST, mounted at `/rpc/*` path prefix on the existing Hono app ### Schema Management - **Approach**: Schema-first with `.proto` files - **Tooling**: Buf (buf.yaml, buf.gen.yaml) - **Location**: `packages/proto` (shared package for monorepo consumption) - **Package naming**: `openstatus..v1` (e.g., `openstatus.monitor.v1`) ### Code Generation Targets - TypeScript (`@bufbuild/protobuf` + `@connectrpc/connect`) - Go (for potential backend service consumers) --- ## Authentication & Authorization ### Supported Methods Both authentication methods resolve to the same workspace context: 1. **API Key** (existing system) - Header: `x-openstatus-key` - Formats: `os_[32-char-hex]` (custom) or Unkey keys - Super admin: `sa_` prefix ### Workspace Context - Workspace ID inferred from authenticated credentials - **Super-admin override**: Tokens with `sa_` prefix can specify target workspace via `x-workspace-id` metadata header --- ## Error Handling ### Error Model Use ConnectRPC error codes with Google ErrorInfo for structured details: ```protobuf // Error codes used: NOT_FOUND, INVALID_ARGUMENT, PERMISSION_DENIED, // UNAUTHENTICATED, RESOURCE_EXHAUSTED, INTERNAL, UNAVAILABLE // Include ErrorInfo details: // - domain: "openstatus.com" // - reason: Machine-readable error reason (e.g., "MONITOR_NOT_FOUND") // - metadata: Additional context (requestId, resourceId, etc.) ``` ### Mapping to Existing Errors Reuse `OpenStatusApiError` codes, map to ConnectRPC equivalents in interceptor. --- ## First Service: Monitor Management ### Service Definition ```protobuf syntax = "proto3"; package openstatus.monitor.v1; import "buf/validate/validate.proto"; import "google/protobuf/timestamp.proto"; // MonitorService provides CRUD and operational commands for monitors. service MonitorService { // CreateMonitor creates a new monitor in the workspace. rpc CreateMonitor(CreateMonitorRequest) returns (CreateMonitorResponse); // GetMonitor retrieves a single monitor by ID. rpc GetMonitor(GetMonitorRequest) returns (GetMonitorResponse); // ListMonitors returns a paginated list of monitors. rpc ListMonitors(ListMonitorsRequest) returns (ListMonitorsResponse); // DeleteMonitor removes a monitor. rpc DeleteMonitor(DeleteMonitorRequest) returns (DeleteMonitorResponse); // TriggerMonitor initiates an immediate check. rpc TriggerMonitor(TriggerMonitorRequest) returns (TriggerMonitorResponse); } ``` ### Monitor Type Modeling Separate message types for each monitor kind: ```protobuf // HttpMonitor configuration for HTTP/HTTPS endpoint monitoring. message HttpMonitor { // The URL to monitor (required). string url = 1 [(buf.validate.field).string.uri = true]; // HTTP method to use. HttpMethod method = 2; // Request headers to include. map headers = 3; // Request body for POST/PUT/PATCH. optional string body = 4; // Timeout in milliseconds (default: 30000). int32 timeout_ms = 5 [(buf.validate.field).int32 = {gte: 1000, lte: 60000}]; // Assertions to validate the response. repeated HttpAssertion assertions = 6; } // TcpMonitor configuration for TCP connection monitoring. message TcpMonitor { // Host to connect to (required). string host = 1 [(buf.validate.field).string.min_len = 1]; // Port number (required). int32 port = 2 [(buf.validate.field).int32 = {gte: 1, lte: 65535}]; // Timeout in milliseconds. int32 timeout_ms = 3; } // DnsMonitor configuration for DNS record monitoring. message DnsMonitor { // Domain name to query (required). string domain = 1 [(buf.validate.field).string.hostname = true]; // DNS record type to check. DnsRecordType record_type = 2; // Expected values for the record. repeated string expected_values = 3; } ``` ### Pagination Offset-based pagination for list operations (page_token is the numeric offset): ```protobuf message ListMonitorsRequest { // Maximum number of monitors to return (default: 50, max: 100). int32 page_size = 1 [(buf.validate.field).int32 = {gte: 1, lte: 100}]; // Token from previous response for pagination. optional string page_token = 2; // Filter by monitor status. optional MonitorStatus status_filter = 3; // Filter by monitor type. optional MonitorType type_filter = 4; } message ListMonitorsResponse { // The monitors in this page. repeated Monitor monitors = 1; // Token for retrieving the next page, empty if no more results. string next_page_token = 2; // Total count of monitors matching the filter. int32 total_count = 3; } ``` --- ## Validation ### Approach Use **protovalidate** (Buf ecosystem) for request validation: - Validation rules defined in proto annotations - Runs before handler via interceptor - Returns `INVALID_ARGUMENT` with field-level details on failure ### Example Annotations ```protobuf message CreateMonitorRequest { string name = 1 [(buf.validate.field).string = {min_len: 1, max_len: 256}]; string description = 2 [(buf.validate.field).string.max_len = 1024]; int32 periodicity = 3 [(buf.validate.field).int32 = {in: [60, 300, 600, 1800, 3600]}]; repeated string regions = 4 [(buf.validate.field).repeated = {min_items: 1, max_items: 35}]; } ``` --- ## Code Organization ### Shared Service Layer Both REST and RPC handlers call the same business logic: ``` packages/proto/ # Shared proto definitions ├── buf.yaml # Buf configuration ├── buf.gen.yaml # Code generation config ├── openstatus/ │ └── monitor/ │ └── v1/ │ ├── monitor.proto # Message definitions │ └── service.proto # Service definition └── gen/ # Generated code ├── ts/ # TypeScript output └── go/ # Go output apps/server/src/ ├── services/ # Shared business logic (NEW) │ └── monitor/ │ ├── create.ts │ ├── get.ts │ ├── list.ts │ ├── update.ts │ ├── delete.ts │ └── operations.ts # trigger, pause, resume ├── routes/ │ └── v1/ # REST handlers (existing) │ └── monitors/ └── rpc/ # ConnectRPC handlers (NEW) ├── index.ts # Mount point ├── interceptors/ │ ├── auth.ts # Auth interceptor │ ├── logging.ts # Request logging │ └── error.ts # Error mapping └── handlers/ └── monitor.ts # MonitorService implementation ``` ### Handler Pattern ```typescript // apps/server/src/rpc/handlers/monitor.ts import type { ConnectRouter } from "@connectrpc/connect"; import { MonitorService } from "@openstatus/proto/gen/ts/openstatus/monitor/v1/service_connect"; import * as monitorService from "../../services/monitor"; export default (router: ConnectRouter) => router.service(MonitorService, { async createMonitor(req, ctx) { const workspace = ctx.values.get(workspaceKey); return await monitorService.create(workspace, req); }, // ... other methods }); ``` --- ## Interceptors ### Authentication Interceptor ```typescript // Extracts and validates auth from headers // Sets workspace context for downstream handlers // Supports both API key and Bearer token ``` ### Logging Interceptor ```typescript // Integrates with existing LogTape setup // Logs: method, duration, status, workspace, requestId ``` ### Error Interceptor ```typescript // Maps internal errors to ConnectRPC codes // Attaches ErrorInfo details // Reports to Sentry (filtered for client errors) ``` --- ## Observability ### Logging - Integrate with existing LogTape JSON logging - Log fields: `rpc.method`, `rpc.status_code`, `duration_ms`, `workspace_id`, `request_id` ### Error Tracking - Sentry integration via interceptor - Filter client errors (INVALID_ARGUMENT, NOT_FOUND, etc.) - Include request context in error reports --- ## Rate Limiting Use existing infrastructure: - Hono middleware / upstream proxy handles rate limiting - No RPC-specific rate limiting interceptors needed --- ## Testing Strategy ### Unit Tests - Test handlers directly with mocked service layer - Test interceptors in isolation - Test proto validation rules ### Integration Tests - Spin up real server instance - Use generated TypeScript client to make RPC calls - Test full request lifecycle including auth ### Test File Structure ``` apps/server/src/rpc/ ├── __tests__/ │ ├── handlers/ │ │ └── monitor.test.ts # Handler unit tests │ ├── interceptors/ │ │ └── auth.test.ts # Interceptor tests │ └── integration/ │ └── monitor.integration.ts # Full flow tests ``` --- ## Additional Considerations ### Health Check Endpoint Add a simple `Health` service for load balancer probes at `/rpc`: ```protobuf service HealthService { rpc Check(HealthCheckRequest) returns (HealthCheckResponse); } message HealthCheckRequest {} message HealthCheckResponse { enum ServingStatus { UNKNOWN = 0; SERVING = 1; NOT_SERVING = 2; } ServingStatus status = 1; } ``` ### Request ID Propagation - Generate `x-request-id` in logging interceptor if not present in request headers - Propagate request ID to all downstream services and log entries - Include request ID in error responses for debugging ### Go Code Generation - Defer Go codegen until there are concrete Go service consumers - Reduces maintenance burden and build complexity initially - Can be enabled later by adding Go target to `buf.gen.yaml` ### Proto Dependency Pinning - Use `buf.lock` to pin versions of: - `buf.build/bufbuild/protovalidate` - `buf.build/googleapis/googleapis` (if using google.protobuf types) - Run `buf mod update` to generate/update lock file --- ## Configuration Details ### CORS Handling - `/rpc` endpoint should inherit existing CORS configuration from Hono app - If different CORS rules needed, configure via Hono middleware before mounting RPC routes - Connect protocol uses standard HTTP methods (POST), no special CORS requirements ### Content-Type Support Enable both JSON and binary formats for flexibility: - `application/json` - Human-readable, easier debugging, slightly larger payloads - `application/proto` - Binary format, smaller payloads, better performance - Connect clients auto-negotiate based on `Content-Type` header ### Deadline/Timeout Propagation - Client-specified timeouts via `connect-timeout-ms` header - Server interceptor should: - Read timeout from request metadata - Create context with deadline - Cancel operations if deadline exceeded - Return `DEADLINE_EXCEEDED` error code on timeout --- ## Dependencies ### New Packages (packages/proto) ```json { "devDependencies": { "@bufbuild/buf": "latest", "@bufbuild/protoc-gen-es": "latest", "@connectrpc/protoc-gen-connect-es": "latest" }, "dependencies": { "@bufbuild/protobuf": "^2.0.0", "@bufbuild/protobuf-conformance": "^2.0.0" } } ``` ### Server App Additions ```json { "dependencies": { "@connectrpc/connect": "^2.0.0", "@connectrpc/connect-node": "^2.0.0", "@bufbuild/protovalidate": "^0.3.0", "@openstatus/proto": "workspace:*" } } ``` --- ## Migration & Rollout ### Phase 1: Foundation 1. Create `packages/proto` with Buf setup 2. Define monitor service proto 3. Generate TypeScript and Go clients 4. Add protovalidate annotations ### Phase 2: Server Integration 1. Add ConnectRPC dependencies to server 2. Implement interceptors (auth, logging, error) 3. Mount RPC routes at `/rpc` on Hono app 4. Extract shared service layer from REST handlers ### Phase 3: Handler Implementation 1. Implement MonitorService handlers 2. Write unit tests 3. Write integration tests 4. Internal testing ### Phase 4: Release 1. Documentation 2. Client SDK examples 3. Gradual rollout via feature flag (optional) --- ## Open Questions (Resolved) | Question | Decision | |----------|----------| | REST replacement or parallel? | New features only | | Transport protocol | Connect protocol only | | Streaming | Unary only | | Schema approach | Schema-first (.proto) | | Auth mechanism | Both API key + JWT | | Proto location | Shared package | | Tooling | Buf | | Error details | With ErrorInfo | | Code sharing | Shared service layer | | Client targets | TypeScript + Go | | Validation | protovalidate | | Type modeling | Separate messages | | Port strategy | Same port, /rpc prefix | | Pagination | Offset-based | | Rate limiting | Existing infrastructure | | Operations style | Separate methods | | Observability | Sentry + LogTape | | Testing | Unit + Integration | | Health check | Yes, HealthService | | Request ID | Generated + propagated | | Go codegen | Deferred | | CORS | Inherit from Hono | | Content-Type | JSON + Binary | | Timeouts | connect-timeout-ms header | --- ## References - [ConnectRPC Documentation](https://connectrpc.com/docs) - [Buf Documentation](https://buf.build/docs) - [protovalidate](https://github.com/bufbuild/protovalidate) - [Google Error Model](https://cloud.google.com/apis/design/errors) ## Future work - Implement additional services and procedure: // PauseMonitor suspends monitoring. rpc PauseMonitor(PauseMonitorRequest) returns (PauseMonitorResponse); // ResumeMonitor resumes a paused monitor. rpc ResumeMonitor(ResumeMonitorRequest) returns (ResumeMonitorResponse); // UpdateMonitor modifies an existing monitor. rpc UpdateMonitor(UpdateMonitorRequest) returns (UpdateMonitorResponse);