# ATB-13: Backfill and Repo Sync Design

Date: 2026-02-22 · Linear: ATB-13 · Status: Approved
## Problem
The AppView must handle restarts gracefully. Short downtimes (<1 hour) recover via cursor resume on the firehose. Longer downtimes (>48 hours) risk exceeding the ~72-hour firehose retention window, causing permanent data loss. First-time startups with an empty database see no historical data at all.
## Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| First-startup scope | Forum PDS only | Empty DB has no known user DIDs; firehose discovers users going forward |
| Firehose during backfill | Blocked | Simpler than concurrent — no deduplication needed |
| Progress tracking | DB table with resume | Survives crashes; operator visibility via admin API |
| Admin API style | Async with polling | Backfills can take minutes; synchronous would timeout |
| Indexing approach | Reuse existing Indexer handlers | Zero logic duplication; well-tested FK resolution, ban enforcement, soft deletes |
## Architecture

### BackfillManager Class

New class at `apps/appview/src/lib/backfill-manager.ts`. Injected into `AppContext`.

Core methods:

- `checkIfNeeded(cursor: bigint | null): Promise<BackfillStatus>` — gap detection
- `performBackfill(): Promise<BackfillResult>` — orchestrates full sync
- `syncRepoRecords(did: string, collection: string): Promise<SyncStats>` — syncs one (DID, collection) pair
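A sketch of the surface described above, for orientation. The field names on `SyncStats` and `BackfillResult` are illustrative assumptions, not the final API:

```typescript
// Assumed shape of the BackfillManager surface; result fields are illustrative.
type BackfillStatus =
  | { type: "NotNeeded" }
  | { type: "CatchUp" }
  | { type: "FullSync" };

interface SyncStats {
  recordsIndexed: number;
  errors: number;
}

interface BackfillResult {
  didsProcessed: number;
  recordsIndexed: number;
  errors: number;
}

interface BackfillManager {
  checkIfNeeded(cursor: bigint | null): Promise<BackfillStatus>;
  performBackfill(): Promise<BackfillResult>;
  syncRepoRecords(did: string, collection: string): Promise<SyncStats>;
}
```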
### Gap Detection

`BackfillStatus = NotNeeded | CatchUp | FullSync`

Decision logic:

- No cursor → `FullSync`
- Cursor exists, `forums` table empty → `FullSync` (DB inconsistency)
- Cursor age > 48 hours → `CatchUp`
- Otherwise → `NotNeeded`
The 48-hour threshold (configurable via `BACKFILL_CURSOR_MAX_AGE_HOURS`) provides a safety margin below the ~72-hour firehose retention window.
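The decision rules above can be sketched as a pure function. The `forumsTableEmpty` and `cursorAgeHours` inputs are assumed here; the real implementation would read them from the DB and cursor store:

```typescript
// Gap-detection sketch; inputs are assumed to be precomputed by the caller.
type BackfillStatus = "NotNeeded" | "CatchUp" | "FullSync";

const CURSOR_MAX_AGE_HOURS = 48; // default for BACKFILL_CURSOR_MAX_AGE_HOURS

function decideBackfill(
  cursor: bigint | null,
  forumsTableEmpty: boolean,
  cursorAgeHours: number,
): BackfillStatus {
  if (cursor === null) return "FullSync"; // first startup, empty DB
  if (forumsTableEmpty) return "FullSync"; // cursor exists but DB is inconsistent
  if (cursorAgeHours > CURSOR_MAX_AGE_HOURS) return "CatchUp"; // stale cursor
  return "NotNeeded";
}
```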
### Repo Sync Mechanism

Uses `com.atproto.repo.listRecords` (collection-based sync, not full CAR files).

For each (DID, collection) pair:

- Paginate through `listRecords({ repo: did, collection, limit: 100 })`
- Transform each record to `CommitCreateEvent` shape (~10-line adapter)
- Call the matching `indexer.handleXCreate(event)` — reuses all existing logic
- Track success/error counts
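The loop above can be sketched as follows. `listPage` and `indexCreate` are hypothetical stand-ins for the XRPC `listRecords` call and the matching Indexer handler, injected to keep the sketch self-contained:

```typescript
// Per-(DID, collection) sync loop sketch with injected page fetcher and indexer.
interface ListRecordItem { uri: string; cid: string; value: unknown }
interface ListPage { records: ListRecordItem[]; cursor?: string }

async function syncRepoRecords(
  did: string,
  collection: string,
  listPage: (cursor?: string) => Promise<ListPage>, // wraps listRecords, limit: 100
  indexCreate: (item: ListRecordItem) => Promise<void>, // matching Indexer handler
): Promise<{ success: number; errors: number }> {
  let success = 0;
  let errors = 0;
  let cursor: string | undefined;
  do {
    const page = await listPage(cursor);
    for (const item of page.records) {
      try {
        await indexCreate(item); // reuses existing FK/ban/soft-delete logic
        success++;
      } catch {
        errors++; // the real implementation also inserts a backfill_errors row
      }
    }
    cursor = page.cursor; // undefined on the final page ends the loop
  } while (cursor !== undefined);
  return { success, errors };
}
```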
Event shape adapter:

```typescript
function toCreateEvent(did: string, record: ListRecordItem): CommitCreateEvent {
  const rkey = record.uri.split("/").pop()!;
  return {
    did,
    commit: { rkey, cid: record.cid },
    record: record.value,
  };
}
```
Collection sync order (respects FK dependencies):

1. `space.atbb.forum.forum` (no deps)
2. `space.atbb.forum.category` (FK to forum)
3. `space.atbb.forum.board` (FK to category)
4. `space.atbb.forum.role` (FK to forum)
5. `space.atbb.membership` (FK to forum, user)
6. `space.atbb.post` (FK to board, user, optionally parent post)
7. `space.atbb.mod_action` (FK to forum)
8. `space.atbb.reaction` (FK to post, user — stub)
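The order above lends itself to a constant the backfill loop can iterate over. The collection NSIDs come from the design; the constant name itself is illustrative:

```typescript
// Dependency-ordered collections; syncing in array order satisfies FKs.
const COLLECTION_SYNC_ORDER = [
  "space.atbb.forum.forum",    // no deps
  "space.atbb.forum.category", // FK to forum
  "space.atbb.forum.board",    // FK to category
  "space.atbb.forum.role",     // FK to forum
  "space.atbb.membership",     // FK to forum, user
  "space.atbb.post",           // FK to board, user, optionally parent post
  "space.atbb.mod_action",     // FK to forum
  "space.atbb.reaction",       // FK to post, user (stub)
] as const;
```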
Rate limiting: Delay-based throttle of `1000 / BACKFILL_RATE_LIMIT` ms between page fetches. Default: 10 req/s per PDS.
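A minimal sketch of that throttle, assuming a simple sleep between page fetches (the helper names are hypothetical):

```typescript
// Delay between page fetches: 1000 / rateLimit ms, e.g. 100 ms at 10 req/s.
function throttleDelayMs(rateLimit: number): number {
  return 1000 / rateLimit;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// In the pagination loop, between listRecords pages:
//   await sleep(throttleDelayMs(rateLimit));
```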
### Conflict Resolution

- `UNIQUE(did, rkey)` constraint handles duplicates naturally via the Indexer's upsert logic
- CID comparison: if the CID differs from the indexed record, the Indexer updates it
- Tombstone/deletion detection via `listRecords` not possible — deferred to post-MVP (would require `com.atproto.sync.getRepo` CAR parsing)
### Backfill Orchestration

FullSync flow:

- Sync the Forum DID across all forum-owned collections (in dependency order)
- Mark the backfill completed
- Return stats

CatchUp flow:

- Sync the Forum DID first (structure may have changed)
- Query the `users` table for all known DIDs, sorted by `did ASC`
- Process DIDs in batches of `BACKFILL_CONCURRENCY` (default: 10)
- For each DID: sync membership and posts (user-owned collections)
- Update the `backfill_progress` row every batch
- Mark completed; return stats
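The batching step can be sketched as below. `syncUser` and `checkpoint` are hypothetical callbacks standing in for per-DID sync and the `backfill_progress` update:

```typescript
// CatchUp batching sketch: DIDs processed in fixed-size concurrent batches,
// with a progress checkpoint after each batch.
async function processInBatches(
  dids: string[], // sorted by did ASC
  batchSize: number, // BACKFILL_CONCURRENCY
  syncUser: (did: string) => Promise<void>,
  checkpoint: (lastDid: string) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < dids.length; i += batchSize) {
    const batch = dids.slice(i, i + batchSize);
    await Promise.allSettled(batch.map(syncUser)); // per-DID failures don't abort the batch
    await checkpoint(batch[batch.length - 1]); // resume point survives a crash
  }
}
```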
Resume from checkpoint:

On startup, check for a `backfill_progress` row with `status = 'in_progress'`. If found, resume from `last_processed_did` by skipping alphabetically earlier DIDs.
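Because DIDs are processed in ascending order, the resume rule reduces to a simple filter (sketch; the function name is illustrative):

```typescript
// Resume rule: skip every DID at or before the checkpoint; ASC order makes
// the comparison equivalent to "already processed".
function remainingDids(
  allDidsSorted: string[], // sorted by did ASC, as queried from users
  lastProcessedDid: string | null,
): string[] {
  if (lastProcessedDid === null) return allDidsSorted; // no checkpoint yet
  return allDidsSorted.filter((did) => did > lastProcessedDid);
}
```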
## Database Tables

```sql
CREATE TABLE backfill_progress (
  id SERIAL PRIMARY KEY,
  status VARCHAR(20) NOT NULL,         -- 'in_progress', 'completed', 'failed'
  backfill_type VARCHAR(20) NOT NULL,  -- 'full_sync', 'catch_up'
  last_processed_did VARCHAR(255),
  dids_total INTEGER DEFAULT 0,
  dids_processed INTEGER DEFAULT 0,
  records_indexed INTEGER DEFAULT 0,
  started_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
  completed_at TIMESTAMP WITH TIME ZONE,
  error_message TEXT
);

CREATE TABLE backfill_errors (
  id SERIAL PRIMARY KEY,
  backfill_id INTEGER NOT NULL REFERENCES backfill_progress(id),
  did VARCHAR(255) NOT NULL,
  collection VARCHAR(255) NOT NULL,
  error_message TEXT NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```
## Firehose Integration

Modified `FirehoseService.start()`:

- Check for an interrupted backfill (resume if found)
- Load the cursor, run `backfillManager.checkIfNeeded(cursor)`
- If backfill is needed: set `isBackfilling = true`, await the backfill, clear the flag
- Proceed with the existing cursor resume + jetstream start

Guard: reject `start()` calls while `isBackfilling === true`.
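A minimal sketch of that guard, with `FirehoseService` internals reduced to what the flag logic needs (the `runBackfill`/`needed` parameters are assumptions for illustration):

```typescript
// start() guard sketch: a second start() during a backfill is rejected
// rather than racing the sync; the flag is always cleared, even on failure.
class FirehoseService {
  private isBackfilling = false;

  async start(runBackfill: () => Promise<void>, needed: boolean): Promise<void> {
    if (this.isBackfilling) {
      throw new Error("start() rejected: backfill in progress");
    }
    if (needed) {
      this.isBackfilling = true;
      try {
        await runBackfill(); // firehose stays blocked until this resolves
      } finally {
        this.isBackfilling = false;
      }
    }
    // ...existing cursor resume + jetstream start would follow here
  }
}
```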
## Admin API

| Method | Path | Permission | Description |
|---|---|---|---|
| POST | `/api/admin/backfill` | `manageForum` | Trigger manual backfill; returns `{ backfillId, status }` |
| GET | `/api/admin/backfill/:id` | `manageForum` | Poll progress + error count |
| GET | `/api/admin/backfill/:id/errors` | `manageForum` | List errors for a backfill run |
POST behavior: check `isBackfilling` (409 if busy), determine the backfill type via `checkIfNeeded()`, allow a `?force=catch_up|full_sync` override, kick off the backfill asynchronously, and return immediately.
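The handler logic can be sketched framework-agnostically. The 202 success code, the response bodies, and the `startAsync` callback are assumptions; the 409-if-busy and force-override rules come from the design:

```typescript
// POST /api/admin/backfill handler sketch; startAsync kicks off the backfill
// in the background and returns its backfillId.
type ForceType = "catch_up" | "full_sync" | undefined;

function handleBackfillPost(
  isBackfilling: boolean,
  detectedType: "NotNeeded" | "CatchUp" | "FullSync", // from checkIfNeeded()
  force: ForceType, // parsed from ?force=catch_up|full_sync
  startAsync: (type: "catch_up" | "full_sync") => number,
): { status: number; body: unknown } {
  if (isBackfilling) {
    return { status: 409, body: { error: "backfill already in progress" } };
  }
  const type = force ?? (detectedType === "FullSync" ? "full_sync" : "catch_up");
  const backfillId = startAsync(type); // fire-and-forget; caller polls GET /:id
  return { status: 202, body: { backfillId, status: "in_progress" } };
}
```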
## Error Handling

- PDS unreachable: log a warning, insert a `backfill_errors` row, continue to the next DID
- Record parse failure: log with the AT URI, continue to the next record
- Programming errors: re-throw (`TypeError`, `ReferenceError`, `SyntaxError`)
- Partial completion: status set to `completed`, errors queryable via the admin API
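The re-throw rule above can be sketched as an error classifier around each sync step. The helper names are assumptions; `recordError` stands in for the `backfill_errors` insert:

```typescript
// Programming errors propagate (bugs should crash loudly); everything else is
// recorded as a recoverable sync error and the loop continues.
function isProgrammingError(err: unknown): boolean {
  return (
    err instanceof TypeError ||
    err instanceof ReferenceError ||
    err instanceof SyntaxError
  );
}

async function safeSync(
  run: () => Promise<void>,
  recordError: (message: string) => void,
): Promise<void> {
  try {
    await run();
  } catch (err) {
    if (isProgrammingError(err)) throw err;
    recordError(err instanceof Error ? err.message : String(err));
  }
}
```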
## Configuration

| Variable | Default | Description |
|---|---|---|
| `BACKFILL_RATE_LIMIT` | `10` | Max XRPC requests/second per PDS |
| `BACKFILL_CONCURRENCY` | `10` | Max DIDs processed concurrently |
| `BACKFILL_CURSOR_MAX_AGE_HOURS` | `48` | Cursor age threshold for CatchUp |
## Files
| Action | File |
|---|---|
| Create | apps/appview/src/lib/backfill-manager.ts |
| Create | apps/appview/src/lib/__tests__/backfill-manager.test.ts |
| Create | apps/appview/src/lib/__tests__/backfill-integration.test.ts |
| Create | packages/db/drizzle/migrations/XXXX_add_backfill_tables.sql |
| Modify | packages/db/src/schema.ts — add backfill tables |
| Modify | apps/appview/src/lib/firehose.ts — backfill check in start() |
| Modify | apps/appview/src/lib/cursor-manager.ts — add getCursorAge() |
| Modify | apps/appview/src/lib/app-context.ts — add backfillManager |
| Modify | apps/appview/src/lib/config.ts — add backfill config fields |
| Modify | apps/appview/src/routes/admin.ts — add backfill endpoints |
| Modify | apps/appview/src/index.ts — wire BackfillManager into startup |
| Modify | turbo.json — add backfill env vars |
## Testing

Unit tests: gap detection (all four scenarios), `syncRepoRecords` pagination, event shape transformation, rate limiting, conflict resolution, resume logic, progress updates.

Integration tests: FullSync with a mock PDS, CatchUp with known users, interrupted resume, partial PDS failure, admin endpoint trigger/poll, firehose blocked during backfill.

Mocking: mock `AtpAgent`'s `com.atproto.repo.listRecords` for controlled responses. Mock Indexer methods to verify event shapes. Use a real test DB for progress tracking.