
# ATB-13: Backfill and Repo Sync Design

**Date:** 2026-02-22 · **Linear:** ATB-13 · **Status:** Approved

## Problem

The AppView must handle restarts gracefully. Short downtimes (<1 hour) recover via cursor resume on the firehose. Longer downtimes (>48 hours) risk exceeding the ~72-hour firehose retention window, causing permanent data loss. First-time startups with an empty database see no historical data at all.

## Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| First-startup scope | Forum PDS only | Empty DB has no known user DIDs; the firehose discovers users going forward |
| Firehose during backfill | Blocked | Simpler than concurrent ingestion; no deduplication needed |
| Progress tracking | DB table with resume | Survives crashes; operator visibility via admin API |
| Admin API style | Async with polling | Backfills can take minutes; a synchronous call would time out |
| Indexing approach | Reuse existing Indexer handlers | Zero logic duplication; well-tested FK resolution, ban enforcement, soft deletes |

## Architecture

### BackfillManager Class

New class at `apps/appview/src/lib/backfill-manager.ts`, injected into `AppContext`.

Core methods:

  - `checkIfNeeded(cursor: bigint | null): Promise<BackfillStatus>` — gap detection
  - `performBackfill(): Promise<BackfillResult>` — orchestrates the full sync
  - `syncRepoRecords(did: string, collection: string): Promise<SyncStats>` — syncs one (DID, collection) pair

### Gap Detection

```
BackfillStatus = NotNeeded | CatchUp | FullSync
```

Decision logic:

  1. No cursor → FullSync
  2. Cursor exists, forums table empty → FullSync (DB inconsistency)
  3. Cursor age > 48 hours → CatchUp
  4. Otherwise → NotNeeded

The 48-hour threshold (configurable via BACKFILL_CURSOR_MAX_AGE_HOURS) provides safety margin below the ~72-hour firehose retention window.
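
The decision logic above can be sketched as follows. This is a standalone sketch, not the actual implementation: the names, the `forumsTableEmpty` parameter, and the cursor encoding (treated here as a Jetstream `time_us` microsecond timestamp) are assumptions.

```typescript
// Hypothetical standalone version of BackfillManager.checkIfNeeded().
// Assumes the bigint cursor is a Jetstream time_us value (microseconds since epoch).
type BackfillStatus = "NotNeeded" | "CatchUp" | "FullSync";

const MAX_CURSOR_AGE_MS =
  Number(process.env.BACKFILL_CURSOR_MAX_AGE_HOURS ?? 48) * 60 * 60 * 1000;

function checkIfNeeded(
  cursor: bigint | null,     // last persisted firehose cursor, null on first start
  forumsTableEmpty: boolean, // true when the forums table has no rows
  now: number = Date.now(),
): BackfillStatus {
  if (cursor === null) return "FullSync";                // rule 1: no cursor at all
  if (forumsTableEmpty) return "FullSync";               // rule 2: cursor but inconsistent DB
  const cursorAgeMs = now - Number(cursor / 1000n);      // µs -> ms
  if (cursorAgeMs > MAX_CURSOR_AGE_MS) return "CatchUp"; // rule 3: stale cursor
  return "NotNeeded";                                    // rule 4: within retention window
}
```

Keeping `now` injectable makes all four branches trivially unit-testable.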

### Repo Sync Mechanism

Uses `com.atproto.repo.listRecords` (collection-based sync, not full CAR files).

For each (DID, collection) pair:

  1. Paginate through `listRecords({ repo: did, collection, limit: 100 })`
  2. Transform each record to the `CommitCreateEvent` shape (~10-line adapter)
  3. Call the matching `indexer.handleXCreate(event)` — reuses all existing logic
  4. Track success/error counts

Event shape adapter:

```typescript
function toCreateEvent(did: string, record: ListRecordItem): CommitCreateEvent {
  // rkey is the final path segment of the record's AT URI
  const rkey = record.uri.split("/").pop()!;
  return {
    did,
    commit: { rkey, cid: record.cid },
    record: record.value,
  };
}
```

Collection sync order (respects FK dependencies):

  1. `space.atbb.forum.forum` (no deps)
  2. `space.atbb.forum.category` (FK to forum)
  3. `space.atbb.forum.board` (FK to category)
  4. `space.atbb.forum.role` (FK to forum)
  5. `space.atbb.membership` (FK to forum, user)
  6. `space.atbb.post` (FK to board, user, optionally parent post)
  7. `space.atbb.mod_action` (FK to forum)
  8. `space.atbb.reaction` (FK to post, user — stub)
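
The same ordering can live as a constant the orchestrator iterates (a sketch; the module placement is assumed):

```typescript
// FK-safe indexing order: parents before children.
// The space.atbb.* NSIDs are taken verbatim from the list above.
const SYNC_ORDER = [
  "space.atbb.forum.forum",    // no deps
  "space.atbb.forum.category", // FK to forum
  "space.atbb.forum.board",    // FK to category
  "space.atbb.forum.role",     // FK to forum
  "space.atbb.membership",     // FK to forum, user
  "space.atbb.post",           // FK to board, user, optional parent post
  "space.atbb.mod_action",     // FK to forum
  "space.atbb.reaction",       // FK to post, user (stub)
] as const;
```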

Rate limiting: delay-based throttle of `1000 / BACKFILL_RATE_LIMIT` ms between page fetches. Default: 10 req/s per PDS.
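
Steps 1–4 plus the throttle can be sketched like this. The response shape mirrors `com.atproto.repo.listRecords`; the fetcher and the indexer callback are injected parameters (an assumption made here so the loop can be exercised without a live PDS):

```typescript
// Sketch of one (DID, collection) sync pass with the delay-based throttle.
type ListRecordItem = { uri: string; cid: string; value: unknown };
type ListRecordsPage = { records: ListRecordItem[]; cursor?: string };
type ListRecordsFn = (params: {
  repo: string;
  collection: string;
  limit: number;
  cursor?: string;
}) => Promise<ListRecordsPage>;

const RATE_LIMIT = Number(process.env.BACKFILL_RATE_LIMIT ?? 10); // req/s per PDS
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function syncRepoRecords(
  listRecords: ListRecordsFn,
  did: string,
  collection: string,
  index: (did: string, record: ListRecordItem) => Promise<void>, // e.g. toCreateEvent + handleXCreate
): Promise<{ success: number; errors: number }> {
  let cursor: string | undefined;
  let success = 0;
  let errors = 0;
  do {
    const page = await listRecords({ repo: did, collection, limit: 100, cursor });
    for (const record of page.records) {
      try {
        await index(did, record);
        success++;
      } catch {
        errors++; // record-level failures don't abort the page
      }
    }
    cursor = page.cursor;
    if (cursor) await sleep(1000 / RATE_LIMIT); // 1000 / BACKFILL_RATE_LIMIT ms
  } while (cursor);
  return { success, errors };
}
```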

### Conflict Resolution

  - The `UNIQUE(did, rkey)` constraint handles duplicates naturally via the Indexer's upsert logic
  - CID comparison: if the CID differs from the indexed record's, the Indexer updates it
  - Tombstone/deletion detection is not possible via `listRecords` — deferred to post-MVP (would require `com.atproto.sync.getRepo` CAR parsing)

### Backfill Orchestration

FullSync flow:

  1. Sync Forum DID across all forum-owned collections (in dependency order)
  2. Mark backfill completed
  3. Return stats

CatchUp flow:

  1. Sync Forum DID first (structure may have changed)
  2. Query users table for all known DIDs, sorted by did ASC
  3. Process DIDs in batches of BACKFILL_CONCURRENCY (default: 10)
  4. For each DID: sync membership, posts (user-owned collections)
  5. Update backfill_progress row every batch
  6. Mark completed; return stats
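
Steps 3–5 can be sketched as a chunked loop (helper names are assumptions; `Promise.allSettled` keeps one failing DID from aborting its batch):

```typescript
// Process DIDs in batches of BACKFILL_CONCURRENCY, checkpointing after each batch.
const CONCURRENCY = Number(process.env.BACKFILL_CONCURRENCY ?? 10);

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function processDids(
  dids: string[],                                 // known DIDs, sorted by did ASC
  syncUser: (did: string) => Promise<void>,       // sync membership + posts for one DID
  checkpoint: (lastDid: string) => Promise<void>, // update the backfill_progress row
  batchSize: number = CONCURRENCY,
): Promise<void> {
  for (const batch of chunk(dids, batchSize)) {
    await Promise.allSettled(batch.map(syncUser)); // per-DID failures recorded, not fatal
    await checkpoint(batch[batch.length - 1]);     // becomes last_processed_did for resume
  }
}
```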

Resume from checkpoint:

On startup, check for a `backfill_progress` row with `status = 'in_progress'`. If found, resume from `last_processed_did` by skipping alphabetically earlier DIDs.
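
The skip itself is a plain lexicographic filter over the ASC-sorted DID list (a sketch; the real query would filter in SQL):

```typescript
// Resume: drop every DID at or before the checkpoint. Correct only because
// the CatchUp flow processes DIDs sorted by did ASC.
function resumeFrom(dids: string[], lastProcessedDid: string | null): string[] {
  if (lastProcessedDid === null) return dids;
  return dids.filter((did) => did > lastProcessedDid);
}
```

In SQL terms this is a `WHERE did > last_processed_did ORDER BY did ASC` over the users table.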

## Database Tables

```sql
CREATE TABLE backfill_progress (
  id              SERIAL PRIMARY KEY,
  status          VARCHAR(20) NOT NULL,   -- 'in_progress', 'completed', 'failed'
  backfill_type   VARCHAR(20) NOT NULL,   -- 'full_sync', 'catch_up'
  last_processed_did VARCHAR(255),
  dids_total      INTEGER DEFAULT 0,
  dids_processed  INTEGER DEFAULT 0,
  records_indexed INTEGER DEFAULT 0,
  started_at      TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
  completed_at    TIMESTAMP WITH TIME ZONE,
  error_message   TEXT
);

CREATE TABLE backfill_errors (
  id              SERIAL PRIMARY KEY,
  backfill_id     INTEGER NOT NULL REFERENCES backfill_progress(id),
  did             VARCHAR(255) NOT NULL,
  collection      VARCHAR(255) NOT NULL,
  error_message   TEXT NOT NULL,
  created_at      TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
```

## Firehose Integration

Modified `FirehoseService.start()`:

  1. Check for interrupted backfill (resume if found)
  2. Load cursor, run backfillManager.checkIfNeeded(cursor)
  3. If backfill needed: set isBackfilling = true, await backfill, clear flag
  4. Proceed with existing cursor resume + jetstream start

Guard: reject `start()` calls while `isBackfilling === true`.
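
The sequence and the guard, sketched with the collaborators injected (class internals here are assumptions, not the actual `firehose.ts`):

```typescript
// Ordering guarantee: backfill completes before any jetstream event is consumed.
class FirehoseService {
  private isBackfilling = false;

  constructor(
    private loadCursor: () => Promise<bigint | null>,
    private runBackfillIfNeeded: (cursor: bigint | null) => Promise<void>, // resume check + checkIfNeeded + backfill
    private connectJetstream: (cursor: bigint | null) => void,             // existing cursor-resume startup
  ) {}

  async start(): Promise<void> {
    if (this.isBackfilling) throw new Error("start() rejected: backfill in progress");
    const cursor = await this.loadCursor();
    this.isBackfilling = true; // guard: block re-entrant start() calls
    try {
      await this.runBackfillIfNeeded(cursor);
    } finally {
      this.isBackfilling = false;
    }
    this.connectJetstream(cursor); // step 4: normal firehose startup
  }
}
```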

## Admin API

| Method | Path | Permission | Description |
| --- | --- | --- | --- |
| POST | `/api/admin/backfill` | manageForum | Trigger manual backfill; returns `{ backfillId, status }` |
| GET | `/api/admin/backfill/:id` | manageForum | Poll progress + error count |
| GET | `/api/admin/backfill/:id/errors` | manageForum | List errors for a backfill run |

POST behavior: check `isBackfilling` (409 if busy), determine the type via `checkIfNeeded()`, allow a `?force=catch_up|full_sync` override, kick off the backfill asynchronously, and return immediately.

## Error Handling

  - PDS unreachable: log a warning, insert a `backfill_errors` row, continue to the next DID
  - Record parse failure: log with the AT URI, continue to the next record
  - Programming errors (`TypeError`, `ReferenceError`, `SyntaxError`): re-throw
  - Partial completion: status is set to `completed`; errors remain queryable via the admin API
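
The re-throw rule can be made explicit with a small predicate (a sketch; the error taxonomy is exactly the one listed above, the helper names are assumed):

```typescript
// Programming errors indicate bugs and should crash the backfill loudly;
// everything else is operational and gets recorded as a backfill_errors row.
function isProgrammingError(err: unknown): boolean {
  return (
    err instanceof TypeError ||
    err instanceof ReferenceError ||
    err instanceof SyntaxError
  );
}

function recordOrRethrow(err: unknown, record: (message: string) => void): void {
  if (isProgrammingError(err)) throw err;
  record(err instanceof Error ? err.message : String(err));
}
```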

## Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `BACKFILL_RATE_LIMIT` | 10 | Max XRPC requests/second per PDS |
| `BACKFILL_CONCURRENCY` | 10 | Max DIDs processed concurrently |
| `BACKFILL_CURSOR_MAX_AGE_HOURS` | 48 | Cursor age threshold for CatchUp |

## Files

| Action | File |
| --- | --- |
| Create | `apps/appview/src/lib/backfill-manager.ts` |
| Create | `apps/appview/src/lib/__tests__/backfill-manager.test.ts` |
| Create | `apps/appview/src/lib/__tests__/backfill-integration.test.ts` |
| Create | `packages/db/drizzle/migrations/XXXX_add_backfill_tables.sql` |
| Modify | `packages/db/src/schema.ts` — add backfill tables |
| Modify | `apps/appview/src/lib/firehose.ts` — backfill check in `start()` |
| Modify | `apps/appview/src/lib/cursor-manager.ts` — add `getCursorAge()` |
| Modify | `apps/appview/src/lib/app-context.ts` — add `backfillManager` |
| Modify | `apps/appview/src/lib/config.ts` — add backfill config fields |
| Modify | `apps/appview/src/routes/admin.ts` — add backfill endpoints |
| Modify | `apps/appview/src/index.ts` — wire `BackfillManager` into startup |
| Modify | `turbo.json` — add backfill env vars |

## Testing

Unit tests: gap detection (all 4 scenarios), `syncRepoRecords` pagination, event shape transformation, rate limiting, conflict resolution, resume logic, progress updates.

Integration tests: FullSync with a mock PDS, CatchUp with known users, interrupted resume, partial PDS failure, admin endpoint trigger/poll, firehose blocked during backfill.

Mocking: mock `AtpAgent.com.atproto.repo.listRecords` for controlled responses, mock Indexer methods to verify event shapes, and use a real test DB for progress tracking.