# P2PDS
🚨 WARNING: very experimental
Peer-to-peer backup and archiving for AT Protocol accounts.
Your atproto data — posts, follows, likes, profile, media — lives on a PDS run by someone else. If that PDS shuts down, gets acquired, or just loses your data, it's gone. P2PDS gives you a complete, continuously-synced copy of your account that you own and control. If the worst happens, you have everything you need to migrate to a new PDS and resume where you left off.
For individuals: Log in with your atproto account. P2PDS syncs your entire repo and all your blobs to local storage. That's it — you have a backup. Want to back up a friend's data too? Add their handle, they accept on their end, and both of you are now archiving each other's accounts. No technical setup, no command line, no servers to manage.
For communities and organizations: Groups can collectively archive each other's data. A neighborhood association, an activist collective, a research lab — any group of people who want to ensure their records survive regardless of what happens to any single PDS or hosting provider. Everyone archives everyone, creating resilient redundancy across the group.
For public accountability: Not all archiving requires consent. Public figures, government accounts, and institutional records can be archived unilaterally — the same way the Wayback Machine archives the public web. Researchers, journalists, and watchdog organizations can maintain independent copies of public atproto data without needing permission from the account holder.
Consent-driven by default: When two users want to archive each other, the flow is consent-based. You publish an offer, the other person sees it and accepts (or doesn't). Mutual acceptance triggers automatic replication. Either party can revoke at any time by deleting their offer — the agreement dissolves and the data is purged. Users can also publish a blanket consent record saying "anyone can archive me," enabling one-way archiving by peers who check for it.
P2PDS is infrastructure — like a torrent client for atproto data. It has no identity of its own. Users authenticate with their own atproto accounts, and all coordination happens through standard atproto records published to the user's own repo.
## How it works
A user logs in with their atproto account. P2PDS syncs their repo and blobs into a local SQLite-backed IPFS blockstore. To replicate another user's data, the user publishes an offer record to their own repo. If the other user's node reciprocates, both nodes detect the mutual agreement and begin syncing automatically. All coordination happens through atproto records — no custom signaling protocol.
## Architecture
P2PDS is built as four loosely coupled layers. Each layer has a single responsibility and communicates with adjacent layers through narrow interfaces, so any layer can be replaced without affecting the others.
```
┌──────────────────────────────────────────────────────┐
│ Policy Engine │
│ │
│ Offer negotiation, agreement detection, lifecycle, │
│ consent tracking, merge rules, sync scheduling │
│ │
│ Speaks: atproto (lexicon records on user's PDS) │
│ Knows nothing about: blocks, CIDs, libp2p, SQLite │
├──────────────────────────────────────────────────────┤
│ Replication Engine │
│ │
│ Sync repos, fetch blobs, verify block integrity, │
│ challenge-response proofs, firehose subscription │
│ │
│ Asks policy: "which DIDs, how often, what priority?"│
│ Asks storage: "put/get/has block" │
│ Asks network: "announce CIDs, notify peers" │
│ Knows nothing about: offers, consent, libp2p, SQL │
├───────────────────────┬──────────────────────────────┤
│ Storage │ Network │
│ │ │
│ BlockStore interface │ NetworkService interface │
│ put/get/has block │ provide, announce, pubsub │
│ SyncStorage (state) │ │
│ │ Currently: Helia/libp2p │
│ Currently: SQLite │ Could be: Iroh, HTTP-only, │
│ Could be: LevelDB, │ Hyperswarm, or anything │
│ filesystem, S3 │ that moves bytes between │
│ │ peers │
└───────────────────────┴──────────────────────────────┘
```
Why this separation:

- Policy is protocol-native. All negotiation happens through atproto records published to the user's own PDS — standard lexicons, standard APIs. If you swapped IPFS for a different content-addressed network, policy wouldn't change at all. If you swapped atproto for a different identity system, only policy would need updating.
- Replication is protocol-agnostic. It receives a list of DIDs and sync parameters from policy, fetches bytes, stores them, and verifies integrity. It doesn't know how agreements were formed or how blocks are physically stored. It talks to storage and network through interfaces (`BlockStore`, `NetworkService`), never concrete implementations.
- Storage is an interface. `BlockStore` has four methods: `putBlock`, `getBlock`, `hasBlock`, `deleteBlock`. The current implementation is SQLite-backed, but any key-value store that maps CID strings to byte arrays would work. Replication and policy never import SQLite directly.
- Network is an interface. `NetworkService` handles CID announcement, pubsub notifications, and peer connectivity. The current implementation wraps Helia/libp2p, but the interface is transport-agnostic — HTTP-only, Iroh, or Hyperswarm could implement it without touching replication or policy code.
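The storage boundary above can be sketched as a TypeScript interface. The four method names come from this README; the synchronous in-memory `Map` implementation is a hypothetical stand-in for the SQLite-backed store (better-sqlite3 is itself synchronous, though the real interface may well return Promises):

```typescript
// Sketch of the BlockStore boundary. Method names come from the README;
// the in-memory implementation is a stand-in for the SQLite-backed one.
interface BlockStore {
  putBlock(cid: string, bytes: Uint8Array): void;
  getBlock(cid: string): Uint8Array | undefined;
  hasBlock(cid: string): boolean;
  deleteBlock(cid: string): void;
}

// Any key-value store mapping CID strings to byte arrays satisfies it.
class MemoryBlockStore implements BlockStore {
  private blocks = new Map<string, Uint8Array>();
  putBlock(cid: string, bytes: Uint8Array): void { this.blocks.set(cid, bytes); }
  getBlock(cid: string): Uint8Array | undefined { return this.blocks.get(cid); }
  hasBlock(cid: string): boolean { return this.blocks.has(cid); }
  deleteBlock(cid: string): void { this.blocks.delete(cid); }
}
```

Because replication only sees this interface, swapping SQLite for LevelDB, the filesystem, or S3 means writing one new class, not touching sync code.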
Other design principles:
- No node identity — p2pds acts on behalf of users, not as its own entity
- Lazy identity — starts without a DID; identity established on first OAuth login
- Any PDS — works with Bluesky, self-hosted, or any atproto-compatible PDS
- Any deployment — local desktop (Tauri), cloud, co-located server
## Lexicons
P2PDS defines three record types published to the user's own repo:
| NSID | rkey | Purpose | Used in |
|---|---|---|---|
| `org.p2pds.peer` | `self` | Binds DID → libp2p PeerID + multiaddrs + p2pds endpoint URL | Peer discovery: nodes read this to find each other's transport addresses and HTTP endpoints |
| `org.p2pds.replication.offer` | DID (colons→hyphens) | Declares willingness to replicate a specific DID | Offer negotiation: mutual offers trigger automatic replication agreements |
| `org.p2pds.replication.consent` | `self` | Opt-in: "I consent to being archived" | Consensual archive: peers check this before archiving without a reciprocal offer |
There is no "accepted" or "rejected" record type. Agreement is implicit: if Alice has an offer targeting Bob and Bob has an offer targeting Alice, both nodes independently detect the mutual offers and begin replicating. Revoking an offer (deleting the record) dissolves the agreement. This keeps the protocol surface minimal — one record type handles proposing, accepting, and revoking — and the state is verifiable by either party at any time by reading the other's repo.
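The implicit-agreement check reduces to a path lookup in each party's repo. A minimal sketch, where the rkey convention (colons→hyphens) and the NSID come from the table above, and modeling a repo as a set of `collection/rkey` paths is a simplifying assumption:

```typescript
// Sketch of mutual-offer detection. Only the NSID and the rkey convention
// are from the README; the Set-of-paths repo model is an assumption.
const OFFER_NSID = "org.p2pds.replication.offer";

// did:plc:abc123 → did-plc-abc123
function offerRkey(targetDid: string): string {
  return targetDid.replace(/:/g, "-");
}

// A repo is modeled as a set of "collection/rkey" record paths.
function hasOfferFor(repoPaths: Set<string>, targetDid: string): boolean {
  return repoPaths.has(`${OFFER_NSID}/${offerRkey(targetDid)}`);
}

// Mutual agreement: Alice offers Bob AND Bob offers Alice.
function isMutualAgreement(
  aliceDid: string, aliceRepo: Set<string>,
  bobDid: string, bobRepo: Set<string>,
): boolean {
  return hasOfferFor(aliceRepo, bobDid) && hasOfferFor(bobRepo, aliceDid);
}
```

Either node can run this check at any time by reading both repos, which is what makes the agreement state verifiable without an "accepted" record.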
Schemas are in `lexicons/` and validated by `src/lexicons.ts`.
## Replication flow
1. User adds a DID via the web UI
2. Node publishes `org.p2pds.replication.offer` to the user's repo
3. Node resolves the target's `org.p2pds.peer` record → finds their p2pds endpoint
4. Node POSTs a push notification to the target's `notifyOffer` endpoint
5. Target verifies the offer exists in the offerer's repo (anti-spoofing)
6. Target user sees the incoming offer in their UI → Accept or Reject
7. Accepting creates a reciprocal offer + push notification back
8. Both nodes detect mutual agreement → auto-generate replication policy
9. Sync loop: fetch repo via libp2p (peer-first) or HTTP (PDS fallback), store blocks/blobs, verify, announce
10. Real-time updates via firehose subscription between periodic syncs
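The anti-spoofing step can be sketched as a read-back against the offerer's own repo via the standard `com.atproto.repo.getRecord` XRPC endpoint. The fetcher is injected so the check stays testable; the `subject` field name is an assumption about the offer record shape:

```typescript
// Sketch of the anti-spoofing check: before surfacing an incoming offer,
// read it back from the offerer's repo. Only com.atproto.repo.getRecord and
// the rkey convention are standard; `subject` is an assumed field name.
type Fetcher = (url: string) => Promise<{ value?: { subject?: string } } | null>;

function getRecordUrl(pdsBase: string, offererDid: string, targetDid: string): string {
  const rkey = targetDid.replace(/:/g, "-"); // rkey convention from the lexicon table
  const params = new URLSearchParams({
    repo: offererDid,
    collection: "org.p2pds.replication.offer",
    rkey,
  });
  return `${pdsBase}/xrpc/com.atproto.repo.getRecord?${params}`;
}

async function verifyOffer(
  fetcher: Fetcher, pdsBase: string, offererDid: string, targetDid: string,
): Promise<boolean> {
  const record = await fetcher(getRecordUrl(pdsBase, offererDid, targetDid));
  // Genuine only if the record exists and actually targets us.
  return record?.value?.subject === targetDid;
}
```

A forged push notification fails this check because the claimed offer is absent from (or mismatched in) the offerer's signed repo.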
Three replication modes:
- Reciprocal archive — Mutual consent, bidirectional replication
- Consensual archive — One-way replication with explicit opt-in from the target
- Non-consensual archive — One-way replication without target's explicit permission
## Verification
Content-addressed retrieval is unforgeable: correct bytes for a CID = proof of storage. The verification stack:
| Layer | Method |
|---|---|
| L0 | Commit root — compare local root CID with source PDS via getHead |
| L1 | Local block sampling — verify random blocks exist in local blockstore |
| L2 | Block-sample challenge — challenge peers to produce specific blocks |
| L3 | MST proof challenge — challenge peers to produce Merkle path proofs |
Challenge-response protocol: `StorageChallenge` → `StorageChallengeResponse` → `StorageChallengeResult`. Challenges are generated deterministically from epoch + DIDs + nonce. Transport-agnostic, with libp2p as primary and HTTP as fallback.
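Deterministic generation means both sides can derive the same challenged blocks without transmitting a selection. A sketch under assumptions (hashing scheme, parameter names, and modulo index mapping are all illustrative, not the project's actual algorithm):

```typescript
import { createHash } from "node:crypto";

// Sketch of deterministic challenge generation: hashing epoch + DIDs + nonce
// yields block indices that challenger and responder can compute
// independently. The exact hash layout here is an assumption.
function challengeIndices(
  epoch: number, challengerDid: string, targetDid: string,
  nonce: string, totalBlocks: number, sampleSize: number,
): number[] {
  const indices: number[] = [];
  for (let i = 0; i < sampleSize; i++) {
    const digest = createHash("sha256")
      .update(`${epoch}:${challengerDid}:${targetDid}:${nonce}:${i}`)
      .digest();
    // Map the first 4 bytes of each digest onto the block range.
    indices.push(digest.readUInt32BE(0) % totalBlocks);
  }
  return indices;
}
```

The responder proves storage by producing the bytes at those indices; since CIDs are content addresses, correct bytes cannot be faked.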
## Lexicon index
Every record path stored during sync includes a lexicon NSID (the collection portion of `collection/rkey`). P2PDS aggregates these into a local lexicon index — a catalog of every lexicon encountered across all replicated repos.
- Automatic population — updated incrementally after every full sync and firehose event, rebuilt from scratch on startup
- Public API — three unauthenticated endpoints for querying:
  - `GET /xrpc/org.p2pds.lexicon.search?q=app.bsky&limit=50` — prefix search
  - `GET /xrpc/org.p2pds.lexicon.list?limit=100` — all NSIDs by record count
  - `GET /xrpc/org.p2pds.lexicon.stats` — aggregate stats (unique NSIDs, total records)
- UI — searchable table in the web interface showing NSID, record count, repo count, first/last seen dates
This is the foundation for distributed lexicon discovery — nodes can query each other's indexes to find who stores what.
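The aggregation itself is simple: strip the rkey from each stored path and count NSIDs. A sketch (function names are illustrative, not the project's API):

```typescript
// Sketch of lexicon-index aggregation: the NSID is the collection portion
// of each stored "collection/rkey" record path.
function aggregateLexicons(recordPaths: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const path of recordPaths) {
    const nsid = path.split("/")[0];
    counts.set(nsid, (counts.get(nsid) ?? 0) + 1);
  }
  return counts;
}

// Prefix search over the index, as exposed by org.p2pds.lexicon.search.
function searchLexicons(counts: Map<string, number>, q: string, limit: number): string[] {
  return [...counts.keys()].filter((n) => n.startsWith(q)).sort().slice(0, limit);
}
```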
## Storage
All persistent state lives in a single SQLite database (`pds.db`):
- IPFS blocks/datastore — SQLite-backed, no filesystem churn
- Replication state — sync progress, peer info, block/blob tracking, firehose cursor
- Challenge history — proof-of-storage results and peer reliability scores
- Lexicon index — aggregated NSID usage across all replicated repos
- PLC mirror — archived PLC operation logs for tracked DIDs
- Node identity — DID + handle, established on first OAuth login
## PLC log archiving
P2PDS mirrors PLC operation logs for all tracked did:plc DIDs. The PLC directory is the root of trust for DID resolution — if it goes down or loses data, DID documents become unresolvable. By archiving PLC logs locally, each node maintains an independent backup of the identity layer.
- Automatic — logs are fetched on first sync and refreshed every 6 hours
- Cross-node sharing — public endpoint `GET /xrpc/org.p2pds.plc.getLog?did=...` lets nodes fetch PLC logs from each other, not just from the central PLC directory
- Per-DID status — the UI shows PLC archive status (op count, last fetch, tombstone state) for each tracked DID
- Validation — operation chains are validated on fetch
## Policy engine
Every replication relationship is backed by a policy object with lifecycle state, consent tracking, and merge rules. Policies are created automatically from offer negotiation or manually via config/UI.
Policy types:
- Reciprocal — auto-generated when mutual offers are detected between peers
- Archive — user-added DIDs via the web UI
- Config — DIDs from the `REPLICATE_DIDS` environment variable
Lifecycle: proposed → active → suspended → terminated → purged. Consent status tracked per-policy (reciprocal, consented, unconsented, revoked).
Merge rules when multiple policies match a DID: max(minCopies), min(intervalSec), max(retention), union(preferredPeers). All policies are SQLite-persisted and survive restarts.
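The merge rules can be sketched as a fold over the matching policies (field names like `retentionDays` are assumptions; only the max/min/union rules come from the text above):

```typescript
// Sketch of the merge rules when multiple policies match one DID.
// Field names are assumptions matching the rule list in the README.
interface SyncPolicy {
  minCopies: number;
  intervalSec: number;
  retentionDays: number;
  preferredPeers: string[];
}

// Assumes at least one matching policy.
function mergePolicies(policies: SyncPolicy[]): SyncPolicy {
  return policies.reduce((merged, p) => ({
    minCopies: Math.max(merged.minCopies, p.minCopies),             // max(minCopies)
    intervalSec: Math.min(merged.intervalSec, p.intervalSec),       // min(intervalSec)
    retentionDays: Math.max(merged.retentionDays, p.retentionDays), // max(retention)
    preferredPeers: [...new Set([...merged.preferredPeers, ...p.preferredPeers])],
  }));
}
```

Each rule picks the stricter requirement, so adding a policy can only strengthen the guarantees for a DID, never weaken them.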
## Recovery
P2PDS keeps a complete, continuously-synced copy of your account data — every repo block, every blob. If your PDS disappears, you can export this data and import it to a new PDS. But having a copy of your data is not enough to recover your account. There is a critical prerequisite that is outside p2pds's control.
What p2pds gives you:
- A CAR file containing your complete repo (all commits, MST nodes, records)
- All your blobs (images, media)
- Export endpoints: `exportRepo` (CAR download), `exportBlobs` (blob listing), `getBlob` (individual blob download)
What you also need: a rotation key.
AT Protocol identities (did:plc) are controlled by rotation keys — cryptographic keys that can update where your DID points. To move your account to a new PDS, you need to sign a PLC operation that says "my DID now lives at this new PDS." Only a rotation key can do that.
Your PDS holds rotation keys for your DID. If the PDS is gone, those keys are gone too — unless you independently hold one. Without a rotation key, you have all your data but no way to prove you own the identity. You cannot update your DID document, cannot point it at a new PDS, and cannot resume as the same account.
Recovery steps (if you hold a rotation key):
1. Export your repo as a CAR file and download your blobs from p2pds
2. Create an account on a new PDS using your existing DID
3. Import the CAR file via `com.atproto.repo.importRepo`
4. Re-upload blobs
5. Sign a PLC operation (with your rotation key) pointing your DID at the new PDS
6. Activate the account
Handle survival: If you used a custom domain handle (verified via DNS), it survives migration — the DNS record still points to your DID. If you used a .bsky.social handle, it's gone — that subdomain is controlled by Bluesky.
Rotation key management in p2pds: P2PDS includes a UI for adding rotation keys to your PLC document. The flow: request a PLC operation token (triggers an email from your PDS), enter the token, and submit your public key. This uses the standard `com.atproto.identity.signPlcOperation` API. You can also view your current rotation keys from the web interface.
The honest reality for most Bluesky users today: Bluesky has not yet shipped user-facing rotation key management. Most users don't independently hold a rotation key. If Bluesky's PDS infrastructure disappeared tomorrow, those users would have their data (thanks to p2pds) but could not recover their identity. This is an upstream gap in the atproto ecosystem, not something p2pds can solve — but it's important to understand. The data preservation is still valuable: your posts, social graph, and media survive, even if reattaching them to the same DID requires key management tooling that doesn't exist yet.
## Stack
- Runtime: Node.js, TypeScript (ES2022, strict)
- HTTP: Hono
- Database: better-sqlite3
- IPFS: Helia with minimal libp2p (TCP + noise + yamux + autoNAT + kadDHT client mode)
- UI: Lit web components, esbuild-bundled
- Identity: AT Protocol DIDs via PLC directory
- Auth: OAuth (primary) or legacy JWT (fallback)
- Content addressing: DASL CIDs (CIDv1, SHA-256, dag-cbor/raw, base32lower)
- Desktop: Tauri v2 (optional, `apps/desktop/`)
## Development
```sh
npm install
npm test
npm run dev
```
### Two-node testing
```sh
npm run start:both            # Build and start both nodes
npm run start:both -- --clean # Wipe data first
npm run stop                  # Stop both nodes
npm run logs                  # Tail logs for both nodes
npm run health                # Check node1 health
npm run check-api             # Full API check with auth
```
## Project structure
```
src/
  index.ts       Hono app with all routes
  server.ts      HTTP server entry point
  start.ts       Server startup orchestrator
  config.ts      Config interface + loadConfig()
  ipfs.ts        IpfsService (Helia wrapper, SQLite-backed)
  build-ui.ts    esbuild bundler for Lit UI
  ui/            Lit web components (app shell, cards, state)
  replication/   Sync, verification, challenges, offers, lexicon index
  policy/        Policy engine types, engine, presets
  identity/      PLC mirror, rotation key management
  oauth/         OAuth client, routes, PdsClient
  xrpc/          XRPC endpoint handlers
  middleware/    Auth, rate limiting, body limits
scripts/         Two-node testing scripts
lexicons/        Lexicon JSON schemas
apps/desktop/    Tauri desktop app
```
## Configuration
Environment variables (or .env file):
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3000` | HTTP port |
| `DATA_DIR` | `./data` | Data directory |
| `OAUTH_ENABLED` | `true` | Enable OAuth login |
| `PUBLIC_URL` | `http://localhost:$PORT` | Public URL for push notifications |
| `IPFS_ENABLED` | `true` | Enable IPFS |
| `IPFS_NETWORKING` | `true` | Enable libp2p networking |
| `REPLICATE_DIDS` | | Comma-separated DIDs to replicate on startup |
| `FIREHOSE_URL` | `wss://bsky.network/...` | Firehose WebSocket URL |
| `FIREHOSE_ENABLED` | `true` | Enable firehose sync |
| `POLICY_FILE` | | Path to policy JSON file |
| `RATE_LIMIT_ENABLED` | `true` | Enable rate limiting |