+164
docs/profile-blob-hydration.md
# Profile Blob Hydration - Implementation Notes

## Overview

This document captures key learnings from implementing avatar and banner blob hydration for Bluesky profiles.

## Key Discoveries

### 1. CID Deserialization in @atproto/api

The `@atproto/api` library deserializes blob references from their JSON `$link` representation into CID class objects, so `ref.$link` is `undefined` on a deserialized record.

**Raw JSON from API:**
```json
{
  "avatar": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreigg3s6plegjncmxubeufbohj3qasbm4r23q2x7zlivdhccfqfypve"
    },
    "mimeType": "image/jpeg",
    "size": 101770
  }
}
```

**What you get in TypeScript:**
```typescript
record.avatar.ref // CID object with { code, version, hash, ... }
```

**Solution:**
```typescript
const cid = record.avatar.ref.toString(); // "bafkrei..."
```
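
Code that reads records from more than one source may see both shapes of `ref`. The following is a minimal sketch of a normalizer that accepts either one; `blobRefToCid`, `RawBlobRef`, and `CidLike` are hypothetical names for illustration and are not part of `@atproto/api`.

```typescript
// Hypothetical helper (not part of @atproto/api): accepts either shape of `ref`.
type RawBlobRef = { $link: string };
type CidLike = { toString(): string };

function blobRefToCid(ref: RawBlobRef | CidLike | null | undefined): string | null {
  if (ref == null) return null;
  // Raw JSON shape, e.g. from a response that was never deserialized
  if (typeof (ref as RawBlobRef).$link === "string") {
    return (ref as RawBlobRef).$link;
  }
  // Deserialized shape: CID objects produce their string form via toString()
  const text = ref.toString();
  // A plain object without a custom toString() yields "[object Object]"; reject it
  return text.startsWith("[object") ? null : text;
}

// Stand-in for a CID instance; the real class comes from the library
const fakeCid: CidLike = { toString: () => "bafkreiexample" };
const fromCid = blobRefToCid(fakeCid);                    // "bafkreiexample"
const fromJson = blobRefToCid({ $link: "bafkreiraw" });   // "bafkreiraw"
```

Returning `null` instead of throwing leaves the caller free to decide how a missing reference is stored.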

### 2. PDS Endpoint Resolution

Users can be on different Personal Data Servers (PDS), not just `bsky.social`. Blobs must be fetched from the user's actual PDS.

**Process:**
1. Query the PLC directory for the DID document: `https://plc.wtf/${did}` (this covers `did:plc` identifiers; a `did:web` identifier resolves through its own domain's `/.well-known/did.json`)
2. Find the service with `id: "#atproto_pds"` and `type: "AtprotoPersonalDataServer"`
3. Extract its `serviceEndpoint` URL
4. Use that endpoint for `com.atproto.sync.getBlob`

**Example:**
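```typescript
const res = await fetch(`https://plc.wtf/${did}`);
if (!res.ok) throw new Error(`DID lookup failed for ${did}: HTTP ${res.status}`);
const didDoc = await res.json();
// Match on both id and type so unrelated services are ignored
const pdsService = didDoc.service?.find((s: { id: string; type: string }) =>
  s.id === "#atproto_pds" && s.type === "AtprotoPersonalDataServer"
);
if (!pdsService) throw new Error(`No PDS service in DID document for ${did}`);
const pdsEndpoint = pdsService.serviceEndpoint; // e.g., "https://waxcap.us-west.host.bsky.network"
```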
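
Steps 2 and 3 can be separated from the network call so they are easy to test. The sketch below assumes only the DID document fields named above; the `DidService` and `DidDocument` interfaces, the `extractPdsEndpoint` helper, and the sample document are hypothetical and written for illustration.

```typescript
// Minimal shape of the parts of a DID document used here
interface DidService {
  id: string;
  type: string;
  serviceEndpoint: unknown; // expected to be a string for a PDS entry
}
interface DidDocument {
  id: string;
  service?: DidService[];
}

// Hypothetical helper: returns the PDS base URL, or null if none is declared
function extractPdsEndpoint(doc: DidDocument): string | null {
  const pds = doc.service?.find(
    (s) => s.id === "#atproto_pds" && s.type === "AtprotoPersonalDataServer",
  );
  if (!pds || typeof pds.serviceEndpoint !== "string") return null;
  // Drop trailing slashes so "/xrpc/..." can be appended safely
  return pds.serviceEndpoint.replace(/\/+$/, "");
}

const sampleDoc: DidDocument = {
  id: "did:plc:example123", // made-up DID for illustration
  service: [
    { id: "#other", type: "SomethingElse", serviceEndpoint: "https://other.example" },
    {
      id: "#atproto_pds",
      type: "AtprotoPersonalDataServer",
      serviceEndpoint: "https://pds.example.com/",
    },
  ],
};
const samplePds = extractPdsEndpoint(sampleDoc); // "https://pds.example.com"
```

Returning `null` rather than throwing lets the caller decide whether a missing PDS entry is fatal or just a reason to skip the profile.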

### 3. Correct Blob Fetching

**Don't use CDN paths.** They don't work reliably for all blobs and require authentication context.

**Use the AT Protocol API:**
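```typescript
// Encode the query parameters: a DID contains ":" characters
const blobUrl = `${pdsEndpoint}/xrpc/com.atproto.sync.getBlob` +
  `?did=${encodeURIComponent(did)}&cid=${encodeURIComponent(cid)}`;
const response = await fetch(blobUrl);
if (!response.ok) throw new Error(`getBlob failed for ${cid}: HTTP ${response.status}`);
const blobData = Buffer.from(await response.arrayBuffer());
```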
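
The fetch can also be wrapped so that URL construction and error handling live in one place. This is a sketch under assumptions: `buildGetBlobUrl`, `fetchBlob`, `FetchLike`, and `BlobResponse` are hypothetical names, the fetch implementation is injected so the wrapper can be exercised without a network, and the MIME type is simply whatever `Content-Type` header the server sends.

```typescript
// Minimal response/fetch shapes so the sketch does not depend on DOM or Node typings
interface BlobResponse {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (url: string) => Promise<BlobResponse>;

// Builds the getBlob URL, tolerating a trailing slash on the PDS endpoint
function buildGetBlobUrl(pdsEndpoint: string, did: string, cid: string): string {
  const base = pdsEndpoint.replace(/\/+$/, "");
  return `${base}/xrpc/com.atproto.sync.getBlob` +
    `?did=${encodeURIComponent(did)}&cid=${encodeURIComponent(cid)}`;
}

// Hypothetical wrapper: throws on any non-2xx response
async function fetchBlob(
  fetchImpl: FetchLike,
  pdsEndpoint: string,
  did: string,
  cid: string,
): Promise<{ bytes: Uint8Array; mimeType: string | null }> {
  const res = await fetchImpl(buildGetBlobUrl(pdsEndpoint, did, cid));
  if (!res.ok) {
    throw new Error(`getBlob failed for ${cid}: HTTP ${res.status}`);
  }
  return {
    bytes: new Uint8Array(await res.arrayBuffer()),
    mimeType: res.headers.get("content-type"),
  };
}

const exampleUrl = buildGetBlobUrl("https://pds.example.com/", "did:plc:abc123", "bafkreiexample");
// "https://pds.example.com/xrpc/com.atproto.sync.getBlob?did=did%3Aplc%3Aabc123&cid=bafkreiexample"
```

In a runtime with a global `fetch`, passing it as `fetchImpl` should work; a stub with the same shape can replace it in tests.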

### 4. Database Schema Design

**Separate tables for different blob types:**

- `blobs` table: Post images with FK to `posts(uri)`
- `profile_blobs` table: Avatars/banners with FK to `profiles(did)`

Keeping them separate lets each table enforce a foreign key to its own parent row, so blobs can be joined directly to the post or profile they belong to.

**Profile blobs schema:**
```sql
CREATE TABLE profile_blobs (
  did TEXT NOT NULL,
  blob_type TEXT NOT NULL CHECK (blob_type IN ('avatar', 'banner')),
  blob_cid TEXT NOT NULL,
  sha256 TEXT NOT NULL,
  phash TEXT,
  storage_path TEXT,
  mimetype TEXT,
  captured_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (did, blob_type, captured_at),
  FOREIGN KEY (did) REFERENCES profiles(did)
);
```

### 5. Change Tracking

Including `captured_at` in the primary key allows several rows per `(did, blob_type)` pair, one per capture, which records when users change their avatars/banners.

**Query latest state:**
```sql
SELECT * FROM profile_blobs
WHERE did = ? AND blob_type = ?
ORDER BY captured_at DESC
LIMIT 1
```

**Only insert if changed:**
```typescript
const latest = await findLatestByDidAndType(did, type);
if (latest && latest.blob_cid === cid) {
  return; // No change, skip
}
// Insert new row with current timestamp
```
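
To make the insert-if-changed rule concrete, here is a small in-memory sketch. It uses an array as a stand-in for the `profile_blobs` table; `ProfileBlobRow`, `findLatest`, and `recordIfChanged` are hypothetical names, and the row type carries only the columns the rule needs.

```typescript
type BlobType = "avatar" | "banner";
interface ProfileBlobRow {
  did: string;
  blob_type: BlobType;
  blob_cid: string;
  captured_at: Date;
}

// In-memory equivalent of the "latest state" query (illustration only)
function findLatest(rows: ProfileBlobRow[], did: string, type: BlobType): ProfileBlobRow | undefined {
  return rows
    .filter((r) => r.did === did && r.blob_type === type)
    .sort((a, b) => b.captured_at.getTime() - a.captured_at.getTime())[0];
}

// Appends a row only when the CID differs from the latest capture.
// Returns true if a row was inserted.
function recordIfChanged(
  rows: ProfileBlobRow[],
  did: string,
  type: BlobType,
  cid: string,
  now: Date,
): boolean {
  const latest = findLatest(rows, did, type);
  if (latest && latest.blob_cid === cid) return false; // no change, skip
  rows.push({ did, blob_type: type, blob_cid: cid, captured_at: now });
  return true;
}

const captures: ProfileBlobRow[] = [];
const sampleDid = "did:plc:example123"; // made-up DID
recordIfChanged(captures, sampleDid, "avatar", "cid-a", new Date("2024-01-01T00:00:00Z")); // inserted
recordIfChanged(captures, sampleDid, "avatar", "cid-a", new Date("2024-01-02T00:00:00Z")); // skipped
recordIfChanged(captures, sampleDid, "avatar", "cid-b", new Date("2024-01-03T00:00:00Z")); // inserted
```

After these three calls the array holds two rows, one per distinct avatar, so the history contains only real changes.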

### 6. Sentinel Values for Missing Data

Use an empty string (`""`) to mean "we checked, and the user has no avatar", and `NULL` to mean "we haven't checked yet".

```typescript
if (record.avatar?.ref) {
  avatarCid = record.avatar.ref.toString();
} else {
  avatarCid = ""; // Explicitly checked, not present
}
```

This prevents infinite re-hydration loops for profiles without avatars: once the column holds `""` it is no longer `NULL`, so the profile stops qualifying for re-hydration.
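
The three states packed into one nullable column can be made explicit in application code. The sketch below is one possible encoding, not the project's actual types; `CidState`, `decodeCidColumn`, and `encodeCidColumn` are hypothetical names.

```typescript
type CidState =
  | { kind: "unchecked" }            // column is NULL
  | { kind: "absent" }               // column is ""
  | { kind: "present"; cid: string }; // column holds a CID string

// Column value -> explicit state
function decodeCidColumn(value: string | null): CidState {
  if (value === null) return { kind: "unchecked" };
  if (value === "") return { kind: "absent" };
  return { kind: "present", cid: value };
}

// Explicit state -> column value
function encodeCidColumn(state: CidState): string | null {
  switch (state.kind) {
    case "unchecked":
      return null;
    case "absent":
      return "";
    case "present":
      return state.cid;
  }
}

const checkedNoAvatar = decodeCidColumn(""); // { kind: "absent" }
```

A tagged union makes it harder to treat `""` and `NULL` as the same "falsy" value by accident, which is the mistake the sentinel exists to avoid.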

### 7. Profile Re-hydration Logic

A known profile is re-hydrated only while its `avatar_cid` or `banner_cid` is still `NULL` (never checked); unknown profiles are always hydrated.

```typescript
const existingProfile = await findByDid(did);
const needsRehydration = existingProfile &&
  (existingProfile.avatar_cid === null || existingProfile.banner_cid === null);

if (existingProfile && !needsRehydration) {
  return; // Skip
}
```
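
The same decision can be written as a pure function, which makes its interaction with the `""` sentinel easy to check. This is a sketch: `StoredProfile`, `HydrationAction`, and `decideHydration` are hypothetical names, and the profile type carries only the two columns the check reads.

```typescript
interface StoredProfile {
  did: string;
  avatar_cid: string | null; // null = never checked, "" = checked and absent
  banner_cid: string | null;
}
type HydrationAction = "hydrate" | "rehydrate" | "skip";

function decideHydration(existing: StoredProfile | undefined): HydrationAction {
  if (!existing) return "hydrate"; // profile has never been stored
  const unchecked = existing.avatar_cid === null || existing.banner_cid === null;
  return unchecked ? "rehydrate" : "skip";
}

// A profile with no avatar but a banner: both columns were checked, so it is skipped
const noAvatarAction = decideHydration({
  did: "did:plc:example123", // made-up DID
  avatar_cid: "",
  banner_cid: "bafkreibanner",
}); // "skip"
```

Because the comparison is strictly against `null`, an empty string counts as checked, which is what keeps avatar-less profiles out of the re-hydration path.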

## Configuration

- `PLC_ENDPOINT`: DID resolution endpoint (default: `https://plc.wtf`)
  - Can be changed to `https://plc.directory` or a custom instance
  - plc.wtf is faster but unofficial
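
One way such a setting might be read is sketched below. This is not the contents of `src/config/index.ts`; `resolvePlcEndpoint` and `DEFAULT_PLC_ENDPOINT` are hypothetical names, and the environment is passed in as a plain object (typically `process.env`) so the function stays testable.

```typescript
const DEFAULT_PLC_ENDPOINT = "https://plc.wtf";

// Hypothetical loader: `env` would typically be process.env
function resolvePlcEndpoint(env: Record<string, string | undefined>): string {
  const raw = env["PLC_ENDPOINT"]?.trim();
  // Fall back to the default when the variable is unset or blank
  const value = raw ? raw : DEFAULT_PLC_ENDPOINT;
  if (!/^https?:\/\//.test(value)) {
    throw new Error(`PLC_ENDPOINT must be an http(s) URL, got "${value}"`);
  }
  // Drop trailing slashes so `${endpoint}/${did}` never contains "//"
  return value.replace(/\/+$/, "");
}

const defaultPlc = resolvePlcEndpoint({}); // "https://plc.wtf"
const officialPlc = resolvePlcEndpoint({ PLC_ENDPOINT: "https://plc.directory/" }); // "https://plc.directory"
```

Validating the value at startup surfaces a malformed endpoint immediately instead of as a failed DID lookup later.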

## Common Errors

### "RepoNotFound"
- **Cause:** Querying wrong PDS endpoint
- **Solution:** Resolve correct PDS from DID document

### Foreign Key Constraint Violation
- **Cause:** Trying to insert profile blobs into `blobs` table
- **Solution:** Use separate `profile_blobs` table

### Missing CIDs Despite API Returning Them
- **Cause:** Trying to access `ref.$link` when ref is a CID object
- **Solution:** Call `.toString()` on the CID object

## Related Files

- `src/hydration/profiles.service.ts` - Main hydration logic
- `src/database/profile-blobs.repository.ts` - Profile blob persistence
- `src/database/schema.ts` - Table definitions
- `src/config/index.ts` - PLC endpoint configuration