Our Personal Data Server from scratch! tranquil.farm
atproto pds rust postgresql fun oauth

fix: handle AT Protocol $bytes type in json_to_ipld #43

merged opened by sans-self.org targeting main

fix: handle $bytes in json_to_ipld

What broke

json_to_ipld knows about $link but not $bytes. So this:

{ "ciphertext": { "$bytes": "ygoGIpnVb/HQTIZythM9..." } }

gets written to CBOR as a map ({ "$bytes": "..." }, major type 5) instead of a raw byte string (major type 2). The data model spec says that $bytes is a JSON encoding of raw bytes, not a map.

The PDS doesn't notice; it's consistently wrong in both directions, so JSON -> CBOR -> JSON round-trips fine internally. The problem shows up downstream when trying to send, for example, encrypted bytes.
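To make the major-type difference concrete, here is a minimal sketch of the two encodings at the byte level. It is a hand-rolled illustration, not the serializer Tranquil uses: the helper names (major_type, encode_short_bytes, encode_single_entry_text_map) are made up for this example, and it only handles lengths under 24 so the length fits in the initial byte.

```rust
/// A CBOR initial byte carries the major type in its high 3 bits
/// and "additional info" (here: a short length) in its low 5 bits.
fn major_type(first_byte: u8) -> u8 {
    first_byte >> 5
}

/// Encode a short byte string: header 0x40 | len (major type 2), then the payload.
fn encode_short_bytes(payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() < 24, "sketch only handles lengths below 24");
    let mut out: Vec<u8> = vec![0x40 | payload.len() as u8];
    out.extend_from_slice(payload);
    out
}

/// Encode a one-entry map from a short text key to a short text value.
/// This is the shape the bug produced for {"$bytes": "..."}.
fn encode_single_entry_text_map(key: &str, value: &str) -> Vec<u8> {
    assert!(key.len() < 24 && value.len() < 24, "sketch only handles lengths below 24");
    let mut out: Vec<u8> = vec![0xA1]; // major type 5 (map), one key/value pair
    out.push(0x60 | key.len() as u8); // major type 3 (text string) header for the key
    out.extend_from_slice(key.as_bytes());
    out.push(0x60 | value.len() as u8); // text string header for the value
    out.extend_from_slice(value.as_bytes());
    out
}
```

Calling major_type on the first byte of encode_short_bytes(b"hello") gives 2, while the first byte of encode_single_entry_text_map("$bytes", "aGVsbG8=") gives 5. That is the same distinction the regression test checks with cbor[0] & 0xE0.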

How Jetstream breaks

Jetstream uses indigo's atdata.UnmarshalCBOR. Its CBOR decoder reads the malformed map into map[string]any, parseMap spots the $bytes key, and routes into parseBytes:

func parseBytes(obj map[string]any) (Bytes, error) {
    if len(obj) != 1 {
        return nil, fmt.Errorf("$bytes objects must have a single field")
    }
    v, ok := obj["$bytes"].(string)
    if !ok {
        return nil, fmt.Errorf("$bytes field missing or not a string")
    }
    b, err := base64.RawStdEncoding.DecodeString(v)
    if err != nil {
        return nil, fmt.Errorf("decoding $byte value: %w", err)
    }
    return Bytes(b), nil
}

RawStdEncoding in Go does not allow padding. If the base64 has = padding, this blows up with "decoding $byte value: illegal base64 data at input byte N". Whether the base64 has padding depends on whatever client created the record; Tranquil wasn't decoding or re-encoding it, just passing the string through as-is inside the CBOR map.

Because of this, Jetstream silently drops every create event for a record with $bytes fields whose base64 is padded. Deletes still work because Jetstream doesn't read record bytes for those.

This is why, for example, certain app.opake.grant creates weren't showing up on Jetstream while deletes worked fine.
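The padding sensitivity described above can be sketched in a few lines of stdlib-only Rust. This is an illustration under assumptions, not a port of Go's decoder and not the code in this PR: decode_base64_lenient and decode_base64_unpadded_only are hypothetical names, and the lenient variant deliberately skips checks a production decoder would make (it does not verify that the padding count matches the length or that trailing bits are zero).

```rust
/// Map one standard-alphabet base64 character to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode standard base64, accepting input with or without trailing '=' padding.
fn decode_base64_lenient(input: &str) -> Option<Vec<u8>> {
    let trimmed = input.trim_end_matches('=');
    // More than two padding characters is never valid.
    if input.len() - trimmed.len() > 2 {
        return None;
    }
    // A single leftover character cannot encode a whole byte.
    if trimmed.len() % 4 == 1 {
        return None;
    }
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in acc
    for &c in trimmed.as_bytes() {
        acc = (acc << 6) | sextet(c)?; // any non-alphabet character rejects the input
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1u32 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Loose analogue of an unpadded-only decoder: any '=' makes the input invalid.
fn decode_base64_unpadded_only(input: &str) -> Option<Vec<u8>> {
    if input.contains('=') {
        return None;
    }
    decode_base64_lenient(input)
}
```

With these helpers, "aGVsbG8=" and "aGVsbG8" both decode to the bytes of "hello" through the lenient path, while the unpadded-only path rejects the padded form. That asymmetry is the downstream failure: the same record is readable or not depending on how the client happened to encode it.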

Why the Node PDS doesn't have this problem

The official TypeScript PDS converts $bytes -> Uint8Array at the lex layer, before CBOR serialization ever runs. From @atproto/lex-json:

export function parseLexBytes(
  input?: Record<string, unknown>,
): Uint8Array | undefined {
  if (!input || !('$bytes' in input)) return undefined

  for (const key in input) {
    if (key !== '$bytes') return undefined
  }

  if (typeof input.$bytes !== 'string') return undefined

  try {
    return fromBase64(input.$bytes)
  } catch {
    return undefined
  }
}

fromBase64 either uses Uint8Array.fromBase64 with lastChunkHandling: 'loose' (the native path) or dynamically picks padded/unpadded decoding, so both paths accept = padding. By the time CBOR serialization runs, the bytes are already a Uint8Array, so the $bytes wrapper never leaks through.

The fix

Add a $bytes check to json_to_ipld, following the same pattern as the existing $link check. A single-key object with a string value is decoded from standard base64 into Ipld::Bytes. Padding is accepted but not required, per spec.
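The shape check at the heart of that pattern can be sketched as follows. This is a simplified, self-contained illustration rather than the code in this PR: Json is a hypothetical stand-in for serde_json::Value, and bytes_payload and obj are names invented for the example. The real function would go on to base64-decode the returned string and wrap the result in Ipld::Bytes.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a JSON value (the real code works on serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Num(i64),
    Obj(BTreeMap<String, Json>),
}

/// Return the base64 payload only for an object of the exact shape
/// {"$bytes": "<string>"}. Anything else is left to be handled as a regular value.
fn bytes_payload(value: &Json) -> Option<&str> {
    let map = match value {
        Json::Obj(map) => map,
        _ => return None,
    };
    // Sibling keys mean this is an ordinary map, not a $bytes wrapper.
    if map.len() != 1 {
        return None;
    }
    match map.get("$bytes") {
        Some(Json::Str(s)) => Some(s.as_str()),
        _ => None, // key missing, or value is not a string
    }
}

/// Small helper for building objects in examples.
fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Obj(
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect(),
    )
}
```

For { "$bytes": "aGVsbG8=" } the helper returns the base64 string. With an extra sibling key, or with a non-string value, it returns None and the object stays a map, which matches the behavior the extra-keys test below expects.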

Tests

  • test_json_to_ipld_bytes_simple; base64 -> bytes
  • test_json_to_ipld_bytes_empty; empty bytes
  • test_json_to_ipld_bytes_with_special_base64_chars; + and / in the base64 (the chars that triggered the original downstream failure)
  • test_json_to_ipld_bytes_unpadded; padded and unpadded both decode
  • test_json_to_ipld_bytes_produces_cbor_byte_string_not_map; regression test: asserts CBOR major type 2, not major type 5
  • test_json_to_ipld_bytes_not_confused_with_extra_keys; $bytes with sibling keys stays a map (same as $link behavior)
  • test_json_to_ipld_bytes_nested_in_record; opake-style record with nested $bytes, round-tripped through CBOR

Other issues found

json_to_ipld also accepts floats (Ipld::Float(f) for non-integer numbers). The AT Protocol data model explicitly bans these. That's not this PR's focus; if you like, I'll submit another PR for it :)

AT URI
at://did:plc:wydyrngmxbcsqdvhmd7whmye/sh.tangled.repo.pull/3mgu6hpivou22
Interdiff #0 → #1
crates/tranquil-pds/src/util.rs (-10)

@@ -393,7 +393,6 @@
 
 #[test]
 fn test_json_to_ipld_bytes_with_special_base64_chars() {
-    // standard base64 with +, / and padding
     let json = serde_json::json!({
         "$bytes": "ygoGIpnVb/HQTIZythM9t1iLHkoWY5OeeqlhD0JEEgqHedDSCxG8F1YfipZPMA3JzKG6ssWNzOmZ9iSSW0nDvmjJ5ldwwbgt"
     });
@@ -408,7 +407,6 @@
 
 #[test]
 fn test_json_to_ipld_bytes_unpadded() {
-    // "hello" in base64 is "aGVsbG8=" padded, "aGVsbG8" unpadded — both must work
     let padded = json_to_ipld(&serde_json::json!({ "$bytes": "aGVsbG8=" }));
     let unpadded = json_to_ipld(&serde_json::json!({ "$bytes": "aGVsbG8" }));
     match (&padded, &unpadded) {
@@ -425,15 +423,9 @@
 
 #[test]
 fn test_json_to_ipld_bytes_produces_cbor_byte_string_not_map() {
-    // Regression: without $bytes handling, json_to_ipld encodes {"$bytes": "..."}
-    // as a CBOR map (major type 5), but the AT Proto spec requires CBOR byte string
-    // (major type 2). This is what caused Jetstream's atdata.UnmarshalCBOR to fail
-    // with "decoding $byte value: illegal base64 data at input byte 51".
     let json = serde_json::json!({"$bytes": "SGVsbG8="});
     let ipld = json_to_ipld(&json);
     let cbor = serde_ipld_dagcbor::to_vec(&ipld).expect("CBOR serialization failed");
-    // CBOR major type 2 (byte string): first byte high nibble = 0x40
-    // CBOR major type 5 (map): first byte high nibble = 0xA0
     assert_eq!(
         cbor[0] & 0xE0,
         0x40,
@@ -444,7 +436,6 @@
 
 #[test]
 fn test_json_to_ipld_bytes_not_confused_with_extra_keys() {
-    // $bytes with extra keys should be treated as a regular map
     let json = serde_json::json!({
         "$bytes": "aGVsbG8=",
         "extra": "field"
@@ -476,7 +467,6 @@
         }
     });
     let ipld = json_to_ipld(&record);
-    // round-trip through CBOR
     let cbor_bytes = serde_ipld_dagcbor::to_vec(&ipld).expect("CBOR serialization failed");
     let parsed: Ipld =
         serde_ipld_dagcbor::from_slice(&cbor_bytes).expect("CBOR deserialization failed");

History

2 rounds 2 comments
1 commit
e64b5421
fix: handle AT Protocol $bytes type in json_to_ipld
0 comments
pull request successfully merged
1 commit
1be84e45
fix: handle AT Protocol $bytes type in json_to_ipld
2 comments

crates/tranquil-pds/src/util.rs:396

In general could you please remove the comments from the tests? I don't think this one in particular is accurate, but either way it's my style preference heh :P

Otherwise this PR LGTM

Oh fair, will remove :)