Our Personal Data Server from scratch! tranquil.farm
atproto pds rust postgresql fun oauth

fix: handle AT Protocol $bytes type in json_to_ipld #43

merged opened by sans-self.org targeting main

fix: handle $bytes in json_to_ipld#

What broke#

json_to_ipld knows about $link but not $bytes. So this:

{ "ciphertext": { "$bytes": "ygoGIpnVb/HQTIZythM9..." } }

gets written to CBOR as a map ({ "$bytes": "..." }, major type 5) instead of a raw byte string (major type 2). The data model spec is says that $bytes is a JSON encoding of raw bytes, not a map.

The PDS doesn't notice; it's consistently wrong in both directions, so JSON -> CBOR -> JSON round-trips fine internally. The problem shows up downstream when trying to send, for example, encrypted bytes.

How Jetstream breaks#

Jetstream uses indigo's atdata.UnmarshalCBOR. Its CBOR decoder reads the malformed map into map[string]any, parseMap spots the $bytes key, and routes into parseBytes:

func parseBytes(obj map[string]any) (Bytes, error) {
    if len(obj) != 1 {
        return nil, fmt.Errorf("$bytes objects must have a single field")
    }
    v, ok := obj["$bytes"].(string)
    if !ok {
        return nil, fmt.Errorf("$bytes field missing or not a string")
    }
    b, err := base64.RawStdEncoding.DecodeString(v)
    if err != nil {
        return nil, fmt.Errorf("decoding $byte value: %w", err)
    }
    return Bytes(b), nil
}

RawStdEncoding in Go does not allow padding. If the base64 has = padding, this blows up with "decoding $byte value: illegal base64 data at input byte N". Whether the base64 has padding depends on whatever client created the record; Tranquil wasn't decoding or re-encoding it, just passing the string through as-is inside the CBOR map.

Because of this, every create event for records with $bytes fields gets silently dropped from Jetstream if the base64 it contains requires padding. Deletes still worked because Jetstream doesn't read record bytes for those.

This is why, for example with app.opake.grant certain creates weren't showing up on Jetstream while deletes worked fine.

Why the Node PDS doesn't have this problem#

The official TypeScript PDS converts $bytes -> Uint8Array at the lex layer, before CBOR serialization ever runs. From @atproto/lex-json:

export function parseLexBytes(
  input?: Record<string, unknown>,
): Uint8Array | undefined {
  if (!input || !('$bytes' in input)) return undefined

  for (const key in input) {
    if (key !== '$bytes') return undefined
  }

  if (typeof input.$bytes !== 'string') return undefined

  try {
    return fromBase64(input.$bytes)
  } catch {
    return undefined
  }
}

fromBase64 uses Uint8Array.fromBase64 with lastChunkHandling: 'loose' (native) or dynamically picks padded/unpadded decoding and so both accept =. By the time CBOR serialization runs, the bytes are already a Uint8Array, so the $bytes wrapper never leaks through.

The fix#

$bytes check in json_to_ipld, same pattern as the existing $link check you already did. Single-key object with a string value -> decode from standard base64 -> Ipld::Bytes. Padding accepted but not required, per spec.

Tests#

  • test_json_to_ipld_bytes_simple; base64 -> bytes
  • test_json_to_ipld_bytes_empty; empty bytes
  • test_json_to_ipld_bytes_with_special_base64_chars; + and / in the base64 (the chars that triggered the original downstream failure)
  • test_json_to_ipld_bytes_unpadded; padded and unpadded both decode
  • test_json_to_ipld_bytes_produces_cbor_byte_string_not_map; regression test: asserts CBOR major type 2, not major type 5
  • test_json_to_ipld_bytes_not_confused_with_extra_keys; $bytes with sibling keys stays a map (same as $link behavior)
  • test_json_to_ipld_bytes_nested_in_record; opake-style record with nested $bytes, round-tripped through CBOR

Other issues found#

json_to_ipld also accepts floats (Ipld::Float(f) for non-integer numbers). The AT Protocol data model explicitly bans these. Not this PR's focus, if you like I'll submit another PR for it :)

Labels

None yet.

assignee

None yet.

Participants 2
AT URI
at://did:plc:wydyrngmxbcsqdvhmd7whmye/sh.tangled.repo.pull/3mgu6hpivou22
+138
Diff #1
+138
crates/tranquil-pds/src/util.rs
··· 1 1 use axum::http::{HeaderMap, HeaderName}; 2 + use base64::Engine as _; 3 + 2 4 use cid::Cid; 3 5 use ipld_core::ipld::Ipld; 4 6 use rand::Rng; ··· 8 10 use std::str::FromStr; 9 11 use std::sync::OnceLock; 10 12 13 + const BASE64_STANDARD_INDIFFERENT: base64::engine::GeneralPurpose = 14 + base64::engine::GeneralPurpose::new( 15 + &base64::alphabet::STANDARD, 16 + base64::engine::GeneralPurposeConfig::new() 17 + .with_decode_padding_mode(base64::engine::DecodePaddingMode::Indifferent), 18 + ); 19 + 11 20 const BASE32_ALPHABET: &str = "abcdefghijklmnopqrstuvwxyz234567"; 12 21 13 22 static DISCORD_BOT_USERNAME: OnceLock<String> = OnceLock::new(); ··· 172 181 { 173 182 return Ipld::Link(cid); 174 183 } 184 + if let Some(JsonValue::String(b64)) = obj.get("$bytes") 185 + && obj.len() == 1 186 + { 187 + if let Ok(bytes) = BASE64_STANDARD_INDIFFERENT.decode(b64) { 188 + return Ipld::Bytes(bytes); 189 + } 190 + } 175 191 let map: BTreeMap<String, Ipld> = obj 176 192 .iter() 177 193 .map(|(k, v)| (k.clone(), json_to_ipld(v))) ··· 347 363 panic!("Failed to find CID link in parsed CBOR"); 348 364 } 349 365 366 + #[test] 367 + fn test_json_to_ipld_bytes_simple() { 368 + let json = serde_json::json!({ 369 + "$bytes": "aGVsbG8gd29ybGQ=" 370 + }); 371 + let ipld = json_to_ipld(&json); 372 + match ipld { 373 + Ipld::Bytes(bytes) => { 374 + assert_eq!(bytes, b"hello world"); 375 + } 376 + _ => panic!("Expected Ipld::Bytes, got {:?}", ipld), 377 + } 378 + } 379 + 380 + #[test] 381 + fn test_json_to_ipld_bytes_empty() { 382 + let json = serde_json::json!({ 383 + "$bytes": "" 384 + }); 385 + let ipld = json_to_ipld(&json); 386 + match ipld { 387 + Ipld::Bytes(bytes) => { 388 + assert!(bytes.is_empty()); 389 + } 390 + _ => panic!("Expected Ipld::Bytes, got {:?}", ipld), 391 + } 392 + } 393 + 394 + #[test] 395 + fn test_json_to_ipld_bytes_with_special_base64_chars() { 396 + let json = serde_json::json!({ 397 + "$bytes": "ygoGIpnVb/HQTIZythM9t1iLHkoWY5OeeqlhD0JEEgqHedDSCxG8F1YfipZPMA3JzKG6ssWNzOmZ9iSSW0nDvmjJ5ldwwbgt" 398 + }); 399 + let ipld = json_to_ipld(&json); 400 + match ipld { 401 + Ipld::Bytes(bytes) => { 402 + assert!(!bytes.is_empty()); 403 + } 404 + _ => panic!("Expected Ipld::Bytes, got {:?}", ipld), 405 + } 406 + } 407 + 408 + #[test] 409 + fn test_json_to_ipld_bytes_unpadded() { 410 + let padded = json_to_ipld(&serde_json::json!({ "$bytes": "aGVsbG8=" })); 411 + let unpadded = json_to_ipld(&serde_json::json!({ "$bytes": "aGVsbG8" })); 412 + match (&padded, &unpadded) { 413 + (Ipld::Bytes(a), Ipld::Bytes(b)) => { 414 + assert_eq!(a, b"hello"); 415 + assert_eq!(b, b"hello"); 416 + } 417 + _ => panic!( 418 + "Expected Ipld::Bytes for both, got {:?} / {:?}", 419 + padded, unpadded 420 + ), 421 + } 422 + } 423 + 424 + #[test] 425 + fn test_json_to_ipld_bytes_produces_cbor_byte_string_not_map() { 426 + let json = serde_json::json!({"$bytes": "SGVsbG8="}); 427 + let ipld = json_to_ipld(&json); 428 + let cbor = serde_ipld_dagcbor::to_vec(&ipld).expect("CBOR serialization failed"); 429 + assert_eq!( 430 + cbor[0] & 0xE0, 431 + 0x40, 432 + "expected CBOR byte string (major type 2), got major type {}", 433 + cbor[0] >> 5 434 + ); 435 + } 436 + 437 + #[test] 438 + fn test_json_to_ipld_bytes_not_confused_with_extra_keys() { 439 + let json = serde_json::json!({ 440 + "$bytes": "aGVsbG8=", 441 + "extra": "field" 442 + }); 443 + let ipld = json_to_ipld(&json); 444 + match ipld { 445 + Ipld::Map(_) => {} 446 + _ => panic!( 447 + "Expected Ipld::Map for $bytes with extra keys, got {:?}", 448 + ipld 449 + ), 450 + } 451 + } 452 + 453 + #[test] 454 + fn test_json_to_ipld_bytes_nested_in_record() { 455 + let record = serde_json::json!({ 456 + "$type": "app.opake.grant", 457 + "recipient": "did:plc:example", 458 + "wrappedKey": { 459 + "algo": "x25519-hkdf-a256kw", 460 + "ciphertext": { 461 + "$bytes": "ygoGIpnVb/HQTIZythM9t1iLHkoWY5OeeqlhD0JEEgqHedDSCxG8F1YfipZPMA3JzKG6ssWNzOmZ9iSSW0nDvmjJ5ldwwbgt" 462 + } 463 + }, 464 + "encryptedMetadata": { 465 + "ciphertext": { "$bytes": "aGVsbG8=" }, 466 + "nonce": { "$bytes": "d29ybGQ=" } 467 + } 468 + }); 469 + let ipld = json_to_ipld(&record); 470 + let cbor_bytes = serde_ipld_dagcbor::to_vec(&ipld).expect("CBOR serialization failed"); 471 + let parsed: Ipld = 472 + serde_ipld_dagcbor::from_slice(&cbor_bytes).expect("CBOR deserialization failed"); 473 + if let Ipld::Map(map) = &parsed 474 + && let Some(Ipld::Map(wrapped)) = map.get("wrappedKey") 475 + && let Some(Ipld::Bytes(ct)) = wrapped.get("ciphertext") 476 + && let Some(Ipld::Map(meta)) = map.get("encryptedMetadata") 477 + && let Some(Ipld::Bytes(meta_ct)) = meta.get("ciphertext") 478 + && let Some(Ipld::Bytes(nonce)) = meta.get("nonce") 479 + { 480 + assert!(!ct.is_empty()); 481 + assert_eq!(meta_ct, b"hello"); 482 + assert_eq!(nonce, b"world"); 483 + return; 484 + } 485 + panic!("Failed to find Bytes in parsed CBOR: {:?}", parsed); 486 + } 487 + 350 488 #[test] 351 489 fn test_parse_env_bool_true_values() { 352 490 unsafe { std::env::set_var("TEST_PARSE_ENV_BOOL_1", "true") };

History

2 rounds 2 comments
sign up or login to add to the discussion
1 commit
expand
e64b5421
fix: handle AT Protocol $bytes type in json_to_ipld
expand 0 comments
pull request successfully merged
1 commit
expand
1be84e45
fix: handle AT Protocol $bytes type in json_to_ipld
expand 2 comments

crates/tranquil-pds/src/util.rs:396

In general could you please remove the comments from the tests? I don't think this one in particular is accurate, but either way it's my style preference heh :P

Otherwise this PR LGTM

Oh fair, will remove :)