fix: handle $bytes in json_to_ipld
What broke
json_to_ipld knows about $link but not $bytes. So this:
{ "ciphertext": { "$bytes": "ygoGIpnVb/HQTIZythM9..." } }
gets written to CBOR as a map ({ "$bytes": "..." }, major type 5) instead of
a raw byte string (major type 2). The
data model spec says that
$bytes is a JSON encoding of raw bytes, not a map.
The PDS doesn't notice; it's consistently wrong in both directions, so JSON -> CBOR -> JSON round-trips fine internally. The problem shows up downstream when trying to send, for example, encrypted bytes.
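To make the major-type difference concrete, here is a minimal sketch of how the first byte of a CBOR item differs between the two encodings. It hand-builds the initial byte per RFC 8949 rather than using any CBOR library; the helper names `head` and `majorType` are made up for illustration, and the three payload bytes are just sample data.

```go
package main

import "fmt"

// CBOR major types relevant here (RFC 8949): 2 = byte string, 5 = map.
const (
	majorBytes = 2
	majorMap   = 5
)

// head builds the single initial byte of a CBOR item whose length
// argument is below 24 (larger arguments need extra bytes, omitted here).
func head(major, n byte) byte {
	return major<<5 | n
}

// majorType extracts the major type from the initial byte of a CBOR item.
func majorType(first byte) byte {
	return first >> 5
}

func main() {
	payload := []byte{0xca, 0x0a, 0x06} // sample bytes

	// What the spec expects: a byte string head followed by the raw bytes.
	asBytes := append([]byte{head(majorBytes, byte(len(payload)))}, payload...)

	// What was being written: a one-entry map. Only the head is built here,
	// since the head alone is what a decoder uses to pick the type.
	asMap := []byte{head(majorMap, 1)}

	fmt.Printf("byte string head: 0x%02x (major type %d)\n", asBytes[0], majorType(asBytes[0]))
	fmt.Printf("map head:         0x%02x (major type %d)\n", asMap[0], majorType(asMap[0]))
}
```

A decoder that sees the map head has no way to know bytes were intended, which is why consumers end up parsing the `$bytes` wrapper themselves.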
How Jetstream breaks
Jetstream uses indigo's atdata.UnmarshalCBOR. Its CBOR decoder reads the
malformed map into map[string]any, parseMap spots the $bytes key, and
routes into
parseBytes:
func parseBytes(obj map[string]any) (Bytes, error) {
    if len(obj) != 1 {
        return nil, fmt.Errorf("$bytes objects must have a single field")
    }
    v, ok := obj["$bytes"].(string)
    if !ok {
        return nil, fmt.Errorf("$bytes field missing or not a string")
    }
    b, err := base64.RawStdEncoding.DecodeString(v)
    if err != nil {
        return nil, fmt.Errorf("decoding $byte value: %w", err)
    }
    return Bytes(b), nil
}
RawStdEncoding in Go does not allow padding. If the base64 has = padding,
this blows up with "decoding $byte value: illegal base64 data at input byte N".
Whether the base64 has padding depends on whatever client created the record;
Tranquil wasn't decoding or re-encoding it, just passing the string through
as-is inside the CBOR map.
Because of this, every create event for a record with $bytes fields gets silently
dropped from Jetstream if the base64 it contains is padded. Deletes still work
because Jetstream doesn't read record bytes for those.
This is why, for example, certain app.opake.grant creates
weren't showing up on Jetstream while deletes worked fine.
Why the Node PDS doesn't have this problem
The official TypeScript PDS converts $bytes -> Uint8Array at the lex layer,
before CBOR serialization ever runs. From
@atproto/lex-json:
export function parseLexBytes(
  input?: Record<string, unknown>,
): Uint8Array | undefined {
  if (!input || !('$bytes' in input)) return undefined
  for (const key in input) {
    if (key !== '$bytes') return undefined
  }
  if (typeof input.$bytes !== 'string') return undefined
  try {
    return fromBase64(input.$bytes)
  } catch {
    return undefined
  }
}
fromBase64
uses Uint8Array.fromBase64 with lastChunkHandling: 'loose' (native) or
dynamically picks padded/unpadded decoding, so both paths accept =.
By the time CBOR serialization runs, the bytes are already a Uint8Array, so
the $bytes wrapper never leaks through.
The fix
Add a $bytes check in json_to_ipld, following the same pattern as the existing $link check.
A single-key object with a string value is decoded from standard base64 into Ipld::Bytes. Padding is accepted but not required, per spec.
Tests
- test_json_to_ipld_bytes_simple: base64 -> bytes
- test_json_to_ipld_bytes_empty: empty bytes
- test_json_to_ipld_bytes_with_special_base64_chars: + and / in the base64 (the chars that triggered the original downstream failure)
- test_json_to_ipld_bytes_unpadded: padded and unpadded both decode
- test_json_to_ipld_bytes_produces_cbor_byte_string_not_map: regression test, asserts CBOR major type 2, not major type 5
- test_json_to_ipld_bytes_not_confused_with_extra_keys: $bytes with sibling keys stays a map (same as $link behavior)
- test_json_to_ipld_bytes_nested_in_record: opake-style record with nested $bytes, round-tripped through CBOR
Other issues found
json_to_ipld also accepts floats (Ipld::Float(f) for non-integer numbers).
The AT Protocol data model
explicitly bans these. Not this PR's
focus; if you like, I'll submit another PR for it :)
crates/tranquil-pds/src/util.rs:396
In general could you please remove the comments from the tests? I don't think this one in particular is accurate, but either way it's my style preference heh :P
Otherwise this PR LGTM