RFC6901 JSON Pointer implementation in OCaml using jsont

docs

Changed files
+262 -151
doc/tutorial.md
···

 The JSON Pointer `/users/0/name` refers to the string `"Alice"`.

+In OCaml, this is represented by the `Jsont_pointer.t` type - a sequence
+of navigation steps from the document root to a target value.
+
 ## Syntax: Reference Tokens

 RFC 6901, Section 3 defines the syntax:
···

 Multiple tokens navigate deeper into nested structures.

-### Invalid Syntax
-
-What happens if a pointer doesn't start with `/`?
-
-```sh
-$ jsonpp parse "foo"
-ERROR: Invalid JSON Pointer: must be empty or start with '/': foo
-```
-
-The RFC is strict: non-empty pointers MUST start with `/`.
-
-## Escaping Special Characters
-
-RFC 6901, Section 3 explains the escaping rules:
-
-> Because the characters '~' (%x7E) and '/' (%x2F) have special meanings
-> in JSON Pointer, '~' needs to be encoded as '~0' and '/' needs to be
-> encoded as '~1' when these characters appear in a reference token.
-
-Why these specific characters?
-- `/` separates tokens, so it must be escaped inside a token
-- `~` is the escape character itself, so it must also be escaped
-
-The escape sequences are:
-- `~0` represents `~` (tilde)
-- `~1` represents `/` (forward slash)
-
-Let's see escaping in action:
-
-```sh
-$ jsonpp escape "hello"
-hello
-```
-
-No special characters, no escaping needed.
-
-```sh
-$ jsonpp escape "a/b"
-a~1b
-```
-
-The `/` becomes `~1`.
-
-```sh
-$ jsonpp escape "a~b"
-a~0b
-```
-
-The `~` becomes `~0`.
+### The Index Type

-```sh
-$ jsonpp escape "~/"
-~0~1
-```
-
-Both characters are escaped.
-
-### Unescaping
-
-And the reverse process:
-
-```sh
-$ jsonpp unescape "a~1b"
-OK: a/b
-```
+Each reference token becomes an `Index.t` value in the library:

-```sh
-$ jsonpp unescape "a~0b"
-OK: a~b
+```ocaml
+type t =
+  | Mem of string (* Object member access *)
+  | Nth of int (* Array index access *)
+  | End (* The special "-" marker for append operations *)
 ```

-### The Order Matters!
+The `Mem` variant holds the **unescaped** member name - you work with the
+actual key string (like `"a/b"`) and the library handles any escaping needed
+for the JSON Pointer string representation.

-RFC 6901, Section 4 is careful to specify the unescaping order:
-
-> Evaluation of each reference token begins by decoding any escaped
-> character sequence. This is performed by first transforming any
-> occurrence of the sequence '~1' to '/', and then transforming any
-> occurrence of the sequence '~0' to '~'. By performing the substitutions
-> in this order, an implementation avoids the error of turning '~01' first
-> into '~1' and then into '/', which would be incorrect (the string '~01'
-> correctly becomes '~1' after transformation).
+### Invalid Syntax

-Let's verify this tricky case:
+What happens if a pointer doesn't start with `/`?

 ```sh
-$ jsonpp unescape "~01"
-OK: ~1
+$ jsonpp parse "foo"
+ERROR: Invalid JSON Pointer: must be empty or start with '/': foo
 ```

-If we unescaped `~0` first, `~01` would become `~1`, which would then become
-`/`. But that's wrong! The sequence `~01` should become the literal string
-`~1` (a tilde followed by the digit one).
-
-Invalid escape sequences are rejected:
-
-```sh
-$ jsonpp unescape "~2"
-ERROR: Invalid JSON Pointer: invalid escape sequence ~2
-```
-
-```sh
-$ jsonpp unescape "hello~"
-ERROR: Invalid JSON Pointer: incomplete escape sequence at end
-```
+The RFC is strict: non-empty pointers MUST start with `/`.

 ## Evaluation: Navigating JSON

···
 > the document. Each reference token in the JSON Pointer is evaluated
 > sequentially.

+In the library, this is the `Jsont_pointer.get` function:
+
+```ocaml
+val get : t -> Jsont.json -> Jsont.json
+```
+
 Let's use the example JSON document from RFC 6901, Section 5:

 ```sh
···
 OK: {"foo":["bar","baz"],"":0,"a/b":1,"c%d":2,"e^f":3,"g|h":4,"i\\j":5,"k\"l":6," ":7,"m~n":8}
 ```

-The empty pointer returns the whole document.
+The empty pointer returns the whole document. In OCaml, this is
+`Jsont_pointer.root`:
+
+```ocaml
+val root : t
+(** The empty pointer that references the whole document. *)
+```

 ### Object Member Access

···

 ### Keys with Special Characters

-Now for the escape sequences:
+The RFC example includes keys with `/` and `~` characters:

 ```sh
 $ jsonpp eval rfc6901_example.json "/a~1b"
 OK: 1
 ```

-The token `a~1b` unescapes to `a/b`, which is the key name.
+The token `a~1b` refers to the key `a/b`. We'll explain this escaping
+[below](#escaping-special-characters).

 ```sh
 $ jsonpp eval rfc6901_example.json "/m~0n"
 OK: 8
 ```

-The token `m~0n` unescapes to `m~n`.
+The token `m~0n` refers to the key `m~n`.
+
+**Important**: When using the OCaml library programmatically, you don't need
+to worry about escaping. The `Index.Mem` variant holds the literal key name:
+
+```ocaml
+(* To access the key "a/b", just use the literal string *)
+let pointer = Jsont_pointer.make [Mem "a/b"]
+
+(* The library escapes it when converting to string *)
+let s = Jsont_pointer.to_string pointer (* "/a~1b" *)
+```

 ### Other Special Characters (No Escaping Needed)

···
 $ jsonpp eval rfc6901_example.json "/foo/0/invalid"
 ERROR: JSON Pointer: cannot index into string with 'invalid'
 File "-":
+```
+
+The library provides both exception-raising and result-returning variants:
+
+```ocaml
+val get : t -> Jsont.json -> Jsont.json
+val get_result : t -> Jsont.json -> (Jsont.json, Jsont.Error.t) result
+val find : t -> Jsont.json -> Jsont.json option
 ```

 ### Array Index Rules
···

 But we'll see later that `-` is very useful for mutation operations!

-## URI Fragment Encoding
-
-JSON Pointers can be embedded in URIs. RFC 6901, Section 6 explains:
-
-> A JSON Pointer can be represented in a URI fragment identifier by
-> encoding it into octets using UTF-8, while percent-encoding those
-> characters not allowed by the fragment rule in RFC 3986.
-
-This adds percent-encoding on top of the `~0`/`~1` escaping:
-
-```sh
-$ jsonpp uri-fragment "/foo"
-OK: /foo -> /foo
-```
-
-Simple pointers often don't need percent-encoding.
-
-```sh
-$ jsonpp uri-fragment "/a~1b"
-OK: /a~1b -> /a~1b
-```
-
-The `~1` escape stays as-is (it's valid in URI fragments).
-
-```sh
-$ jsonpp uri-fragment "/c%d"
-OK: /c%d -> /c%25d
-```
-
-The `%` character must be percent-encoded as `%25` in URIs!
-
-```sh
-$ jsonpp uri-fragment "/ "
-OK: / -> /%20
-```
-
-Spaces become `%20`.
-
-Here's the RFC example showing the URI fragment forms:
-
-| JSON Pointer | URI Fragment | Value |
-|-------------|-------------|-------|
-| `""` | `#` | whole document |
-| `"/foo"` | `#/foo` | `["bar", "baz"]` |
-| `"/foo/0"` | `#/foo/0` | `"bar"` |
-| `"/"` | `#/` | `0` |
-| `"/a~1b"` | `#/a~1b` | `1` |
-| `"/c%d"` | `#/c%25d` | `2` |
-| `"/ "` | `#/%20` | `7` |
-| `"/m~0n"` | `#/m~0n` | `8` |
-
 ## Mutation Operations

 While RFC 6901 defines JSON Pointer for read-only access, RFC 6902
···
 {"foo":"bar","baz":"qux"}
 ```

+In OCaml:
+
+```ocaml
+val add : t -> Jsont.json -> value:Jsont.json -> Jsont.json
+```
+
 For arrays, `add` inserts BEFORE the specified index:

 ```sh
···
 false
 ```

+## Escaping Special Characters
+
+RFC 6901, Section 3 explains the escaping rules:
+
+> Because the characters '\~' (%x7E) and '/' (%x2F) have special meanings
+> in JSON Pointer, '\~' needs to be encoded as '\~0' and '/' needs to be
+> encoded as '\~1' when these characters appear in a reference token.
+
+Why these specific characters?
+- `/` separates tokens, so it must be escaped inside a token
+- `~` is the escape character itself, so it must also be escaped
+
+The escape sequences are:
+- `~0` represents `~` (tilde)
+- `~1` represents `/` (forward slash)
+
+### The Library Handles Escaping Automatically
+
+**Important**: When using `jsont-pointer` programmatically, you rarely need
+to think about escaping. The `Index.Mem` variant stores unescaped strings,
+and escaping happens automatically during serialization:
+
+```ocaml
+(* Create a pointer to key "a/b" - no escaping needed *)
+let p = Jsont_pointer.make [Mem "a/b"]
+
+(* Serialize to string - escaping happens automatically *)
+let s = Jsont_pointer.to_string p (* Returns "/a~1b" *)
+
+(* Parse from string - unescaping happens automatically *)
+let p' = Jsont_pointer.of_string "/a~1b"
+(* p' contains [Mem "a/b"] - the unescaped key *)
+```
+
+The `Token` module exposes the escaping functions if you need them:
+
+```ocaml
+module Token : sig
+  val escape : string -> string (* "a/b" -> "a~1b" *)
+  val unescape : string -> string (* "a~1b" -> "a/b" *)
+end
+```
+
+### Escaping in Action
+
+Let's see escaping with the CLI tool:
+
+```sh
+$ jsonpp escape "hello"
+hello
+```
+
+No special characters, no escaping needed.
+
+```sh
+$ jsonpp escape "a/b"
+a~1b
+```
+
+The `/` becomes `~1`.
+
+```sh
+$ jsonpp escape "a~b"
+a~0b
+```
+
+The `~` becomes `~0`.
+
+```sh
+$ jsonpp escape "~/"
+~0~1
+```
+
+Both characters are escaped.
+
+### Unescaping
+
+And the reverse process:
+
+```sh
+$ jsonpp unescape "a~1b"
+OK: a/b
+```
+
+```sh
+$ jsonpp unescape "a~0b"
+OK: a~b
+```
+
+### The Order Matters!
+
+RFC 6901, Section 4 is careful to specify the unescaping order:
+
+> Evaluation of each reference token begins by decoding any escaped
+> character sequence. This is performed by first transforming any
+> occurrence of the sequence '~1' to '/', and then transforming any
+> occurrence of the sequence '~0' to '~'. By performing the substitutions
+> in this order, an implementation avoids the error of turning '~01' first
+> into '~1' and then into '/', which would be incorrect (the string '~01'
+> correctly becomes '~1' after transformation).
+
+Let's verify this tricky case:
+
+```sh
+$ jsonpp unescape "~01"
+OK: ~1
+```
+
+If we unescaped `~0` first, `~01` would become `~1`, which would then become
+`/`. But that's wrong! The sequence `~01` should become the literal string
+`~1` (a tilde followed by the digit one).
+
+Invalid escape sequences are rejected:
+
+```sh
+$ jsonpp unescape "~2"
+ERROR: Invalid JSON Pointer: invalid escape sequence ~2
+```
+
+```sh
+$ jsonpp unescape "hello~"
+ERROR: Invalid JSON Pointer: incomplete escape sequence at end
+```
+
+## URI Fragment Encoding
+
+JSON Pointers can be embedded in URIs. RFC 6901, Section 6 explains:
+
+> A JSON Pointer can be represented in a URI fragment identifier by
+> encoding it into octets using UTF-8, while percent-encoding those
+> characters not allowed by the fragment rule in RFC 3986.
+
+This adds percent-encoding on top of the `~0`/`~1` escaping:
+
+```sh
+$ jsonpp uri-fragment "/foo"
+OK: /foo -> /foo
+```
+
+Simple pointers often don't need percent-encoding.
+
+```sh
+$ jsonpp uri-fragment "/a~1b"
+OK: /a~1b -> /a~1b
+```
+
+The `~1` escape stays as-is (it's valid in URI fragments).
+
+```sh
+$ jsonpp uri-fragment "/c%d"
+OK: /c%d -> /c%25d
+```
+
+The `%` character must be percent-encoded as `%25` in URIs!
+
+```sh
+$ jsonpp uri-fragment "/ "
+OK: / -> /%20
+```
+
+Spaces become `%20`.
+
+The library provides functions for URI fragment encoding:
+
+```ocaml
+val to_uri_fragment : t -> string
+val of_uri_fragment : string -> t
+val jsont_uri_fragment : t Jsont.t
+```
+
+Here's the RFC example showing the URI fragment forms:
+
+| JSON Pointer | URI Fragment | Value |
+|-------------|-------------|-------|
+| `""` | `#` | whole document |
+| `"/foo"` | `#/foo` | `["bar", "baz"]` |
+| `"/foo/0"` | `#/foo/0` | `"bar"` |
+| `"/"` | `#/` | `0` |
+| `"/a~1b"` | `#/a~1b` | `1` |
+| `"/c%d"` | `#/c%25d` | `2` |
+| `"/ "` | `#/%20` | `7` |
+| `"/m~0n"` | `#/m~0n` | `8` |
+
 ## Deeply Nested Structures

 JSON Pointer handles arbitrarily deep nesting:
···
 {"arr":[[1,99,2],[3,4]]}
 ```

+## Jsont Integration
+
+The library integrates with the `Jsont` codec system for typed access:
+
+```ocaml
+(* Codec for JSON Pointers as JSON strings *)
+val jsont : t Jsont.t
+
+(* Query combinators *)
+val path : ?absent:'a -> t -> 'a Jsont.t -> 'a Jsont.t
+val set_path : ?allow_absent:bool -> 'a Jsont.t -> t -> 'a -> Jsont.json Jsont.t
+val update_path : ?absent:'a -> t -> 'a Jsont.t -> Jsont.json Jsont.t
+val delete_path : ?allow_absent:bool -> t -> Jsont.json Jsont.t
+```
+
+These allow you to use JSON Pointers with typed codecs rather than raw
+`Jsont.json` values.
+
 ## Summary

 JSON Pointer (RFC 6901) provides a simple but powerful way to address
 values within JSON documents:

 1. **Syntax**: Pointers are strings of `/`-separated reference tokens
-2. **Escaping**: Use `~0` for `~` and `~1` for `/` in tokens
+2. **Escaping**: Use `~0` for `~` and `~1` for `/` in tokens (handled automatically by the library)
 3. **Evaluation**: Tokens navigate through objects (by key) and arrays (by index)
 4. **URI Encoding**: Pointers can be percent-encoded for use in URIs
 5. **Mutations**: Combined with JSON Patch (RFC 6902), pointers enable structured updates
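
For readers who want to check the escaping rules in this tutorial without the library installed, here is a minimal standalone sketch using only the OCaml standard library. It is not the `jsont-pointer` implementation: `escape_token`, `unescape_token`, and `parse_pointer` are hypothetical names chosen for illustration, and the error strings only approximate the CLI messages quoted in the diff. The decoder works in a single left-to-right pass, which gives the same result as the RFC's "`~1` first, then `~0`" order because each `~` is consumed together with the character that follows it.

```ocaml
(* Hypothetical helper: encode a raw object key as a reference token
   (RFC 6901, Section 3): '~' becomes "~0" and '/' becomes "~1". *)
let escape_token (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '~' -> Buffer.add_string b "~0"
      | '/' -> Buffer.add_string b "~1"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Hypothetical helper: decode a reference token in one pass.
   "~01" decodes to "~1" because "~0" is consumed as a unit and
   the trailing '1' is then copied as an ordinary character. *)
let unescape_token (s : string) : (string, string) result =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i >= n then Ok (Buffer.contents b)
    else
      match s.[i] with
      | '~' when i + 1 >= n -> Error "incomplete escape sequence at end"
      | '~' -> (
          match s.[i + 1] with
          | '0' -> Buffer.add_char b '~'; go (i + 2)
          | '1' -> Buffer.add_char b '/'; go (i + 2)
          | c -> Error (Printf.sprintf "invalid escape sequence ~%c" c))
      | c -> Buffer.add_char b c; go (i + 1)
  in
  go 0

(* Hypothetical helper: split a pointer string into unescaped tokens.
   The empty string is the root pointer; anything else must start with '/'. *)
let parse_pointer (s : string) : (string list, string) result =
  if s = "" then Ok []
  else if s.[0] <> '/' then Error "must be empty or start with '/'"
  else
    let raw = String.split_on_char '/' (String.sub s 1 (String.length s - 1)) in
    List.fold_right
      (fun tok acc ->
        match acc, unescape_token tok with
        | Ok toks, Ok t -> Ok (t :: toks)
        | (Error _ as e), _ -> e
        | Ok _, Error e -> Error e)
      raw (Ok [])
```

With these definitions, `unescape_token "~01"` evaluates to `Ok "~1"`, `parse_pointer "/a~1b/m~0n"` to `Ok ["a/b"; "m~n"]`, and `parse_pointer "/"` to `Ok [""]` (a single token naming the empty key), which matches the RFC example table. The sketch returns `result` values rather than raising, a choice made here for ease of testing rather than something taken from the library's `Token` signature.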