Markdown parser fork with extended syntax for personal use.
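The module docs that follow describe two rules that can be stated without the tokenizer. First, a delimiter cell's `:` markers set its column's alignment, which becomes a deprecated `align` attribute in HTML. Second, body rows are padded with empty cells or truncated to the column count. Below is a minimal standalone sketch of just those rules. It assumes plain `&str` cells rather than this crate's events. `Align`, `parse_delimiter_cell`, `align_attribute`, and `normalize_row` are hypothetical names for illustration, not this crate's API, which works on tokenizer events in the state functions and `resolve` instead.

```rust
/// Hypothetical type: column alignment, as set by the `:` markers in a delimiter cell.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Align {
    None,
    Left,
    Right,
    Center,
}

/// Hypothetical helper: parse one delimiter cell (the text between two `|`),
/// following `[':'] 1*'-' [':']` with optional surrounding spaces or tabs.
/// Returns `None` when the cell is not a valid delimiter value.
pub fn parse_delimiter_cell(cell: &str) -> Option<Align> {
    let value = cell.trim_matches(|c: char| c == ' ' || c == '\t');
    // Optional leading colon: align left (or center, with a trailing one).
    let (left, rest) = match value.strip_prefix(':') {
        Some(rest) => (true, rest),
        None => (false, value),
    };
    // Optional trailing colon: align right (or center).
    let (right, filler) = match rest.strip_suffix(':') {
        Some(filler) => (true, filler),
        None => (false, rest),
    };
    // The filler must be one or more dashes and nothing else.
    if filler.is_empty() || !filler.bytes().all(|b| b == b'-') {
        return None;
    }
    Some(match (left, right) {
        (true, true) => Align::Center,
        (true, false) => Align::Left,
        (false, true) => Align::Right,
        (false, false) => Align::None,
    })
}

/// Hypothetical helper: the value of the deprecated HTML `align` attribute
/// for a column, or `None` when no attribute is added.
pub fn align_attribute(align: Align) -> Option<&'static str> {
    match align {
        Align::None => None,
        Align::Left => Some("left"),
        Align::Right => Some("right"),
        Align::Center => Some("center"),
    }
}

/// Hypothetical helper: make a body row exactly `columns` cells wide by
/// injecting empty cells or dropping superfluous ones.
pub fn normalize_row(mut cells: Vec<String>, columns: usize) -> Vec<String> {
    // `Vec::resize` truncates when too long and pads with clones when too short.
    cells.resize(columns, String::new());
    cells
}
```

Applied to the `| a | b |` example in the docs, `normalize_row` turns the body row `c` into `c` plus one empty cell and drops `f` from the row `d | e | f`. This matches the HTML shown there.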
//! GFM: table occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! Tables form with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! gfm_table ::= gfm_table_head 0*(eol gfm_table_body_row)
//!
//! ; Restriction: both rows must have the same number of cells.
//! gfm_table_head ::= gfm_table_row eol gfm_table_delimiter_row
//!
//! gfm_table_row ::= ['|'] gfm_table_cell 0*('|' gfm_table_cell) ['|'] *space_or_tab
//! gfm_table_cell ::= *space_or_tab gfm_table_text *space_or_tab
//! gfm_table_text ::= 0*(line - '\\' - '|' | '\\' ['\\' | '|'])
//!
//! gfm_table_delimiter_row ::= ['|'] gfm_table_delimiter_cell 0*('|' gfm_table_delimiter_cell) ['|'] *space_or_tab
//! gfm_table_delimiter_cell ::= *space_or_tab gfm_table_delimiter_value *space_or_tab
//! gfm_table_delimiter_value ::= [':'] 1*'-' [':']
//! ```
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
//!
//! The above grammar shows that basically anything can be a cell or a row.
//! The main thing that makes something a row is that it occurs directly before
//! or after a delimiter row, or after another row.
//!
//! It is not required for a table to have a body: it can end right after the
//! delimiter row.
//!
//! Each column can be marked with an alignment.
//! The alignment marker is a colon (`:`) used before and/or after the
//! delimiter row filler.
//! To illustrate:
//!
//! ```markdown
//! | none | left | right | center |
//! | ---- | :--- | ----: | :----: |
//! ```
//!
//! The number of cells in the delimiter row is the number of columns of the
//! table.
//! Only the head row is required to have the same number of cells as the
//! delimiter row.
//! Body rows are not required to have a certain number of cells.
//! For body rows that have fewer cells than the number of columns of the table,
//! empty cells are injected.
//! When a row has more cells than the number of columns of the table, the
//! superfluous cells are dropped.
//! To illustrate:
//!
//! ```markdown
//! | a | b |
//! | - | - |
//! | c |
//! | d | e | f |
//! ```
//!
//! Yields:
//!
//! ```html
//! <table>
//!   <thead>
//!     <tr>
//!       <th>a</th>
//!       <th>b</th>
//!     </tr>
//!   </thead>
//!   <tbody>
//!     <tr>
//!       <td>c</td>
//!       <td></td>
//!     </tr>
//!     <tr>
//!       <td>d</td>
//!       <td>e</td>
//!     </tr>
//!   </tbody>
//! </table>
//! ```
//!
//! Each cell’s text is interpreted as the [text][] content type.
//! That means that it can include constructs such as [attention][attention].
//!
//! The grammar for cells prohibits the use of `|` in them.
//! To use pipes in cells, encode them as a character reference or character
//! escape: `&vert;` (or `&VerticalLine;`, `&verbar;`, `&#124;`, `&#x7c;`) or
//! `\|`.
//!
//! Escapes will typically work, but they are not supported in
//! [code (text)][raw_text] (and the math (text) extension).
//! To work around this, GitHub came up with a rather weird “trick”.
//! When inside a table cell *and* inside code, escaped pipes *are* decoded.
//! To illustrate:
//!
//! ```markdown
//! | Name | Character |
//! | - | - |
//! | Left curly brace | `{` |
//! | Pipe | `\|` |
//! | Right curly brace | `}` |
//! ```
//!
//! Yields:
//!
//! ```html
//! <table>
//!   <thead>
//!     <tr>
//!       <th>Name</th>
//!       <th>Character</th>
//!     </tr>
//!   </thead>
//!   <tbody>
//!     <tr>
//!       <td>Left curly brace</td>
//!       <td><code>{</code></td>
//!     </tr>
//!     <tr>
//!       <td>Pipe</td>
//!       <td><code>|</code></td>
//!     </tr>
//!     <tr>
//!       <td>Right curly brace</td>
//!       <td><code>}</code></td>
//!     </tr>
//!   </tbody>
//! </table>
//! ```
//!
//! > 👉 **Note**: no other character can be escaped like this.
//! > Escaping pipes in code does not work when not inside a table, either.
//!
//! ## HTML
//!
//! GFM tables relate to several HTML elements: `<table>`, `<tbody>`, `<td>`,
//! `<th>`, `<thead>`, and `<tr>`.
//! See
//! [*§ 4.9.1 The `table` element*][html_table],
//! [*§ 4.9.5 The `tbody` element*][html_tbody],
//! [*§ 4.9.9 The `td` element*][html_td],
//! [*§ 4.9.10 The `th` element*][html_th],
//! [*§ 4.9.6 The `thead` element*][html_thead], and
//! [*§ 4.9.8 The `tr` element*][html_tr]
//! in the HTML spec for more info.
//!
//! If the alignment of a column is left, right, or center, a deprecated
//! `align` attribute is added to each `<th>` and `<td>` element belonging to
//! that column.
//! That attribute is interpreted by browsers as if a CSS `text-align` property
//! was included, with its value set to that same keyword.
//!
//! ## Recommendation
//!
//! When authoring markdown with GFM tables, it’s recommended to *always* put
//! pipes around cells.
//! Without them, it can be hard to infer whether the table will work, how many
//! columns there are, and which column you are currently editing.
//!
//! It is recommended to not use many columns, as it results in very long lines,
//! making it hard to infer which column you are currently editing.
//!
//! For larger tables, particularly when cells vary in size, it is recommended
//! *not* to manually “pad” cell text.
//! While it can look better, it results in a lot of time spent realigning
//! everything when a new, longer cell is added or the longest cell removed, as
//! every row then must be changed.
//! Other than costing time, it also causes large diffs in Git.
//!
//! To illustrate, when authoring large tables, it is discouraged to pad cells
//! like this:
//!
//! ```markdown
//! | Alpha bravo charlie |              delta |
//! | ------------------- | -----------------: |
//! | Echo                | Foxtrot golf hotel |
//! ```
//!
//! Instead, use single spaces (and single filler dashes):
//!
//! ```markdown
//! | Alpha bravo charlie | delta |
//! | - | -: |
//! | Echo | Foxtrot golf hotel |
//! ```
//!
//! ## Bugs
//!
//! GitHub’s own algorithm to parse tables contains a bug.
//! This bug is not present in this project.
//! The issue relating to tables is:
//!
//! * [GFM tables: escaped escapes are incorrectly treated as escapes](https://github.com/github/cmark-gfm/issues/277)
//!
//! ## Tokens
//!
//! * [`GfmTable`][Name::GfmTable]
//! * [`GfmTableBody`][Name::GfmTableBody]
//! * [`GfmTableCell`][Name::GfmTableCell]
//! * [`GfmTableCellDivider`][Name::GfmTableCellDivider]
//! * [`GfmTableCellText`][Name::GfmTableCellText]
//! * [`GfmTableDelimiterCell`][Name::GfmTableDelimiterCell]
//! * [`GfmTableDelimiterCellValue`][Name::GfmTableDelimiterCellValue]
//! * [`GfmTableDelimiterFiller`][Name::GfmTableDelimiterFiller]
//! * [`GfmTableDelimiterMarker`][Name::GfmTableDelimiterMarker]
//! * [`GfmTableDelimiterRow`][Name::GfmTableDelimiterRow]
//! * [`GfmTableHead`][Name::GfmTableHead]
//! * [`GfmTableRow`][Name::GfmTableRow]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`micromark-extension-gfm-table`](https://github.com/micromark/micromark-extension-gfm-table)
//! * [*§ 4.10 Tables (extension)* in `GFM`](https://github.github.com/gfm/#tables-extension-)
//!
//! [flow]: crate::construct::flow
//! [text]: crate::construct::text
//! [attention]: crate::construct::attention
//! [raw_text]: crate::construct::raw_text
//! [html_table]: https://html.spec.whatwg.org/multipage/tables.html#the-table-element
//! [html_tbody]: https://html.spec.whatwg.org/multipage/tables.html#the-tbody-element
//! [html_td]: https://html.spec.whatwg.org/multipage/tables.html#the-td-element
//! [html_th]: https://html.spec.whatwg.org/multipage/tables.html#the-th-element
//! [html_thead]: https://html.spec.whatwg.org/multipage/tables.html#the-thead-element
//! [html_tr]: https://html.spec.whatwg.org/multipage/tables.html#the-tr-element

use crate::construct::partial_space_or_tab::{space_or_tab, space_or_tab_min_max};
use crate::event::{Content, Event, Kind, Link, Name};
use crate::resolve::Name as ResolveName;
use crate::state::{Name as StateName, State};
use crate::subtokenize::Subresult;
use crate::tokenizer::Tokenizer;
use crate::util::{constant::TAB_SIZE, skip::opt_back as skip_opt_back};
use alloc::vec;

/// Start of a GFM table.
///
/// If there is a valid table row or table head before, then we try to parse
/// another row.
/// Otherwise, we try to parse a head.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
/// > | | b |
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.parse_state.options.constructs.gfm_table {
        if !tokenizer.pierce
            && !tokenizer.events.is_empty()
            && matches!(
                tokenizer.events[skip_opt_back(
                    &tokenizer.events,
                    tokenizer.events.len() - 1,
                    &[Name::LineEnding, Name::SpaceOrTab],
                )]
                .name,
                Name::GfmTableHead | Name::GfmTableRow
            )
        {
            State::Retry(StateName::GfmTableBodyRowStart)
        } else {
            State::Retry(StateName::GfmTableHeadRowBefore)
        }
    } else {
        State::Nok
    }
}

/// Before table head row.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_before(tokenizer: &mut Tokenizer) -> State {
    tokenizer.enter(Name::GfmTableHead);
    tokenizer.enter(Name::GfmTableRow);
    if matches!(tokenizer.current, Some(b'\t' | b' ')) {
        tokenizer.attempt(State::Next(StateName::GfmTableHeadRowStart), State::Nok);
        State::Retry(space_or_tab_min_max(
            tokenizer,
            0,
            if tokenizer.parse_state.options.constructs.code_indented {
                TAB_SIZE - 1
            } else {
                usize::MAX
            },
        ))
    } else {
        State::Retry(StateName::GfmTableHeadRowStart)
    }
}

/// Before table head row, after whitespace.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_start(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // 4+ spaces.
        Some(b'\t' | b' ') => State::Nok,
        Some(b'|') => State::Retry(StateName::GfmTableHeadRowBreak),
        _ => {
            tokenizer.tokenize_state.seen = true;
            // Count the first character, if it isn’t a pipe, twice.
            tokenizer.tokenize_state.size_b += 1;
            State::Retry(StateName::GfmTableHeadRowBreak)
        }
    }
}

/// At break in table head row.
///
/// ```markdown
/// > | | a |
///     ^
///      ^
///        ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_break(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None => {
            tokenizer.tokenize_state.seen = false;
            tokenizer.tokenize_state.size = 0;
            tokenizer.tokenize_state.size_b = 0;
            State::Nok
        }
        Some(b'\n') => {
            // If anything other than one pipe (ignoring whitespace) was used, it’s fine.
            if tokenizer.tokenize_state.size_b > 1 {
                tokenizer.tokenize_state.size_b = 0;
                // Feel free to interrupt:
                tokenizer.interrupt = true;
                tokenizer.exit(Name::GfmTableRow);
                tokenizer.enter(Name::LineEnding);
                tokenizer.consume();
                tokenizer.exit(Name::LineEnding);
                State::Next(StateName::GfmTableHeadDelimiterStart)
            } else {
                tokenizer.tokenize_state.seen = false;
                tokenizer.tokenize_state.size = 0;
                tokenizer.tokenize_state.size_b = 0;
                State::Nok
            }
        }
        Some(b'\t' | b' ') => {
            tokenizer.attempt(State::Next(StateName::GfmTableHeadRowBreak), State::Nok);
            State::Retry(space_or_tab(tokenizer))
        }
        _ => {
            tokenizer.tokenize_state.size_b += 1;

            // Whether a delimiter was seen.
            if tokenizer.tokenize_state.seen {
                tokenizer.tokenize_state.seen = false;
                // Header cell count.
                tokenizer.tokenize_state.size += 1;
            }

            if tokenizer.current == Some(b'|') {
                tokenizer.enter(Name::GfmTableCellDivider);
                tokenizer.consume();
                tokenizer.exit(Name::GfmTableCellDivider);
                // Whether a delimiter was seen.
                tokenizer.tokenize_state.seen = true;
                State::Next(StateName::GfmTableHeadRowBreak)
            } else {
                // Anything else is cell data.
                tokenizer.enter(Name::Data);
                State::Retry(StateName::GfmTableHeadRowData)
            }
        }
    }
}

/// In table head row data.
///
/// ```markdown
/// > | | a |
///       ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_data(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'|') => {
            tokenizer.exit(Name::Data);
            State::Retry(StateName::GfmTableHeadRowBreak)
        }
        _ => {
            let name = if tokenizer.current == Some(b'\\') {
                StateName::GfmTableHeadRowEscape
            } else {
                StateName::GfmTableHeadRowData
            };
            tokenizer.consume();
            State::Next(name)
        }
    }
}

/// In table head row escape.
///
/// ```markdown
/// > | | a\-b |
///         ^
///   | | ---- |
///   | | c |
/// ```
pub fn head_row_escape(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\\' | b'|') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableHeadRowData)
        }
        _ => State::Retry(StateName::GfmTableHeadRowData),
    }
}

/// Before delimiter row.
///
/// ```markdown
///   | | a |
/// > | | - |
///     ^
///   | | b |
/// ```
pub fn head_delimiter_start(tokenizer: &mut Tokenizer) -> State {
    // Reset `interrupt`.
    tokenizer.interrupt = false;

    if tokenizer.lazy || tokenizer.pierce {
        tokenizer.tokenize_state.size = 0;
        State::Nok
    } else {
        tokenizer.enter(Name::GfmTableDelimiterRow);
        // Track if we’ve seen a `:` or `|`.
        tokenizer.tokenize_state.seen = false;

        match tokenizer.current {
            Some(b'\t' | b' ') => {
                tokenizer.attempt(
                    State::Next(StateName::GfmTableHeadDelimiterBefore),
                    State::Next(StateName::GfmTableHeadDelimiterNok),
                );

                State::Retry(space_or_tab_min_max(
                    tokenizer,
                    0,
                    if tokenizer.parse_state.options.constructs.code_indented {
                        TAB_SIZE - 1
                    } else {
                        usize::MAX
                    },
                ))
            }
            _ => State::Retry(StateName::GfmTableHeadDelimiterBefore),
        }
    }
}

/// Before delimiter row, after optional whitespace.
///
/// Reused when a `|` is found later, to parse another cell.
///
/// ```markdown
///   | | a |
/// > | | - |
///     ^
///   | | b |
/// ```
pub fn head_delimiter_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-' | b':') => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
        Some(b'|') => {
            tokenizer.tokenize_state.seen = true;
            // If we start with a pipe, we open a cell marker.
            tokenizer.enter(Name::GfmTableCellDivider);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableCellDivider);
            State::Next(StateName::GfmTableHeadDelimiterCellBefore)
        }
        // More whitespace / empty row not allowed at start.
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// After `|`, before delimiter cell.
///
/// ```markdown
///   | | a |
/// > | | - |
///      ^
/// ```
pub fn head_delimiter_cell_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.attempt(
                State::Next(StateName::GfmTableHeadDelimiterValueBefore),
                State::Nok,
            );
            State::Retry(space_or_tab(tokenizer))
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
    }
}

/// Before delimiter cell value.
///
/// ```markdown
///   | | a |
/// > | | - |
///       ^
/// ```
pub fn head_delimiter_value_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
        Some(b':') => {
            // Align: left.
            tokenizer.tokenize_state.size_b += 1;
            tokenizer.tokenize_state.seen = true;
            tokenizer.enter(Name::GfmTableDelimiterMarker);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableDelimiterMarker);
            State::Next(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
        }
        Some(b'-') => {
            // Align: none.
            tokenizer.tokenize_state.size_b += 1;
            State::Retry(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// After delimiter cell left alignment marker.
///
/// ```markdown
///   | | a |
/// > | | :- |
///        ^
/// ```
pub fn head_delimiter_left_alignment_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.enter(Name::GfmTableDelimiterFiller);
            State::Retry(StateName::GfmTableHeadDelimiterFiller)
        }
        // Anything else is not ok after the left-align colon.
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// In delimiter cell filler.
///
/// ```markdown
///   | | a |
/// > | | - |
///       ^
/// ```
pub fn head_delimiter_filler(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableHeadDelimiterFiller)
        }
        Some(b':') => {
            // Align is `center` if it was `left`, `right` otherwise.
            tokenizer.tokenize_state.seen = true;
            tokenizer.exit(Name::GfmTableDelimiterFiller);
            tokenizer.enter(Name::GfmTableDelimiterMarker);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableDelimiterMarker);
            State::Next(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
        }
        _ => {
            tokenizer.exit(Name::GfmTableDelimiterFiller);
            State::Retry(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
        }
    }
}

/// After delimiter cell right alignment marker.
///
/// ```markdown
///   | | a |
/// > | | -: |
///         ^
/// ```
pub fn head_delimiter_right_alignment_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.attempt(
                State::Next(StateName::GfmTableHeadDelimiterCellAfter),
                State::Nok,
            );
            State::Retry(space_or_tab(tokenizer))
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
    }
}

/// After delimiter cell.
///
/// ```markdown
///   | | a |
/// > | | -: |
///          ^
/// ```
pub fn head_delimiter_cell_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            // Exit when:
            // * there was no `:` or `|` at all (it’s a thematic break or setext
            //   underline instead)
            // * the header cell count is not the delimiter cell count
            if !tokenizer.tokenize_state.seen
                || tokenizer.tokenize_state.size != tokenizer.tokenize_state.size_b
            {
                State::Retry(StateName::GfmTableHeadDelimiterNok)
            } else {
                // Reset.
                tokenizer.tokenize_state.seen = false;
                tokenizer.tokenize_state.size = 0;
                tokenizer.tokenize_state.size_b = 0;
                tokenizer.exit(Name::GfmTableDelimiterRow);
                tokenizer.exit(Name::GfmTableHead);
                tokenizer.register_resolver(ResolveName::GfmTable);
                State::Ok
            }
        }
        Some(b'|') => State::Retry(StateName::GfmTableHeadDelimiterBefore),
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// In delimiter row, at a disallowed byte.
///
/// ```markdown
///   | | a |
/// > | | x |
///       ^
/// ```
pub fn head_delimiter_nok(tokenizer: &mut Tokenizer) -> State {
    // Reset.
    tokenizer.tokenize_state.seen = false;
    tokenizer.tokenize_state.size = 0;
    tokenizer.tokenize_state.size_b = 0;
    State::Nok
}

/// Before table body row.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///     ^
/// ```
pub fn body_row_start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.lazy {
        State::Nok
    } else {
        tokenizer.enter(Name::GfmTableRow);

        match tokenizer.current {
            Some(b'\t' | b' ') => {
                tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBreak), State::Nok);
                // We’re parsing a body row.
                // If we’re here, we already attempted blank lines and indented
                // code.
                // So parse as much whitespace as needed:
                State::Retry(space_or_tab_min_max(tokenizer, 0, usize::MAX))
            }
            _ => State::Retry(StateName::GfmTableBodyRowBreak),
        }
    }
}

/// At break in table body row.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///     ^
///      ^
///        ^
/// ```
pub fn body_row_break(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            tokenizer.exit(Name::GfmTableRow);
            State::Ok
        }
        Some(b'\t' | b' ') => {
            tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBreak), State::Nok);
            State::Retry(space_or_tab(tokenizer))
        }
        Some(b'|') => {
            tokenizer.enter(Name::GfmTableCellDivider);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableCellDivider);
            State::Next(StateName::GfmTableBodyRowBreak)
        }
        // Anything else is cell content.
        _ => {
            tokenizer.enter(Name::Data);
            State::Retry(StateName::GfmTableBodyRowData)
        }
    }
}

/// In table body row data.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///       ^
/// ```
pub fn body_row_data(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'|') => {
            tokenizer.exit(Name::Data);
            State::Retry(StateName::GfmTableBodyRowBreak)
        }
        _ => {
            let name = if tokenizer.current == Some(b'\\') {
                StateName::GfmTableBodyRowEscape
            } else {
                StateName::GfmTableBodyRowData
            };
            tokenizer.consume();
            State::Next(name)
        }
    }
}

/// In table body row escape.
///
/// ```markdown
///   | | a |
///   | | ---- |
/// > | | b\-c |
///         ^
/// ```
pub fn body_row_escape(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\\' | b'|') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableBodyRowData)
        }
        _ => State::Retry(StateName::GfmTableBodyRowData),
    }
}

/// Resolve GFM table.
pub fn resolve(tokenizer: &mut Tokenizer) -> Option<Subresult> {
    let mut index = 0;
    let mut in_first_cell_awaiting_pipe = true;
    let mut in_row = false;
    let mut in_delimiter_row = false;
    let mut last_cell = (0, 0, 0, 0);
    let mut cell = (0, 0, 0, 0);
    let mut after_head_awaiting_first_body_row = false;
    let mut last_table_end = 0;
    let mut last_table_has_body = false;

    while index < tokenizer.events.len() {
        let event = &tokenizer.events[index];

        if event.kind == Kind::Enter {
            // Start of head.
            if event.name == Name::GfmTableHead {
                after_head_awaiting_first_body_row = false;

                // Inject previous (body end and) table end.
                if last_table_end != 0 {
                    flush_table_end(tokenizer, last_table_end, last_table_has_body);
                    last_table_has_body = false;
                    last_table_end = 0;
                }

                // Inject table start.
                let enter = Event {
                    kind: Kind::Enter,
                    name: Name::GfmTable,
                    point: tokenizer.events[index].point.clone(),
                    link: None,
                };
                tokenizer.map.add(index, 0, vec![enter]);
            } else if matches!(event.name, Name::GfmTableRow | Name::GfmTableDelimiterRow) {
                in_delimiter_row = event.name == Name::GfmTableDelimiterRow;
                in_row = true;
                in_first_cell_awaiting_pipe = true;
                last_cell = (0, 0, 0, 0);
                cell = (0, index + 1, 0, 0);

                // Inject table body start.
                if after_head_awaiting_first_body_row {
                    after_head_awaiting_first_body_row = false;
                    last_table_has_body = true;
                    let enter = Event {
                        kind: Kind::Enter,
                        name: Name::GfmTableBody,
                        point: tokenizer.events[index].point.clone(),
                        link: None,
                    };
                    tokenizer.map.add(index, 0, vec![enter]);
                }
            }
            // Cell data.
            else if in_row
                && matches!(
                    event.name,
                    Name::Data | Name::GfmTableDelimiterMarker | Name::GfmTableDelimiterFiller
                )
            {
                in_first_cell_awaiting_pipe = false;

                // First value in cell.
                if cell.2 == 0 {
                    if last_cell.1 != 0 {
                        cell.0 = cell.1;
                        flush_cell(tokenizer, last_cell, in_delimiter_row, None);
                        last_cell = (0, 0, 0, 0);
                    }

                    cell.2 = index;
                }
            } else if event.name == Name::GfmTableCellDivider {
                if in_first_cell_awaiting_pipe {
                    in_first_cell_awaiting_pipe = false;
                } else {
                    if last_cell.1 != 0 {
                        cell.0 = cell.1;
                        flush_cell(tokenizer, last_cell, in_delimiter_row, None);
                    }

                    last_cell = cell;
                    cell = (last_cell.1, index, 0, 0);
                }
            }
            // Exit events.
        } else if event.name == Name::GfmTableHead {
            after_head_awaiting_first_body_row = true;
            last_table_end = index;
        } else if matches!(event.name, Name::GfmTableRow | Name::GfmTableDelimiterRow) {
            in_row = false;
            last_table_end = index;
            if last_cell.1 != 0 {
                cell.0 = cell.1;
                flush_cell(tokenizer, last_cell, in_delimiter_row, Some(index));
            } else if cell.1 != 0 {
                flush_cell(tokenizer, cell, in_delimiter_row, Some(index));
            }
        } else if in_row
            && (matches!(
                event.name,
                Name::Data | Name::GfmTableDelimiterMarker | Name::GfmTableDelimiterFiller
            ))
        {
            cell.3 = index;
        }

        index += 1;
    }

    if last_table_end != 0 {
        flush_table_end(tokenizer, last_table_end, last_table_has_body);
    }

    tokenizer.map.consume(&mut tokenizer.events);
    None
}

/// Generate a cell.
fn flush_cell(
    tokenizer: &mut Tokenizer,
    range: (usize, usize, usize, usize),
    in_delimiter_row: bool,
    row_end: Option<usize>,
) {
    let group_name = if in_delimiter_row {
        Name::GfmTableDelimiterCell
    } else {
        Name::GfmTableCell
    };
    let value_name = if in_delimiter_row {
        Name::GfmTableDelimiterCellValue
    } else {
        Name::GfmTableCellText
    };

    // Insert an exit for the previous cell, if there is one.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //          ^-- exit
    //           ^^^^-- this cell
    // ```
    if range.0 != 0 {
        tokenizer.map.add(
            range.0,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: group_name.clone(),
                point: tokenizer.events[range.0].point.clone(),
                link: None,
            }],
        );
    }

    // Insert enter of this cell.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //          ^-- enter
    //           ^^^^-- this cell
    // ```
    tokenizer.map.add(
        range.1,
        0,
        vec![Event {
            kind: Kind::Enter,
            name: group_name.clone(),
            point: tokenizer.events[range.1].point.clone(),
            link: None,
        }],
    );

    // Insert text start at first data start and end at last data end, and
    // remove events between.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //            ^-- enter
    //             ^-- exit
    //           ^^^^-- this cell
    // ```
    if range.2 != 0 {
        tokenizer.map.add(
            range.2,
            0,
            vec![Event {
                kind: Kind::Enter,
                name: value_name.clone(),
                point: tokenizer.events[range.2].point.clone(),
                link: None,
            }],
        );
        debug_assert_ne!(range.3, 0);

        if !in_delimiter_row {
            tokenizer.events[range.2].link = Some(Link {
                previous: None,
                next: None,
                content: Content::Text,
            });

            // To do: positional info of the remaining `data` nodes likely has
            // to be fixed.
            if range.3 > range.2 + 1 {
                let a = range.2 + 1;
                let b = range.3 - range.2 - 1;
                tokenizer.map.add(a, b, vec![]);
            }
        }

        tokenizer.map.add(
            range.3 + 1,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: value_name,
                point: tokenizer.events[range.3].point.clone(),
                link: None,
            }],
        );
    }

    // Insert an exit for the last cell, if at the row end.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //                     ^-- exit
    //               ^^^^^^-- this cell (the last one contains two “between” parts)
    // ```
    if let Some(row_end) = row_end {
        tokenizer.map.add(
            row_end,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: group_name,
                point: tokenizer.events[row_end].point.clone(),
                link: None,
            }],
        );
    }
}

/// Generate table end (and table body end).
fn flush_table_end(tokenizer: &mut Tokenizer, index: usize, body: bool) {
    let mut exits = vec![];

    if body {
        exits.push(Event {
            kind: Kind::Exit,
            name: Name::GfmTableBody,
            point: tokenizer.events[index].point.clone(),
            link: None,
        });
    }

    exits.push(Event {
        kind: Kind::Exit,
        name: Name::GfmTable,
        point: tokenizer.events[index].point.clone(),
        link: None,
    });

    tokenizer.map.add(index + 1, 0, exits);
}
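The escape handling in `head_row_data`/`head_row_escape` (and `body_row_data`/`body_row_escape`) treats a backslash specially only when the next byte is `\` or `|`. In that case the two are consumed as one unit. So `\|` stays inside a cell, while in `\\|` the backslashes pair up and the `|` is a real divider. This is consistent with the Bugs section's claim that escaped escapes are not treated as escapes here.

The sketch below replays that rule on a plain string. It is a simplification under stated assumptions:

* `split_row` is a hypothetical helper, not part of this crate.
* It ignores the indentation limit, `lazy`/`pierce`, and the event and resolver machinery.
* It keeps escapes undecoded, since the real construct hands cell text to the text content type.

```rust
/// Hypothetical helper: split one table row into trimmed cell texts.
///
/// A `\` followed by `\` or `|` is kept together with that byte, mirroring
/// the escape states; escapes are *not* decoded here.
pub fn split_row(line: &str) -> Vec<String> {
    let is_ws = |c: char| c == ' ' || c == '\t';
    let trimmed = line.trim_matches(is_ws);
    let mut cells = Vec::new();
    let mut current = String::new();
    let mut ended_with_divider = false;
    let mut chars = trimmed.chars().peekable();

    while let Some(c) = chars.next() {
        ended_with_divider = false;
        match c {
            '\\' => {
                current.push(c);
                // Only `\` and `|` are consumed together with the backslash.
                if let Some(&next) = chars.peek() {
                    if next == '\\' || next == '|' {
                        current.push(next);
                        chars.next();
                    }
                }
            }
            '|' => {
                // Unescaped pipe: close the current cell.
                cells.push(std::mem::take(&mut current));
                ended_with_divider = true;
            }
            _ => current.push(c),
        }
    }

    // A trailing pipe is optional and does not open another cell.
    if !ended_with_divider {
        cells.push(current);
    }
    // A leading pipe is optional too: drop the empty segment before it.
    if trimmed.starts_with('|') {
        cells.remove(0);
    }

    cells
        .into_iter()
        .map(|cell| cell.trim_matches(is_ws).to_string())
        .collect()
}
```

For instance, the row `| a\\| b |` splits into the cells `a\\` and `b`. The `Pipe` row from the docs keeps its escaped pipe inside the second cell.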