// Markdown parser fork with extended syntax for personal use.
//! GFM: table occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! Tables form with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! gfm_table ::= gfm_table_head 0*(eol gfm_table_body_row)
//!
//! ; Restriction: both rows must have the same number of cells.
//! gfm_table_head ::= gfm_table_row eol gfm_table_delimiter_row
//!
//! gfm_table_row ::= ['|'] gfm_table_cell 0*('|' gfm_table_cell) ['|'] *space_or_tab
//! gfm_table_cell ::= *space_or_tab gfm_table_text *space_or_tab
//! gfm_table_text ::= 0*(line - '\\' - '|' | '\\' ['\\' | '|'])
//!
//! gfm_table_delimiter_row ::= ['|'] gfm_table_delimiter_cell 0*('|' gfm_table_delimiter_cell) ['|'] *space_or_tab
//! gfm_table_delimiter_cell ::= *space_or_tab gfm_table_delimiter_value *space_or_tab
//! gfm_table_delimiter_value ::= [':'] 1*'-' [':']
//! ```
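//!
//! The row and cell rules above can be sketched as a standalone splitter over
//! one line.
//! This is a minimal, hypothetical illustration rather than this crate’s API:
//! `split_row` and `is_space_or_tab` are made-up names, and the sketch ignores
//! the indentation limit.
//! A `\` followed by `\` or `|` stays in the cell text, so an escaped pipe
//! never divides cells.
//!
//! ```rust
//! // Whitespace allowed around cells (`space_or_tab`).
//! fn is_space_or_tab(c: char) -> bool {
//!     c == ' ' || c == '\t'
//! }
//!
//! // Split one row into trimmed cell texts (hypothetical helper).
//! fn split_row(line: &str) -> Vec<String> {
//!     let line = line.trim_matches(is_space_or_tab);
//!     let mut cells = Vec::new();
//!     let mut current = String::new();
//!     let mut ends_with_divider = false;
//!     let mut chars = line.chars().peekable();
//!
//!     while let Some(c) = chars.next() {
//!         ends_with_divider = c == '|';
//!         if c == '|' {
//!             // An unescaped pipe ends the current cell.
//!             cells.push(current.trim_matches(is_space_or_tab).to_string());
//!             current.clear();
//!         } else {
//!             current.push(c);
//!             // `\\` and `\|` are escapes: keep the escaped char as text.
//!             if c == '\\' {
//!                 if let Some(&next) = chars.peek() {
//!                     if next == '\\' || next == '|' {
//!                         current.push(next);
//!                         chars.next();
//!                     }
//!                 }
//!             }
//!         }
//!     }
//!
//!     // Text after the last divider is a cell; a trailing `|` only closes the row.
//!     if !ends_with_divider {
//!         cells.push(current.trim_matches(is_space_or_tab).to_string());
//!     }
//!     // A leading `|` opens the row, so the empty text before it is not a cell.
//!     if line.starts_with('|') {
//!         cells.remove(0);
//!     }
//!     cells
//! }
//! ```
//!
//! With this sketch, `| a | b |` and `a | b` both split into the cells `a` and
//! `b`, while `| a \| b |` yields the single cell `a \| b`.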
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
//!
//! The above grammar shows that basically anything can be a cell or a row.
//! The main thing that makes something a row is that it occurs directly before
//! or after a delimiter row, or after another row.
//!
//! It is not required for a table to have a body: it can end right after the
//! delimiter row.
//!
//! Each column can be marked with an alignment.
//! The alignment marker is a colon (`:`) used before and/or after the filler
//! (`-`) of a delimiter cell.
//! To illustrate:
//!
//! ```markdown
//! | none | left | right | center |
//! | ---- | :--- | ----: | :----: |
//! ```
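//!
//! Reading the alignment of one delimiter cell can be sketched as follows.
//! The `AlignKind` enum and `parse_delimiter_cell` function are defined here
//! only for illustration (assumed names, not this file’s API); the input is
//! one cell with its surrounding whitespace already trimmed.
//!
//! ```rust
//! // Column alignment, as marked by colons in the delimiter row.
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum AlignKind {
//!     Left,
//!     Right,
//!     Center,
//!     None,
//! }
//!
//! // Match `gfm_table_delimiter_value ::= [':'] 1*'-' [':']`.
//! // Returns `None` (no match) when the cell is not a valid delimiter value.
//! fn parse_delimiter_cell(cell: &str) -> Option<AlignKind> {
//!     let (left, rest) = match cell.strip_prefix(':') {
//!         Some(rest) => (true, rest),
//!         None => (false, cell),
//!     };
//!     let (right, filler) = match rest.strip_suffix(':') {
//!         Some(filler) => (true, filler),
//!         None => (false, rest),
//!     };
//!     // At least one `-` is required, and nothing else may appear.
//!     if filler.is_empty() || !filler.bytes().all(|byte| byte == b'-') {
//!         return None;
//!     }
//!     Some(match (left, right) {
//!         (true, true) => AlignKind::Center,
//!         (true, false) => AlignKind::Left,
//!         (false, true) => AlignKind::Right,
//!         (false, false) => AlignKind::None,
//!     })
//! }
//! ```
//!
//! Applied to the example above, `----`, `:---`, `----:`, and `:----:` map to
//! none, left, right, and center.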
//!
//! The number of cells in the delimiter row is the number of columns of the
//! table.
//! Only the head row is required to have that same number of cells.
//! Body rows are not required to have a certain number of cells.
//! For body rows that have fewer cells than the number of columns of the table,
//! empty cells are injected.
//! When a row has more cells than the number of columns of the table, the
//! superfluous cells are dropped.
//! To illustrate:
//!
//! ```markdown
//! | a | b |
//! | - | - |
//! | c |
//! | d | e | f |
//! ```
//!
//! Yields:
//!
//! ```html
//! <table>
//! <thead>
//! <tr>
//! <th>a</th>
//! <th>b</th>
//! </tr>
//! </thead>
//! <tbody>
//! <tr>
//! <td>c</td>
//! <td></td>
//! </tr>
//! <tr>
//! <td>d</td>
//! <td>e</td>
//! </tr>
//! </tbody>
//! </table>
//! ```
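//!
//! Injecting and dropping cells can be sketched as rendering each body row
//! against the column count.
//! `render_body_row` is a hypothetical helper, not this crate’s compiler: it
//! assumes the cell texts are already split and rendered, and it omits the
//! `align` attribute.
//!
//! ```rust
//! // Render one body row with exactly `columns` cells.
//! fn render_body_row(cells: &[&str], columns: usize) -> String {
//!     let mut html = String::from("<tr>\n");
//!     for index in 0..columns {
//!         // Missing cells are injected as empty; cells past `columns` are
//!         // never visited, so they are dropped.
//!         let text = cells.get(index).copied().unwrap_or("");
//!         html.push_str("<td>");
//!         html.push_str(text);
//!         html.push_str("</td>\n");
//!     }
//!     html.push_str("</tr>\n");
//!     html
//! }
//! ```
//!
//! With two columns, `render_body_row(&["c"], 2)` produces the first body row
//! of the output above, and `render_body_row(&["d", "e", "f"], 2)` the second.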
//!
//! Each cell’s text is interpreted as the [text][] content type.
//! That means that it can include constructs such as [attention][attention].
//!
//! The grammar for cells prohibits the use of `|` in them.
//! To use pipes in cells, encode them as a character reference or character
//! escape: `&vert;` (or `&VerticalLine;`, `&verbar;`, `&#124;`, `&#x7c;`) or
//! `\|`.
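//!
//! When generating markdown, a pipe can be kept inside a cell by escaping it.
//! The hypothetical `escape_cell_pipes` below (not part of this crate) follows
//! the `gfm_table_text` rule: a `|` is already escaped when an odd number of
//! backslashes directly precedes it, so only the other pipes get a `\`.
//!
//! ```rust
//! fn escape_cell_pipes(text: &str) -> String {
//!     let mut out = String::with_capacity(text.len());
//!     // Length of the run of backslashes right before the current char.
//!     let mut backslashes: usize = 0;
//!     for c in text.chars() {
//!         // An even run (including none) leaves the pipe unescaped: escape it.
//!         if c == '|' && backslashes % 2 == 0 {
//!             out.push('\\');
//!         }
//!         if c == '\\' {
//!             backslashes += 1;
//!         } else {
//!             backslashes = 0;
//!         }
//!         out.push(c);
//!     }
//!     out
//! }
//! ```
//!
//! The escape form is chosen here over a character reference because, as the
//! next paragraph explains, escaped pipes are also decoded in code inside a
//! table cell, whereas character references are never decoded in code.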
//!
//! Escapes will typically work, but they are not supported in
//! [code (text)][raw_text] (and the math (text) extension).
//! To work around this, GitHub came up with a rather weird “trick”.
//! When inside a table cell *and* inside code, escaped pipes *are* decoded.
//! To illustrate:
//!
//! ```markdown
//! | Name | Character |
//! | - | - |
//! | Left curly brace | `{` |
//! | Pipe | `\|` |
//! | Right curly brace | `}` |
//! ```
//!
//! Yields:
//!
//! ```html
//! <table>
//! <thead>
//! <tr>
//! <th>Name</th>
//! <th>Character</th>
//! </tr>
//! </thead>
//! <tbody>
//! <tr>
//! <td>Left curly brace</td>
//! <td><code>{</code></td>
//! </tr>
//! <tr>
//! <td>Pipe</td>
//! <td><code>|</code></td>
//! </tr>
//! <tr>
//! <td>Right curly brace</td>
//! <td><code>}</code></td>
//! </tr>
//! </tbody>
//! </table>
//! ```
//!
//! > 👉 **Note**: no other character can be escaped like this.
//! > Escaping pipes in code does not work when not inside a table, either.
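//!
//! That decoding step can be sketched as a transform on the content of code
//! inside a table cell.
//! `decode_pipes_in_cell_code` is a hypothetical name, and where this crate
//! applies the rule is not shown in this file; the sketch pairs backslashes
//! from left to right, turns `\|` into `|`, and leaves every other escape
//! (including `\\`) untouched, since code is otherwise literal.
//!
//! ```rust
//! fn decode_pipes_in_cell_code(code: &str) -> String {
//!     let mut out = String::with_capacity(code.len());
//!     let mut chars = code.chars().peekable();
//!     while let Some(c) = chars.next() {
//!         if c != '\\' {
//!             out.push(c);
//!             continue;
//!         }
//!         match chars.peek().copied() {
//!             // Escaped pipe: only the pipe is kept.
//!             Some('|') => {
//!                 out.push('|');
//!                 chars.next();
//!             }
//!             // Escaped backslash: kept as-is, so it cannot escape a pipe.
//!             Some('\\') => {
//!                 out.push_str("\\\\");
//!                 chars.next();
//!             }
//!             // Any other backslash is literal in code.
//!             _ => out.push('\\'),
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! In the example above, the cell code `\|` becomes `|`, while `\{` would stay
//! `\{`.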
//!
//! ## HTML
//!
//! GFM tables relate to several HTML elements: `<table>`, `<tbody>`, `<td>`,
//! `<th>`, `<thead>`, and `<tr>`.
//! See
//! [*§ 4.9.1 The `table` element*][html_table],
//! [*§ 4.9.5 The `tbody` element*][html_tbody],
//! [*§ 4.9.9 The `td` element*][html_td],
//! [*§ 4.9.10 The `th` element*][html_th],
//! [*§ 4.9.6 The `thead` element*][html_thead], and
//! [*§ 4.9.8 The `tr` element*][html_tr]
//! in the HTML spec for more info.
//!
//! If the alignment of a column is left, right, or center, a deprecated
//! `align` attribute is added to each `<th>` and `<td>` element belonging to
//! that column.
//! That attribute is interpreted by browsers as if a CSS `text-align` property
//! was included, with its value set to that same keyword.
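//!
//! The attribute mapping can be sketched like so.
//! `AlignKind` and `open_cell_tag` are illustrative definitions (assumed, not
//! this crate’s compiler); the sketch only builds the opening tag of a cell.
//!
//! ```rust
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum AlignKind {
//!     Left,
//!     Right,
//!     Center,
//!     None,
//! }
//!
//! // Opening `<th>` or `<td>` tag, with `align` only for aligned columns.
//! fn open_cell_tag(tag: &str, align: AlignKind) -> String {
//!     let value = match align {
//!         AlignKind::Left => "left",
//!         AlignKind::Right => "right",
//!         AlignKind::Center => "center",
//!         // No attribute when the column has no alignment.
//!         AlignKind::None => return format!("<{}>", tag),
//!     };
//!     format!("<{} align=\"{}\">", tag, value)
//! }
//! ```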
//!
//! ## Recommendation
//!
//! When authoring markdown with GFM tables, it’s recommended to *always* put
//! pipes around cells.
//! Without them, it can be hard to infer whether the table will work, how many
//! columns there are, and which column you are currently editing.
//!
//! It is recommended to not use many columns, as it results in very long lines,
//! making it hard to infer which column you are currently editing.
//!
//! For larger tables, particularly when cells vary in size, it is recommended
//! *not* to manually “pad” cell text.
//! While it can look better, it results in a lot of time spent realigning
//! everything when a new, longer cell is added or the longest cell removed, as
//! every row then must be changed.
//! Other than costing time, it also causes large diffs in Git.
//!
//! To illustrate, when authoring large tables, it is discouraged to pad cells
//! like this:
//!
//! ```markdown
//! | Alpha bravo charlie |              delta |
//! | ------------------- | -----------------: |
//! | Echo                | Foxtrot golf hotel |
//! ```
//!
//! Instead, use single spaces (and single filler dashes):
//!
//! ```markdown
//! | Alpha bravo charlie | delta |
//! | - | -: |
//! | Echo | Foxtrot golf hotel |
//! ```
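//!
//! Emitting tables in this compact style can be sketched with two small
//! helpers.
//! `format_row`, `format_delimiter_row`, and `AlignKind` are hypothetical
//! names used only for illustration; cell texts are assumed to have their
//! pipes escaped already.
//!
//! ```rust
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum AlignKind {
//!     Left,
//!     Right,
//!     Center,
//!     None,
//! }
//!
//! // One row with single spaces around each cell and no padding.
//! fn format_row(cells: &[&str]) -> String {
//!     format!("| {} |", cells.join(" | "))
//! }
//!
//! // One delimiter row with a single filler dash per cell.
//! fn format_delimiter_row(aligns: &[AlignKind]) -> String {
//!     let cells: Vec<&str> = aligns
//!         .iter()
//!         .map(|align| match align {
//!             AlignKind::Left => ":-",
//!             AlignKind::Right => "-:",
//!             AlignKind::Center => ":-:",
//!             AlignKind::None => "-",
//!         })
//!         .collect();
//!     format_row(&cells)
//! }
//! ```
//!
//! Adding a longer cell later then changes only the row it is in.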
//!
//! ## Bugs
//!
//! GitHub’s own algorithm to parse tables contains a bug.
//! This bug is not present in this project.
//! The issue relating to tables is:
//!
//! * [GFM tables: escaped escapes are incorrectly treated as escapes](https://github.com/github/cmark-gfm/issues/277)
//!
//! ## Tokens
//!
//! * [`GfmTable`][Name::GfmTable]
//! * [`GfmTableBody`][Name::GfmTableBody]
//! * [`GfmTableCell`][Name::GfmTableCell]
//! * [`GfmTableCellDivider`][Name::GfmTableCellDivider]
//! * [`GfmTableCellText`][Name::GfmTableCellText]
//! * [`GfmTableDelimiterCell`][Name::GfmTableDelimiterCell]
//! * [`GfmTableDelimiterCellValue`][Name::GfmTableDelimiterCellValue]
//! * [`GfmTableDelimiterFiller`][Name::GfmTableDelimiterFiller]
//! * [`GfmTableDelimiterMarker`][Name::GfmTableDelimiterMarker]
//! * [`GfmTableDelimiterRow`][Name::GfmTableDelimiterRow]
//! * [`GfmTableHead`][Name::GfmTableHead]
//! * [`GfmTableRow`][Name::GfmTableRow]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`micromark-extension-gfm-table`](https://github.com/micromark/micromark-extension-gfm-table)
//! * [*§ 4.10 Tables (extension)* in `GFM`](https://github.github.com/gfm/#tables-extension-)
//!
//! [flow]: crate::construct::flow
//! [text]: crate::construct::text
//! [attention]: crate::construct::attention
//! [raw_text]: crate::construct::raw_text
//! [html_table]: https://html.spec.whatwg.org/multipage/tables.html#the-table-element
//! [html_tbody]: https://html.spec.whatwg.org/multipage/tables.html#the-tbody-element
//! [html_td]: https://html.spec.whatwg.org/multipage/tables.html#the-td-element
//! [html_th]: https://html.spec.whatwg.org/multipage/tables.html#the-th-element
//! [html_thead]: https://html.spec.whatwg.org/multipage/tables.html#the-thead-element
//! [html_tr]: https://html.spec.whatwg.org/multipage/tables.html#the-tr-element

use crate::construct::partial_space_or_tab::{space_or_tab, space_or_tab_min_max};
use crate::event::{Content, Event, Kind, Link, Name};
use crate::resolve::Name as ResolveName;
use crate::state::{Name as StateName, State};
use crate::subtokenize::Subresult;
use crate::tokenizer::Tokenizer;
use crate::util::{constant::TAB_SIZE, skip::opt_back as skip_opt_back};
use alloc::vec;

/// Start of a GFM table.
///
/// If there is a valid table row or table head before, then we try to parse
/// another row.
/// Otherwise, we try to parse a head.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
/// > | | b |
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.parse_state.options.constructs.gfm_table {
        if !tokenizer.pierce
            && !tokenizer.events.is_empty()
            && matches!(
                tokenizer.events[skip_opt_back(
                    &tokenizer.events,
                    tokenizer.events.len() - 1,
                    &[Name::LineEnding, Name::SpaceOrTab],
                )]
                .name,
                Name::GfmTableHead | Name::GfmTableRow
            )
        {
            State::Retry(StateName::GfmTableBodyRowStart)
        } else {
            State::Retry(StateName::GfmTableHeadRowBefore)
        }
    } else {
        State::Nok
    }
}

/// Before table head row.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_before(tokenizer: &mut Tokenizer) -> State {
    tokenizer.enter(Name::GfmTableHead);
    tokenizer.enter(Name::GfmTableRow);
    if matches!(tokenizer.current, Some(b'\t' | b' ')) {
        tokenizer.attempt(State::Next(StateName::GfmTableHeadRowStart), State::Nok);
        State::Retry(space_or_tab_min_max(
            tokenizer,
            0,
            if tokenizer.parse_state.options.constructs.code_indented {
                TAB_SIZE - 1
            } else {
                usize::MAX
            },
        ))
    } else {
        State::Retry(StateName::GfmTableHeadRowStart)
    }
}

/// Before table head row, after whitespace.
///
/// ```markdown
/// > | | a |
///     ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_start(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // 4+ spaces.
        Some(b'\t' | b' ') => State::Nok,
        Some(b'|') => State::Retry(StateName::GfmTableHeadRowBreak),
        _ => {
            tokenizer.tokenize_state.seen = true;
            // Count the first character, that isn’t a pipe, double.
            tokenizer.tokenize_state.size_b += 1;
            State::Retry(StateName::GfmTableHeadRowBreak)
        }
    }
}

/// At break in table head row.
///
/// ```markdown
/// > | | a |
///     ^
///       ^
///         ^
///   | | - |
///   | | b |
/// ```
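///
/// The `seen`, `size`, and `size_b` bookkeeping of the head row states can be
/// sketched as one standalone function over a single line.
/// `head_row_cell_count` is a hypothetical name, not part of this crate; it
/// assumes indentation was already handled and does not check that a
/// delimiter row follows.
/// It returns the number of head cells, or `None` for an empty row or a lone
/// `|`, mirroring the `size_b > 1` check.
///
/// ```rust
/// fn head_row_cell_count(row: &str) -> Option<usize> {
///     let row = row.trim_start_matches(|c: char| c == ' ' || c == '\t');
///     let bytes = row.as_bytes();
///     let mut cells = 0; // Like `size`: the head cell count.
///     let mut items = 0; // Like `size_b`: dividers and data runs.
///     let mut seen = false; // Like `seen`: the next item starts a cell.
///     let mut index = 0;
///
///     // The first byte, if not a pipe, counts double and starts a cell.
///     if bytes.first().map_or(false, |&byte| byte != b'|') {
///         seen = true;
///         items += 1;
///     }
///
///     while index < bytes.len() {
///         match bytes[index] {
///             b' ' | b'\t' => index += 1,
///             byte => {
///                 items += 1;
///                 if seen {
///                     seen = false;
///                     cells += 1;
///                 }
///                 if byte == b'|' {
///                     seen = true;
///                     index += 1;
///                 } else {
///                     // Data runs until whitespace or an unescaped pipe;
///                     // `\\` and `\|` are consumed as pairs.
///                     while index < bytes.len() && !matches!(bytes[index], b' ' | b'\t' | b'|') {
///                         let escaped = bytes[index] == b'\\'
///                             && matches!(bytes.get(index + 1).copied(), Some(b'\\' | b'|'));
///                         index += if escaped { 2 } else { 1 };
///                     }
///                 }
///             }
///         }
///     }
///
///     if items > 1 {
///         Some(cells)
///     } else {
///         None
///     }
/// }
/// ```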
pub fn head_row_break(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None => {
            tokenizer.tokenize_state.seen = false;
            tokenizer.tokenize_state.size = 0;
            tokenizer.tokenize_state.size_b = 0;
            State::Nok
        }
        Some(b'\n') => {
            // If anything other than one pipe (ignoring whitespace) was used, it’s fine.
            if tokenizer.tokenize_state.size_b > 1 {
                tokenizer.tokenize_state.size_b = 0;
                // Feel free to interrupt:
                tokenizer.interrupt = true;
                tokenizer.exit(Name::GfmTableRow);
                tokenizer.enter(Name::LineEnding);
                tokenizer.consume();
                tokenizer.exit(Name::LineEnding);
                State::Next(StateName::GfmTableHeadDelimiterStart)
            } else {
                tokenizer.tokenize_state.seen = false;
                tokenizer.tokenize_state.size = 0;
                tokenizer.tokenize_state.size_b = 0;
                State::Nok
            }
        }
        Some(b'\t' | b' ') => {
            tokenizer.attempt(State::Next(StateName::GfmTableHeadRowBreak), State::Nok);
            State::Retry(space_or_tab(tokenizer))
        }
        _ => {
            tokenizer.tokenize_state.size_b += 1;

            // Whether a delimiter was seen.
            if tokenizer.tokenize_state.seen {
                tokenizer.tokenize_state.seen = false;
                // Header cell count.
                tokenizer.tokenize_state.size += 1;
            }

            if tokenizer.current == Some(b'|') {
                tokenizer.enter(Name::GfmTableCellDivider);
                tokenizer.consume();
                tokenizer.exit(Name::GfmTableCellDivider);
                // Whether a delimiter was seen.
                tokenizer.tokenize_state.seen = true;
                State::Next(StateName::GfmTableHeadRowBreak)
            } else {
                // Anything else is cell data.
                tokenizer.enter(Name::Data);
                State::Retry(StateName::GfmTableHeadRowData)
            }
        }
    }
}

/// In table head row data.
///
/// ```markdown
/// > | | a |
///       ^
///   | | - |
///   | | b |
/// ```
pub fn head_row_data(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'|') => {
            tokenizer.exit(Name::Data);
            State::Retry(StateName::GfmTableHeadRowBreak)
        }
        _ => {
            let name = if tokenizer.current == Some(b'\\') {
                StateName::GfmTableHeadRowEscape
            } else {
                StateName::GfmTableHeadRowData
            };
            tokenizer.consume();
            State::Next(name)
        }
    }
}

/// In table head row escape.
///
/// ```markdown
/// > | | a\-b |
///         ^
///   | | ---- |
///   | | c |
/// ```
pub fn head_row_escape(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\\' | b'|') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableHeadRowData)
        }
        _ => State::Retry(StateName::GfmTableHeadRowData),
    }
}

/// Before delimiter row.
///
/// ```markdown
///   | | a |
/// > | | - |
///     ^
///   | | b |
/// ```
pub fn head_delimiter_start(tokenizer: &mut Tokenizer) -> State {
    // Reset `interrupt`.
    tokenizer.interrupt = false;

    if tokenizer.lazy || tokenizer.pierce {
        tokenizer.tokenize_state.size = 0;
        State::Nok
    } else {
        tokenizer.enter(Name::GfmTableDelimiterRow);
        // Track if we’ve seen a `:` or `|`.
        tokenizer.tokenize_state.seen = false;

        match tokenizer.current {
            Some(b'\t' | b' ') => {
                tokenizer.attempt(
                    State::Next(StateName::GfmTableHeadDelimiterBefore),
                    State::Next(StateName::GfmTableHeadDelimiterNok),
                );

                State::Retry(space_or_tab_min_max(
                    tokenizer,
                    0,
                    if tokenizer.parse_state.options.constructs.code_indented {
                        TAB_SIZE - 1
                    } else {
                        usize::MAX
                    },
                ))
            }
            _ => State::Retry(StateName::GfmTableHeadDelimiterBefore),
        }
    }
}

/// Before delimiter row, after optional whitespace.
///
/// Reused when a `|` is found later, to parse another cell.
///
/// ```markdown
///   | | a |
/// > | | - |
///     ^
///   | | b |
/// ```
pub fn head_delimiter_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-' | b':') => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
        Some(b'|') => {
            tokenizer.tokenize_state.seen = true;
            // If we start with a pipe, we open a cell marker.
            tokenizer.enter(Name::GfmTableCellDivider);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableCellDivider);
            State::Next(StateName::GfmTableHeadDelimiterCellBefore)
        }
        // More whitespace / empty row not allowed at start.
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// After `|`, before delimiter cell.
///
/// ```markdown
///   | | a |
/// > | | - |
///      ^
/// ```
pub fn head_delimiter_cell_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.attempt(
                State::Next(StateName::GfmTableHeadDelimiterValueBefore),
                State::Nok,
            );
            State::Retry(space_or_tab(tokenizer))
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterValueBefore),
    }
}

/// Before delimiter cell value.
///
/// ```markdown
///   | | a |
/// > | | - |
///       ^
/// ```
pub fn head_delimiter_value_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
        Some(b':') => {
            // Align: left.
            tokenizer.tokenize_state.size_b += 1;
            tokenizer.tokenize_state.seen = true;
            tokenizer.enter(Name::GfmTableDelimiterMarker);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableDelimiterMarker);
            State::Next(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
        }
        Some(b'-') => {
            // Align: none.
            tokenizer.tokenize_state.size_b += 1;
            State::Retry(StateName::GfmTableHeadDelimiterLeftAlignmentAfter)
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// After delimiter cell left alignment marker.
///
/// ```markdown
///   | | a |
/// > | | :- |
///        ^
/// ```
pub fn head_delimiter_left_alignment_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.enter(Name::GfmTableDelimiterFiller);
            State::Retry(StateName::GfmTableHeadDelimiterFiller)
        }
        // Anything else is not ok after the left-align colon.
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// In delimiter cell filler.
///
/// ```markdown
///   | | a |
/// > | | - |
///       ^
/// ```
pub fn head_delimiter_filler(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableHeadDelimiterFiller)
        }
        Some(b':') => {
            // Align is `center` if it was `left`, `right` otherwise.
            tokenizer.tokenize_state.seen = true;
            tokenizer.exit(Name::GfmTableDelimiterFiller);
            tokenizer.enter(Name::GfmTableDelimiterMarker);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableDelimiterMarker);
            State::Next(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
        }
        _ => {
            tokenizer.exit(Name::GfmTableDelimiterFiller);
            State::Retry(StateName::GfmTableHeadDelimiterRightAlignmentAfter)
        }
    }
}

/// After delimiter cell right alignment marker.
///
/// ```markdown
///   | | a |
/// > | | -: |
///         ^
/// ```
pub fn head_delimiter_right_alignment_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.attempt(
                State::Next(StateName::GfmTableHeadDelimiterCellAfter),
                State::Nok,
            );
            State::Retry(space_or_tab(tokenizer))
        }
        _ => State::Retry(StateName::GfmTableHeadDelimiterCellAfter),
    }
}

/// After delimiter cell.
///
/// ```markdown
///   | | a |
/// > | | -: |
///          ^
/// ```
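///
/// The checks spread over the delimiter row states can be sketched as one
/// standalone function over a single line.
/// `is_delimiter_row` is a hypothetical name, not part of this crate;
/// `head_cells` stands in for the head cell count kept in `size`, and the
/// indentation limit is ignored.
///
/// ```rust
/// fn is_delimiter_row(row: &str, head_cells: usize) -> bool {
///     let is_space_or_tab = |c: char| c == ' ' || c == '\t';
///     let row = row.trim_matches(is_space_or_tab);
///     // Like `seen`: without a `|` or `:`, a row such as `---` is a
///     // thematic break or setext underline instead.
///     let seen = row.contains('|') || row.contains(':');
///     // Optional leading and trailing pipes do not start cells.
///     let inner = row.strip_prefix('|').unwrap_or(row);
///     let inner = inner.strip_suffix('|').unwrap_or(inner);
///     let mut count = 0; // Like `size_b`: the delimiter cell count.
///     for cell in inner.split('|') {
///         let cell = cell.trim_matches(is_space_or_tab);
///         let value = cell.strip_prefix(':').unwrap_or(cell);
///         let value = value.strip_suffix(':').unwrap_or(value);
///         if value.is_empty() || !value.bytes().all(|byte| byte == b'-') {
///             return false;
///         }
///         count += 1;
///     }
///     seen && count == head_cells
/// }
/// ```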
pub fn head_delimiter_cell_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            // Exit when:
            // * there was no `:` or `|` at all (it’s a thematic break or setext
            //   underline instead)
            // * the header cell count is not the delimiter cell count
            if !tokenizer.tokenize_state.seen
                || tokenizer.tokenize_state.size != tokenizer.tokenize_state.size_b
            {
                State::Retry(StateName::GfmTableHeadDelimiterNok)
            } else {
                // Reset.
                tokenizer.tokenize_state.seen = false;
                tokenizer.tokenize_state.size = 0;
                tokenizer.tokenize_state.size_b = 0;
                tokenizer.exit(Name::GfmTableDelimiterRow);
                tokenizer.exit(Name::GfmTableHead);
                tokenizer.register_resolver(ResolveName::GfmTable);
                State::Ok
            }
        }
        Some(b'|') => State::Retry(StateName::GfmTableHeadDelimiterBefore),
        _ => State::Retry(StateName::GfmTableHeadDelimiterNok),
    }
}

/// In delimiter row, at a disallowed byte.
///
/// ```markdown
///   | | a |
/// > | | x |
///       ^
/// ```
pub fn head_delimiter_nok(tokenizer: &mut Tokenizer) -> State {
    // Reset.
    tokenizer.tokenize_state.seen = false;
    tokenizer.tokenize_state.size = 0;
    tokenizer.tokenize_state.size_b = 0;
    State::Nok
}

/// Before table body row.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///     ^
/// ```
pub fn body_row_start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.lazy {
        State::Nok
    } else {
        tokenizer.enter(Name::GfmTableRow);

        match tokenizer.current {
            Some(b'\t' | b' ') => {
                tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBreak), State::Nok);
                // We’re parsing a body row.
                // If we’re here, we already attempted blank lines and indented
                // code.
                // So parse as much whitespace as needed:
                State::Retry(space_or_tab_min_max(tokenizer, 0, usize::MAX))
            }
            _ => State::Retry(StateName::GfmTableBodyRowBreak),
        }
    }
}

/// At break in table body row.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///     ^
///       ^
///         ^
/// ```
pub fn body_row_break(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            tokenizer.exit(Name::GfmTableRow);
            State::Ok
        }
        Some(b'\t' | b' ') => {
            tokenizer.attempt(State::Next(StateName::GfmTableBodyRowBreak), State::Nok);
            State::Retry(space_or_tab(tokenizer))
        }
        Some(b'|') => {
            tokenizer.enter(Name::GfmTableCellDivider);
            tokenizer.consume();
            tokenizer.exit(Name::GfmTableCellDivider);
            State::Next(StateName::GfmTableBodyRowBreak)
        }
        // Anything else is cell content.
        _ => {
            tokenizer.enter(Name::Data);
            State::Retry(StateName::GfmTableBodyRowData)
        }
    }
}

/// In table body row data.
///
/// ```markdown
///   | | a |
///   | | - |
/// > | | b |
///       ^
/// ```
pub fn body_row_data(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'|') => {
            tokenizer.exit(Name::Data);
            State::Retry(StateName::GfmTableBodyRowBreak)
        }
        _ => {
            let name = if tokenizer.current == Some(b'\\') {
                StateName::GfmTableBodyRowEscape
            } else {
                StateName::GfmTableBodyRowData
            };
            tokenizer.consume();
            State::Next(name)
        }
    }
}

/// In table body row escape.
///
/// ```markdown
///   | | a |
///   | | ---- |
/// > | | b\-c |
///         ^
/// ```
pub fn body_row_escape(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\\' | b'|') => {
            tokenizer.consume();
            State::Next(StateName::GfmTableBodyRowData)
        }
        _ => State::Retry(StateName::GfmTableBodyRowData),
    }
}

/// Resolve GFM table.
pub fn resolve(tokenizer: &mut Tokenizer) -> Option<Subresult> {
    let mut index = 0;
    let mut in_first_cell_awaiting_pipe = true;
    let mut in_row = false;
    let mut in_delimiter_row = false;
    let mut last_cell = (0, 0, 0, 0);
    let mut cell = (0, 0, 0, 0);
    let mut after_head_awaiting_first_body_row = false;
    let mut last_table_end = 0;
    let mut last_table_has_body = false;

    while index < tokenizer.events.len() {
        let event = &tokenizer.events[index];

        if event.kind == Kind::Enter {
            // Start of head.
            if event.name == Name::GfmTableHead {
                after_head_awaiting_first_body_row = false;

                // Inject previous (body end and) table end.
                if last_table_end != 0 {
                    flush_table_end(tokenizer, last_table_end, last_table_has_body);
                    last_table_has_body = false;
                    last_table_end = 0;
                }

                // Inject table start.
                let enter = Event {
                    kind: Kind::Enter,
                    name: Name::GfmTable,
                    point: tokenizer.events[index].point.clone(),
                    link: None,
                };
                tokenizer.map.add(index, 0, vec![enter]);
            } else if matches!(event.name, Name::GfmTableRow | Name::GfmTableDelimiterRow) {
                in_delimiter_row = event.name == Name::GfmTableDelimiterRow;
                in_row = true;
                in_first_cell_awaiting_pipe = true;
                last_cell = (0, 0, 0, 0);
                cell = (0, index + 1, 0, 0);

                // Inject table body start.
                if after_head_awaiting_first_body_row {
                    after_head_awaiting_first_body_row = false;
                    last_table_has_body = true;
                    let enter = Event {
                        kind: Kind::Enter,
                        name: Name::GfmTableBody,
                        point: tokenizer.events[index].point.clone(),
                        link: None,
                    };
                    tokenizer.map.add(index, 0, vec![enter]);
                }
            }
            // Cell data.
            else if in_row
                && matches!(
                    event.name,
                    Name::Data | Name::GfmTableDelimiterMarker | Name::GfmTableDelimiterFiller
                )
            {
                in_first_cell_awaiting_pipe = false;

                // First value in cell.
                if cell.2 == 0 {
                    if last_cell.1 != 0 {
                        cell.0 = cell.1;
                        flush_cell(tokenizer, last_cell, in_delimiter_row, None);
                        last_cell = (0, 0, 0, 0);
                    }

                    cell.2 = index;
                }
            } else if event.name == Name::GfmTableCellDivider {
                if in_first_cell_awaiting_pipe {
                    in_first_cell_awaiting_pipe = false;
                } else {
                    if last_cell.1 != 0 {
                        cell.0 = cell.1;
                        flush_cell(tokenizer, last_cell, in_delimiter_row, None);
                    }

                    last_cell = cell;
                    cell = (last_cell.1, index, 0, 0);
                }
            }
            // Exit events.
        } else if event.name == Name::GfmTableHead {
            after_head_awaiting_first_body_row = true;
            last_table_end = index;
        } else if matches!(event.name, Name::GfmTableRow | Name::GfmTableDelimiterRow) {
            in_row = false;
            last_table_end = index;
            if last_cell.1 != 0 {
                cell.0 = cell.1;
                flush_cell(tokenizer, last_cell, in_delimiter_row, Some(index));
            } else if cell.1 != 0 {
                flush_cell(tokenizer, cell, in_delimiter_row, Some(index));
            }
        } else if in_row
            && (matches!(
                event.name,
                Name::Data | Name::GfmTableDelimiterMarker | Name::GfmTableDelimiterFiller
            ))
        {
            cell.3 = index;
        }

        index += 1;
    }

    if last_table_end != 0 {
        flush_table_end(tokenizer, last_table_end, last_table_has_body);
    }

    tokenizer.map.consume(&mut tokenizer.events);
    None
}

/// Generate a cell.
fn flush_cell(
    tokenizer: &mut Tokenizer,
    range: (usize, usize, usize, usize),
    in_delimiter_row: bool,
    row_end: Option<usize>,
) {
    let group_name = if in_delimiter_row {
        Name::GfmTableDelimiterCell
    } else {
        Name::GfmTableCell
    };
    let value_name = if in_delimiter_row {
        Name::GfmTableDelimiterCellValue
    } else {
        Name::GfmTableCellText
    };

    // Insert an exit for the previous cell, if there is one.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //          ^-- exit
    //           ^^^^-- this cell
    // ```
    if range.0 != 0 {
        tokenizer.map.add(
            range.0,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: group_name.clone(),
                point: tokenizer.events[range.0].point.clone(),
                link: None,
            }],
        );
    }

    // Insert enter of this cell.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //           ^-- enter
    //           ^^^^-- this cell
    // ```
    tokenizer.map.add(
        range.1,
        0,
        vec![Event {
            kind: Kind::Enter,
            name: group_name.clone(),
            point: tokenizer.events[range.1].point.clone(),
            link: None,
        }],
    );

    // Insert text start at first data start and end at last data end, and
    // remove events between.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //            ^-- enter
    //             ^-- exit
    //           ^^^^-- this cell
    // ```
    if range.2 != 0 {
        tokenizer.map.add(
            range.2,
            0,
            vec![Event {
                kind: Kind::Enter,
                name: value_name.clone(),
                point: tokenizer.events[range.2].point.clone(),
                link: None,
            }],
        );
        debug_assert_ne!(range.3, 0);

        if !in_delimiter_row {
            tokenizer.events[range.2].link = Some(Link {
                previous: None,
                next: None,
                content: Content::Text,
            });

            // To do: positional info of the remaining `data` nodes likely have
            // to be fixed.
            if range.3 > range.2 + 1 {
                let a = range.2 + 1;
                let b = range.3 - range.2 - 1;
                tokenizer.map.add(a, b, vec![]);
            }
        }

        tokenizer.map.add(
            range.3 + 1,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: value_name,
                point: tokenizer.events[range.3].point.clone(),
                link: None,
            }],
        );
    }

    // Insert an exit for the last cell, if at the row end.
    //
    // ```markdown
    // > | | aa | bb | cc |
    //                    ^-- exit
    //               ^^^^^^-- this cell (the last one contains two “between” parts)
    // ```
    if let Some(row_end) = row_end {
        tokenizer.map.add(
            row_end,
            0,
            vec![Event {
                kind: Kind::Exit,
                name: group_name,
                point: tokenizer.events[row_end].point.clone(),
                link: None,
            }],
        );
    }
}

/// Generate table end (and table body end).
fn flush_table_end(tokenizer: &mut Tokenizer, index: usize, body: bool) {
    let mut exits = vec![];

    if body {
        exits.push(Event {
            kind: Kind::Exit,
            name: Name::GfmTableBody,
            point: tokenizer.events[index].point.clone(),
            link: None,
        });
    }

    exits.push(Event {
        kind: Kind::Exit,
        name: Name::GfmTable,
        point: tokenizer.events[index].point.clone(),
        link: None,
    });

    tokenizer.map.add(index + 1, 0, exits);
}
1038}