Markdown parser fork with extended syntax for personal use.
//! Thematic break occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! Thematic break forms with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! ; Restriction: all markers must be identical.
//! ; Restriction: at least 3 markers must be used.
//! thematic_break ::= *space_or_tab 1*(1*marker *space_or_tab)
//!
//! marker ::= '*' | '-' | '_'
//! ```
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
//!
//! ## HTML
//!
//! Thematic breaks in markdown typically relate to the HTML element `<hr>`.
//! See [*§ 4.4.2 The `hr` element* in the HTML spec][html] for more info.
//!
//! ## Recommendation
//!
//! It is recommended to use exactly three asterisks without whitespace when
//! writing markdown.
//! As using more than three markers has no effect other than wasting space,
//! it is recommended to use exactly three markers.
//! Thematic breaks formed with asterisks or dashes can interfere with
//! [list][list-item]s if there is whitespace between them: `* * *` and `- - -`.
//! For this reason, it is recommended to not use spaces or tabs between the
//! markers.
//! Thematic breaks formed with dashes (without whitespace) can also form
//! [heading (setext)][heading_setext].
//! As dashes and underscores frequently occur in natural language and URLs, it
//! is recommended to use asterisks for thematic breaks to distinguish from
//! such use.
//! Because asterisks can be used to form the most markdown constructs, using
//! them has the added benefit of making it easier to scan markdown: you
//! can look for asterisks to find syntax while not worrying about other
//! characters.
//!
//! ## Tokens
//!
//! * [`ThematicBreak`][Name::ThematicBreak]
//! * [`ThematicBreakSequence`][Name::ThematicBreakSequence]
//!
//! ## References
//!
//! * [`thematic-break.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/thematic-break.js)
//! * [*§ 4.1 Thematic breaks* in `CommonMark`](https://spec.commonmark.org/0.31/#thematic-breaks)
//!
//! [flow]: crate::construct::flow
//! [heading_setext]: crate::construct::heading_setext
//! [list-item]: crate::construct::list_item
//! [html]: https://html.spec.whatwg.org/multipage/grouping-content.html#the-hr-element

use crate::construct::partial_space_or_tab::{space_or_tab, space_or_tab_min_max};
use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;
use crate::util::constant::{TAB_SIZE, THEMATIC_BREAK_MARKER_COUNT_MIN};

/// Start of thematic break.
///
/// ```markdown
/// > | ***
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.parse_state.options.constructs.thematic_break {
        tokenizer.enter(Name::ThematicBreak);

        if matches!(tokenizer.current, Some(b'\t' | b' ')) {
            tokenizer.attempt(State::Next(StateName::ThematicBreakBefore), State::Nok);
            // When indented code is enabled, cap the leading whitespace just
            // below the indented-code threshold; otherwise allow any amount.
            State::Retry(space_or_tab_min_max(
                tokenizer,
                0,
                if tokenizer.parse_state.options.constructs.code_indented {
                    TAB_SIZE - 1
                } else {
                    usize::MAX
                },
            ))
        } else {
            State::Retry(StateName::ThematicBreakBefore)
        }
    } else {
        State::Nok
    }
}

/// After optional whitespace, at marker.
///
/// ```markdown
/// > | ***
///     ^
/// ```
pub fn before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'*' | b'-' | b'_') => {
            // Remember the marker: all later markers must be identical.
            tokenizer.tokenize_state.marker = tokenizer.current.unwrap();
            State::Retry(StateName::ThematicBreakAtBreak)
        }
        _ => State::Nok,
    }
}

/// After a marker sequence or whitespace: at another sequence, or at the end
/// of the line.
///
/// ```markdown
/// > | ***
///     ^
/// ```
pub fn at_break(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker) {
        tokenizer.enter(Name::ThematicBreakSequence);
        State::Retry(StateName::ThematicBreakSequence)
    } else if tokenizer.tokenize_state.size >= THEMATIC_BREAK_MARKER_COUNT_MIN
        && matches!(tokenizer.current, None | Some(b'\n'))
    {
        // Enough markers, and at eol or eof: done. Reset the shared state.
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.size = 0;
        tokenizer.exit(Name::ThematicBreak);
        // Feel free to interrupt.
        tokenizer.interrupt = false;
        State::Ok
    } else {
        // Too few markers, or some other character: not a thematic break.
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.size = 0;
        State::Nok
    }
}

/// In sequence.
///
/// ```markdown
/// > | ***
///     ^
/// ```
pub fn sequence(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker) {
        tokenizer.consume();
        tokenizer.tokenize_state.size += 1;
        State::Next(StateName::ThematicBreakSequence)
    } else if matches!(tokenizer.current, Some(b'\t' | b' ')) {
        tokenizer.exit(Name::ThematicBreakSequence);
        tokenizer.attempt(State::Next(StateName::ThematicBreakAtBreak), State::Nok);
        State::Retry(space_or_tab(tokenizer))
    } else {
        tokenizer.exit(Name::ThematicBreakSequence);
        State::Retry(StateName::ThematicBreakAtBreak)
    }
}
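The grammar and its two restrictions can also be checked on a single line outside the tokenizer, which may help when reading the state machine above. The following is a minimal standalone sketch, not part of the crate: `is_thematic_break`, `MARKER_COUNT_MIN`, and `MAX_INDENT` are hypothetical names introduced for illustration. It assumes indented code is enabled (so at most three spaces of indentation are allowed), that the line ending has already been stripped, and, as a simplification, that a tab in the leading indentation is rejected.

```rust
/// Hypothetical helper (not the crate's API): checks one line, without its
/// line ending, against the thematic break grammar.
fn is_thematic_break(line: &str) -> bool {
    // Assumed values for this sketch; the crate has its own constants.
    const MARKER_COUNT_MIN: usize = 3;
    const MAX_INDENT: usize = 3;

    let bytes = line.as_bytes();

    // Leading indentation: only spaces are counted in this simplification.
    let indent = bytes.iter().take_while(|&&b| b == b' ').count();
    if indent > MAX_INDENT {
        return false;
    }
    let rest = &bytes[indent..];

    // The first byte after the indentation fixes the marker for the line.
    let marker = match rest.first().copied() {
        Some(b @ (b'*' | b'-' | b'_')) => b,
        _ => return false,
    };

    // Everything that follows must be that same marker, a space, or a tab.
    let mut count = 0;
    for &b in rest {
        if b == marker {
            count += 1;
        } else if b != b' ' && b != b'\t' {
            return false;
        }
    }

    count >= MARKER_COUNT_MIN
}
```

With this sketch, `is_thematic_break("* * *")` is true, while `is_thematic_break("**")` (too few markers) and `is_thematic_break("*-*")` (mixed markers) are false. Unlike the tokenizer, it scans the whole line at once and emits no events.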