Markdown parser fork with extended syntax for personal use.
//! Raw (text) occurs in the [text][] content type.
//! It forms code (text) and math (text).
//!
//! ## Grammar
//!
//! Raw (text) forms with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! ; Restriction: the number of markers in the closing sequence must be equal
//! ; to the number of markers in the opening sequence.
//! raw_text ::= sequence 1*byte sequence
//!
//! ; Restriction: not preceded or followed by the same marker.
//! sequence ::= 1*'`' | 1*'$'
//! ```
//!
//! The above grammar shows that it is not possible to create empty raw (text).
//! It is possible to include the sequence marker (grave accent for code,
//! dollar for math) in raw (text), by wrapping it in bigger or smaller
//! sequences:
//!
//! ```markdown
//! Include more: `a``b` or include less: ``a`b``.
//! ```
//!
//! It is also possible to include just one marker:
//!
//! ```markdown
//! Include just one: `` ` ``.
//! ```
//!
//! Sequences are “greedy”, in that they cannot be preceded or followed by
//! more markers.
//! To illustrate:
//!
//! ```markdown
//! Not code: ``x`.
//!
//! Not code: `x``.
//!
//! Escapes work, this is code: \``x`.
//!
//! Escapes work, this is code: `x`\`.
//! ```
//!
//! Yields:
//!
//! ```html
//! <p>Not code: ``x`.</p>
//! <p>Not code: `x``.</p>
//! <p>Escapes work, this is code: `<code>x</code>.</p>
//! <p>Escapes work, this is code: <code>x</code>`.</p>
//! ```
//!
//! Including just one marker, as in `` ` `` above, works because, when turning
//! markdown into HTML, the first and last space, if both exist and there is
//! also a non-space in the code, are removed.
//! Line endings, at that stage, are considered as spaces.
//!
//! In markdown, it is also possible to create code or math with the
//! [raw (flow)][raw_flow] (or [code (indented)][code_indented]) constructs
//! in the [flow][] content type.
//!
//! ## HTML
//!
//! Code (text) relates to the `<code>` element in HTML.
//! See [*§ 4.5.15 The `code` element*][html_code] in the HTML spec for more
//! info.
//!
//! Math (text) does not relate to HTML elements.
//! `MathML`, which is sort of like SVG but for math, exists but it doesn’t work
//! well and isn’t widely supported.
//! Instead, it is recommended to use client side JavaScript with something like
//! `KaTeX` or `MathJax` to process the math.
//! For that, the math is compiled as a `<code>` element with two classes:
//! `language-math` and `math-inline`.
//! Client side JavaScript can look for these classes to process them further.
//!
//! When turning markdown into HTML, each line ending in raw (text) is turned
//! into a space.
//!
//! ## Recommendations
//!
//! When authoring markdown with math, keep in mind that math doesn’t work in
//! most places.
//! Notably, GitHub currently has a really weird crappy client-side regex-based
//! thing.
//! But on your own (math-heavy?) site it can be great!
//! You can set [`parse_options.math_text_single_dollar: false`][parse_options]
//! to improve this, as it prevents single dollars from being seen as math, and
//! thus prevents normal dollars in text from being seen as math.
//!
//! ## Tokens
//!
//! * [`CodeText`][Name::CodeText]
//! * [`CodeTextData`][Name::CodeTextData]
//! * [`CodeTextSequence`][Name::CodeTextSequence]
//! * [`MathText`][Name::MathText]
//! * [`MathTextData`][Name::MathTextData]
//! * [`MathTextSequence`][Name::MathTextSequence]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`code-text.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/code-text.js)
//! * [`micromark-extension-math`](https://github.com/micromark/micromark-extension-math)
//! * [*§ 6.1 Code spans* in `CommonMark`](https://spec.commonmark.org/0.31/#code-spans)
//!
//! > 👉 **Note**: math is not specified anywhere.
//!
//! [flow]: crate::construct::flow
//! [text]: crate::construct::text
//! [code_indented]: crate::construct::code_indented
//! [raw_flow]: crate::construct::raw_flow
//! [html_code]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-code-element
//! [parse_options]: crate::ParseOptions

use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;

/// Start of raw (text).
///
/// ```markdown
/// > | `a`
///     ^
/// > | \`a`
///      ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    // Code (text):
    if ((tokenizer.parse_state.options.constructs.code_text && tokenizer.current == Some(b'`'))
        // Math (text):
        || (tokenizer.parse_state.options.constructs.math_text && tokenizer.current == Some(b'$')))
        // Not the same marker (except when escaped).
        && (tokenizer.previous != tokenizer.current
            || (!tokenizer.events.is_empty()
                && tokenizer.events[tokenizer.events.len() - 1].name == Name::CharacterEscape))
    {
        let marker = tokenizer.current.unwrap();
        if marker == b'`' {
            tokenizer.tokenize_state.token_1 = Name::CodeText;
            tokenizer.tokenize_state.token_2 = Name::CodeTextSequence;
            tokenizer.tokenize_state.token_3 = Name::CodeTextData;
        } else {
            tokenizer.tokenize_state.token_1 = Name::MathText;
            tokenizer.tokenize_state.token_2 = Name::MathTextSequence;
            tokenizer.tokenize_state.token_3 = Name::MathTextData;
        }
        tokenizer.tokenize_state.marker = marker;
        tokenizer.enter(tokenizer.tokenize_state.token_1.clone());
        tokenizer.enter(tokenizer.tokenize_state.token_2.clone());
        State::Retry(StateName::RawTextSequenceOpen)
    } else {
        State::Nok
    }
}

/// In opening sequence.
///
/// ```markdown
/// > | `a`
///     ^
/// ```
pub fn sequence_open(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker) {
        tokenizer.tokenize_state.size += 1;
        tokenizer.consume();
        State::Next(StateName::RawTextSequenceOpen)
    }
    // Not enough markers in the sequence.
    else if tokenizer.tokenize_state.marker == b'$'
        && tokenizer.tokenize_state.size == 1
        && !tokenizer.parse_state.options.math_text_single_dollar
    {
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.size = 0;
        tokenizer.tokenize_state.token_1 = Name::Data;
        tokenizer.tokenize_state.token_2 = Name::Data;
        tokenizer.tokenize_state.token_3 = Name::Data;
        State::Nok
    } else {
        tokenizer.exit(tokenizer.tokenize_state.token_2.clone());
        State::Retry(StateName::RawTextBetween)
    }
}

/// Between something and something else.
///
/// ```markdown
/// > | `a`
///      ^^
/// ```
pub fn between(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None => {
            tokenizer.tokenize_state.marker = 0;
            tokenizer.tokenize_state.size = 0;
            tokenizer.tokenize_state.token_1 = Name::Data;
            tokenizer.tokenize_state.token_2 = Name::Data;
            tokenizer.tokenize_state.token_3 = Name::Data;
            State::Nok
        }
        Some(b'\n') => {
            tokenizer.enter(Name::LineEnding);
            tokenizer.consume();
            tokenizer.exit(Name::LineEnding);
            State::Next(StateName::RawTextBetween)
        }
        _ => {
            if tokenizer.current == Some(tokenizer.tokenize_state.marker) {
                tokenizer.enter(tokenizer.tokenize_state.token_2.clone());
                State::Retry(StateName::RawTextSequenceClose)
            } else {
                tokenizer.enter(tokenizer.tokenize_state.token_3.clone());
                State::Retry(StateName::RawTextData)
            }
        }
    }
}

/// In data.
///
/// ```markdown
/// > | `a`
///      ^
/// ```
pub fn data(tokenizer: &mut Tokenizer) -> State {
    if matches!(tokenizer.current, None | Some(b'\n'))
        || tokenizer.current == Some(tokenizer.tokenize_state.marker)
    {
        tokenizer.exit(tokenizer.tokenize_state.token_3.clone());
        State::Retry(StateName::RawTextBetween)
    } else {
        tokenizer.consume();
        State::Next(StateName::RawTextData)
    }
}

/// In closing sequence.
///
/// ```markdown
/// > | `a`
///       ^
/// ```
pub fn sequence_close(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker) {
        tokenizer.tokenize_state.size_b += 1;
        tokenizer.consume();
        State::Next(StateName::RawTextSequenceClose)
    } else {
        tokenizer.exit(tokenizer.tokenize_state.token_2.clone());
        if tokenizer.tokenize_state.size == tokenizer.tokenize_state.size_b {
            tokenizer.exit(tokenizer.tokenize_state.token_1.clone());
            tokenizer.tokenize_state.marker = 0;
            tokenizer.tokenize_state.size = 0;
            tokenizer.tokenize_state.size_b = 0;
            tokenizer.tokenize_state.token_1 = Name::Data;
            tokenizer.tokenize_state.token_2 = Name::Data;
            tokenizer.tokenize_state.token_3 = Name::Data;
            State::Ok
        } else {
            // More or fewer markers than in the opening sequence: mark as data.
            let len = tokenizer.events.len();
            tokenizer.events[len - 2].name = tokenizer.tokenize_state.token_3.clone();
            tokenizer.events[len - 1].name = tokenizer.tokenize_state.token_3.clone();
            tokenizer.tokenize_state.size_b = 0;
            State::Retry(StateName::RawTextBetween)
        }
    }
}
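
The grammar restriction documented at the top of this file (the closing sequence must contain exactly as many markers as the opening sequence, and a sequence is always a maximal run of markers) can be illustrated with a small standalone sketch. It is not how the tokenizer above works: there are no events or states, escapes are ignored, and `run_len` and `first_raw_text` are hypothetical helper names that do not exist in this crate. It also assumes that `marker` is an ASCII byte such as `` ` `` or `$`.

```rust
/// Length of the run of `marker` bytes starting at `start`.
fn run_len(bytes: &[u8], start: usize, marker: u8) -> usize {
    bytes[start..].iter().take_while(|&&b| b == marker).count()
}

/// Hypothetical helper (not part of this crate): content of the first raw
/// (text) span in `input` for an ASCII `marker`. Escapes are ignored.
fn first_raw_text(input: &str, marker: u8) -> Option<&str> {
    let bytes = input.as_bytes();
    let mut index = 0;
    while index < bytes.len() {
        if bytes[index] != marker {
            index += 1;
            continue;
        }
        // Opening sequence: the whole run of markers, never a part of it.
        let open = run_len(bytes, index, marker);
        let content_start = index + open;
        let mut cursor = content_start;
        while cursor < bytes.len() {
            if bytes[cursor] != marker {
                cursor += 1;
                continue;
            }
            let close = run_len(bytes, cursor, marker);
            if close == open {
                // Same size as the opening sequence: this closes the span.
                return Some(&input[content_start..cursor]);
            }
            // A run of a different size is treated as data.
            cursor += close;
        }
        // No matching closing sequence: the opening run is plain text.
        index = content_start;
    }
    None
}
```

With this sketch, ``first_raw_text("``a`b``", b'`')`` returns ``Some("a`b")``, while the "Not code" inputs from the module documentation return `None`. The padding spaces in `` ` `` are kept here, since stripping them is described above as part of turning markdown into HTML.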