Markdown parser fork with extended syntax for personal use.
//! Character escapes occur in the [string][] and [text][] content types.
//!
//! ## Grammar
//!
//! Character escapes are formed with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! character_escape ::= '\\' ascii_punctuation
//! ```
//!
//! Like much of markdown, there are no “invalid” character escapes: a lone
//! backslash, or a backslash followed by anything other than an ASCII
//! punctuation character, is just a literal backslash.
//!
//! To escape other characters, use a [character reference][character_reference]
//! instead (as in, `&amp;`, `&#123;`, or say `&#x9;`).
//!
//! It is also possible to escape a line ending in text with a similar
//! construct: a [hard break (escape)][hard_break_escape] is a backslash followed
//! by a line ending (that is part of the construct instead of ending it).
//!
//! ## Recommendation
//!
//! If possible, use a character escape.
//! Otherwise, use a character reference.
//!
//! ## Tokens
//!
//! * [`CharacterEscape`][Name::CharacterEscape]
//! * [`CharacterEscapeMarker`][Name::CharacterEscapeMarker]
//! * [`CharacterEscapeValue`][Name::CharacterEscapeValue]
//!
//! ## References
//!
//! * [`character-escape.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/character-escape.js)
//! * [*§ 2.4 Backslash escapes* in `CommonMark`](https://spec.commonmark.org/0.31/#backslash-escapes)
//!
//! [string]: crate::construct::string
//! [text]: crate::construct::text
//! [character_reference]: crate::construct::character_reference
//! [hard_break_escape]: crate::construct::hard_break_escape

use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;

/// Start of character escape.
///
/// ```markdown
/// > | a\*b
///      ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    // Only proceed when the construct is enabled and we are at a backslash.
    if tokenizer.parse_state.options.constructs.character_escape && tokenizer.current == Some(b'\\')
    {
        tokenizer.enter(Name::CharacterEscape);
        tokenizer.enter(Name::CharacterEscapeMarker);
        tokenizer.consume();
        tokenizer.exit(Name::CharacterEscapeMarker);
        State::Next(StateName::CharacterEscapeInside)
    } else {
        State::Nok
    }
}

/// After `\`, at punctuation.
///
/// ```markdown
/// > | a\*b
///       ^
/// ```
pub fn inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // ASCII punctuation.
        Some(b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~') => {
            tokenizer.enter(Name::CharacterEscapeValue);
            tokenizer.consume();
            tokenizer.exit(Name::CharacterEscapeValue);
            tokenizer.exit(Name::CharacterEscape);
            State::Ok
        }
        // Anything else (including end of input): not a character escape.
        _ => State::Nok,
    }
}
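
The two states above can be illustrated outside the tokenizer with a small standalone sketch. It is not part of this crate: `character_escape_at` and `unescape` are hypothetical helpers introduced only for illustration, and they ignore everything else the real parser handles (hard break escapes, code spans, the `constructs.character_escape` option). The sketch assumes that `u8::is_ascii_punctuation` from the standard library covers the same four byte ranges matched in `inside`.

```rust
/// Hypothetical helper (not in the crate): if a character escape starts at
/// `index`, return the escaped punctuation byte.
fn character_escape_at(input: &[u8], index: usize) -> Option<u8> {
    // Mirrors `start`: the construct must begin with a backslash.
    if input.get(index) != Some(&b'\\') {
        return None;
    }
    // Mirrors `inside`: the next byte must be ASCII punctuation.
    match input.get(index + 1) {
        Some(&byte) if byte.is_ascii_punctuation() => Some(byte),
        // End of input or any other byte: the backslash stays literal.
        _ => None,
    }
}

/// Hypothetical helper (not in the crate): drop the backslash of every
/// character escape and keep all other text unchanged.
fn unescape(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = String::with_capacity(input.len());
    let mut start = 0; // Start of the pending literal run.
    let mut index = 0;
    while index < bytes.len() {
        if character_escape_at(bytes, index).is_some() {
            // Flush the text before the backslash, then skip the backslash
            // itself so the punctuation becomes part of the next run.
            out.push_str(&input[start..index]);
            start = index + 1;
            index += 2;
        } else {
            index += 1;
        }
    }
    out.push_str(&input[start..]);
    out
}
```

For instance, `unescape("a\\*b")` yields `a*b`, while `unescape("a\\b")` leaves the input untouched because `b` is not ASCII punctuation.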