src/construct/character_escape.rs at hack · crashkeys.dev/markdown-rs

Markdown parser fork with extended syntax for personal use.
Titus Wormer Refactor docs 11mo ago
e0ca3f6c
//! Character escapes occur in the [string][] and [text][] content types.
//!
//! ## Grammar
//!
//! Character escapes form with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! character_escape ::= '\\' ascii_punctuation
//! ```
//!
//! Like much of markdown, there are no “invalid” character escapes: a lone
//! backslash, or a backslash followed by anything other than an ASCII
//! punctuation character, is just a literal backslash.
//!
//! To escape other characters, use a [character reference][character_reference]
//! instead (as in, `&amp;`, `&#123;`, or say `&#x9;`).
//!
//! It is also possible to escape a line ending in text with a similar
//! construct: a [hard break (escape)][hard_break_escape] is a backslash followed
//! by a line ending (that is part of the construct instead of ending it).
//!
//! ## Recommendation
//!
//! If possible, use a character escape.
//! Otherwise, use a character reference.
//!
//! ## Tokens
//!
//! * [`CharacterEscape`][Name::CharacterEscape]
//! * [`CharacterEscapeMarker`][Name::CharacterEscapeMarker]
//! * [`CharacterEscapeValue`][Name::CharacterEscapeValue]
//!
//! ## References
//!
//! * [`character-escape.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/character-escape.js)
//! * [*§ 2.4 Backslash escapes* in `CommonMark`](https://spec.commonmark.org/0.31/#backslash-escapes)
//!
//! [string]: crate::construct::string
//! [text]: crate::construct::text
//! [character_reference]: crate::construct::character_reference
//! [hard_break_escape]: crate::construct::hard_break_escape

use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;

/// Start of character escape.
///
/// ```markdown
/// > | a\*b
///      ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    // Only proceed when the construct is enabled and we are at a backslash.
    if tokenizer.parse_state.options.constructs.character_escape && tokenizer.current == Some(b'\\')
    {
        tokenizer.enter(Name::CharacterEscape);
        tokenizer.enter(Name::CharacterEscapeMarker);
        tokenizer.consume();
        tokenizer.exit(Name::CharacterEscapeMarker);
        State::Next(StateName::CharacterEscapeInside)
    } else {
        State::Nok
    }
}

/// After `\`, at punctuation.
///
/// ```markdown
/// > | a\*b
///       ^
/// ```
pub fn inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // ASCII punctuation.
        Some(b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~') => {
            tokenizer.enter(Name::CharacterEscapeValue);
            tokenizer.consume();
            tokenizer.exit(Name::CharacterEscapeValue);
            tokenizer.exit(Name::CharacterEscape);
            State::Ok
        }
        // Anything else (including end of input) is not a character escape.
        _ => State::Nok,
    }
}
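
The grammar above (`character_escape ::= '\\' ascii_punctuation`) can be illustrated outside the tokenizer with a small standalone sketch. It is not part of markdown-rs: the names `is_ascii_punctuation` and `resolve_character_escapes` are hypothetical, and the sketch assumes the whole input is plain text, so it ignores contexts such as code spans where escapes do not apply. The byte ranges are copied from the `inside` state.

```rust
/// Hypothetical helper: `true` for the ASCII punctuation bytes that the
/// `inside` state accepts after a backslash.
fn is_ascii_punctuation(byte: u8) -> bool {
    matches!(byte, b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~')
}

/// Hypothetical helper: resolves character escapes in `input`.
///
/// A backslash followed by ASCII punctuation yields that punctuation
/// character. Any other backslash is kept as a literal backslash, because
/// there are no "invalid" character escapes.
fn resolve_character_escapes(input: &str) -> String {
    let mut output = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(character) = chars.next() {
        if character == '\\' {
            if let Some(&next) = chars.peek() {
                if next.is_ascii() && is_ascii_punctuation(next as u8) {
                    // Valid escape: drop the backslash, keep the value.
                    output.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        // Ordinary character, or a backslash that does not start an escape.
        output.push(character);
    }

    output
}
```

Under these assumptions, `resolve_character_escapes(r"a\*b")` gives `a*b`, while `resolve_character_escapes(r"a\b")` leaves the input unchanged because `b` is not ASCII punctuation. The real parser does not rewrite strings this way; it emits `CharacterEscape`, `CharacterEscapeMarker`, and `CharacterEscapeValue` events and leaves resolution to later stages.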