src/construct/html_flow.rs at hack · crashkeys.dev/markdown-rs

Markdown parser fork with extended syntax for personal use.
Titus Wormer Refactor docs 11mo ago
e0ca3f6c
//! HTML (flow) occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! HTML (flow) forms with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! html_flow ::= raw | comment | instruction | declaration | cdata | basic | complete
//!
//! ; Note: closing tag name does not need to match opening tag name.
//! raw ::= '<' raw_tag_name [[space_or_tab *line | '>' *line] eol] *(*line eol) ['</' raw_tag_name *line]
//! comment ::= '<!--' [*'-' '>' *line | *line *(eol *line) ['-->' *line]]
//! instruction ::= '<?' ['>' *line | *line *(eol *line) ['?>' *line]]
//! declaration ::= '<!' ascii_alphabetic *line *(eol *line) ['>' *line]
//! cdata ::= '<![CDATA[' *line *(eol *line) [']]>' *line]
//! basic ::= '<' ['/'] basic_tag_name [['/'] '>' *line *(eol 1*line)]
//! complete ::= (opening_tag | closing_tag) [*space_or_tab *(eol 1*line)]
//!
//! raw_tag_name ::= 'pre' | 'script' | 'style' | 'textarea' ; Note: case-insensitive.
//! basic_tag_name ::= 'address' | 'article' | 'aside' | ... ; See `constant.rs`, and note: case-insensitive.
//! opening_tag ::= '<' tag_name *(1*space_or_tab attribute) [*space_or_tab '/'] *space_or_tab '>'
//! closing_tag ::= '</' tag_name *space_or_tab '>'
//! tag_name ::= ascii_alphabetic *('-' | ascii_alphanumeric)
//! attribute ::= attribute_name [*space_or_tab '=' *space_or_tab attribute_value]
//! attribute_name ::= (':' | '_' | ascii_alphabetic) *('-' | '.' | ':' | '_' | ascii_alphanumeric)
//! attribute_value ::= '"' *(line - '"') '"' | "'" *(line - "'") "'" | 1*(text - '"' - "'" - '/' - '<' - '=' - '>' - '`')
//! ```
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
//!
//! The grammar for HTML in markdown does not follow the rules for parsing
//! HTML given in [*§ 13.2 Parsing HTML documents* in the HTML
//! spec][html_parsing].
//! As such, HTML in markdown *resembles* HTML, but is instead a (naïve?)
//! attempt to parse an XML-like language.
//! By extension, another notable property of the grammar is that it can
//! result in invalid HTML, in that it allows things that wouldn’t work or
//! wouldn’t work well in HTML, such as mismatched tags.
//!
//! Interestingly, most of the productions above have a clear opening and
//! closing condition (raw, comment, instruction, declaration, cdata), but the
//! closing condition does not need to be satisfied.
//! As a result, the parser never has to backtrack.
//!
//! Because the **basic** and **complete** productions in the grammar form with
//! a tag, followed by more stuff, and stop at a blank line, it is possible to
//! interleave (a word for switching between languages) markdown and HTML
//! together, by placing the opening and closing tags on their own lines,
//! with blank lines between them and markdown.
//! For example:
//!
//! ```markdown
//! <div>This is <code>code</code> but this is not *emphasis*.</div>
//!
//! <div>
//!
//! This is a paragraph in a `div` and with `code` and *emphasis*.
//!
//! </div>
//! ```
//!
//! The **complete** production of HTML (flow) is not allowed to interrupt
//! content.
//! That means that a blank line is needed between a [paragraph][] and it.
//! However, [HTML (text)][html_text] has a similar production, which will
//! typically kick in instead.
//!
//! The list of tag names allowed in the **raw** production is defined in
//! [`HTML_RAW_NAMES`][].
//! This production exists because there are a few cases where markdown
//! *inside* some elements, and hence interleaving, does not make sense.
//!
//! The list of tag names allowed in the **basic** production is defined in
//! [`HTML_BLOCK_NAMES`][].
//! This production exists because there are a few cases where we can decide
//! early that something is going to be a flow (block) element instead of a
//! phrasing (inline) element.
//! We *can* interrupt and don’t have to care too much about it being
//! well-formed.
//!
//! ## Tokens
//!
//! * [`HtmlFlow`][Name::HtmlFlow]
//! * [`HtmlFlowData`][Name::HtmlFlowData]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`html-flow.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/html-flow.js)
//! * [*§ 4.6 HTML blocks* in `CommonMark`](https://spec.commonmark.org/0.31/#html-blocks)
//!
//! [flow]: crate::construct::flow
//! [html_text]: crate::construct::html_text
//! [paragraph]: crate::construct::paragraph
//! [html_raw_names]: crate::util::constant::HTML_RAW_NAMES
//! [html_block_names]: crate::util::constant::HTML_BLOCK_NAMES
//! [html_parsing]: https://html.spec.whatwg.org/multipage/parsing.html#parsing
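
/*
A minimal, standalone sketch of how the first characters of a line select one
of the productions described in the module docs. It is not the state machine
of this crate: `Kind`, `classify`, and the shortened `RAW_NAMES` and
`BLOCK_NAMES` lists are hypothetical, and the attribute checks that the
complete production still needs are left out.

```rust
/// Kinds of HTML (flow), mirroring the productions in the grammar.
#[derive(Debug, PartialEq)]
enum Kind {
    Raw,
    Comment,
    Instruction,
    Declaration,
    Cdata,
    Basic,
    /// Only a candidate: the rest of the tag must still be validated.
    CompleteCandidate,
}

// Shortened, illustrative lists (assumption: the real lists are longer).
const RAW_NAMES: [&str; 4] = ["pre", "script", "style", "textarea"];
const BLOCK_NAMES: [&str; 3] = ["address", "article", "aside"];

/// Classify a line that starts at `<` (indentation already skipped).
fn classify(line: &str) -> Option<Kind> {
    let rest = line.strip_prefix('<')?;
    if rest.starts_with("!--") {
        return Some(Kind::Comment);
    }
    if rest.starts_with('?') {
        return Some(Kind::Instruction);
    }
    if rest.starts_with("![CDATA[") {
        return Some(Kind::Cdata);
    }
    if let Some(after) = rest.strip_prefix('!') {
        let alphabetic = after.starts_with(|c: char| c.is_ascii_alphabetic());
        return alphabetic.then_some(Kind::Declaration);
    }
    let (closing, rest) = match rest.strip_prefix('/') {
        Some(after) => (true, after),
        None => (false, rest),
    };
    // A tag name is an ASCII letter followed by letters, digits, or `-`.
    if !rest.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return None;
    }
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(rest.len());
    let name = rest[..end].to_ascii_lowercase();
    let next = rest[end..].chars().next();
    // The name must end at whitespace, `/`, `>`, or the end of input.
    if !matches!(next, None | Some('\t' | '\n' | ' ' | '/' | '>')) {
        return None;
    }
    let slash = next == Some('/');
    if !slash && !closing && RAW_NAMES.contains(&name.as_str()) {
        Some(Kind::Raw)
    } else if BLOCK_NAMES.contains(&name.as_str()) {
        // Simplification: a `/` after a basic name must be followed by `>`,
        // which this sketch does not check.
        Some(Kind::Basic)
    } else {
        Some(Kind::CompleteCandidate)
    }
}
```

For instance, `classify("<script>")` yields `Some(Kind::Raw)`, while
`classify("</script>")` is only a candidate for the complete production,
because a closing tag never starts raw HTML.
*/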

use crate::construct::partial_space_or_tab::{
    space_or_tab_with_options, Options as SpaceOrTabOptions,
};
use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;
use crate::util::{
    constant::{HTML_BLOCK_NAMES, HTML_CDATA_PREFIX, HTML_RAW_NAMES, HTML_RAW_SIZE_MAX, TAB_SIZE},
    slice::Slice,
};

/// Symbol for `<script>` (condition 1).
const RAW: u8 = 1;
/// Symbol for `<!---->` (condition 2).
const COMMENT: u8 = 2;
/// Symbol for `<?php?>` (condition 3).
const INSTRUCTION: u8 = 3;
/// Symbol for `<!doctype>` (condition 4).
const DECLARATION: u8 = 4;
/// Symbol for `<![CDATA[]]>` (condition 5).
const CDATA: u8 = 5;
/// Symbol for `<div` (condition 6).
const BASIC: u8 = 6;
/// Symbol for `<x>` (condition 7).
const COMPLETE: u8 = 7;

/// Start of HTML (flow).
///
/// ```markdown
/// > | <x />
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.parse_state.options.constructs.html_flow {
        tokenizer.enter(Name::HtmlFlow);

        if matches!(tokenizer.current, Some(b'\t' | b' ')) {
            tokenizer.attempt(State::Next(StateName::HtmlFlowBefore), State::Nok);
            State::Retry(space_or_tab_with_options(
                tokenizer,
                SpaceOrTabOptions {
                    kind: Name::HtmlFlowData,
                    min: 0,
                    max: if tokenizer.parse_state.options.constructs.code_indented {
                        TAB_SIZE - 1
                    } else {
                        usize::MAX
                    },
                    connect: false,
                    content: None,
                },
            ))
        } else {
            State::Retry(StateName::HtmlFlowBefore)
        }
    } else {
        State::Nok
    }
}

/// At `<`, after optional whitespace.
///
/// ```markdown
/// > | <x />
///     ^
/// ```
pub fn before(tokenizer: &mut Tokenizer) -> State {
    if Some(b'<') == tokenizer.current {
        tokenizer.enter(Name::HtmlFlowData);
        tokenizer.consume();
        State::Next(StateName::HtmlFlowOpen)
    } else {
        State::Nok
    }
}

/// After `<`, at tag name or other stuff.
///
/// ```markdown
/// > | <x />
///      ^
/// > | <!doctype>
///      ^
/// > | <!--xxx-->
///      ^
/// ```
pub fn open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'!') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowDeclarationOpen)
        }
        Some(b'/') => {
            tokenizer.consume();
            tokenizer.tokenize_state.seen = true;
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Next(StateName::HtmlFlowTagCloseStart)
        }
        Some(b'?') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = INSTRUCTION;
            // Do not form containers.
            tokenizer.concrete = true;
            // While we’re in an instruction instead of a declaration, we’re on a `?`
            // right now, so we do need to search for `>`, similar to declarations.
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        // ASCII alphabetical.
        Some(b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Retry(StateName::HtmlFlowTagName)
        }
        _ => State::Nok,
    }
}

/// After `<!`, at declaration, comment, or CDATA.
///
/// ```markdown
/// > | <!doctype>
///       ^
/// > | <!--xxx-->
///       ^
/// > | <![CDATA[>&<]]>
///       ^
/// ```
pub fn declaration_open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = COMMENT;
            State::Next(StateName::HtmlFlowCommentOpenInside)
        }
        Some(b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = DECLARATION;
            // Do not form containers.
            tokenizer.concrete = true;
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        Some(b'[') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = CDATA;
            State::Next(StateName::HtmlFlowCdataOpenInside)
        }
        _ => State::Nok,
    }
}

/// After `<!-`, inside a comment, at another `-`.
///
/// ```markdown
/// > | <!--xxx-->
///        ^
/// ```
pub fn comment_open_inside(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'-') = tokenizer.current {
        tokenizer.consume();
        // Do not form containers.
        tokenizer.concrete = true;
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After `<![`, inside CDATA, expecting `CDATA[`.
///
/// ```markdown
/// > | <![CDATA[>&<]]>
///        ^^^^^^
/// ```
pub fn cdata_open_inside(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(HTML_CDATA_PREFIX[tokenizer.tokenize_state.size]) {
        tokenizer.consume();
        tokenizer.tokenize_state.size += 1;

        if tokenizer.tokenize_state.size == HTML_CDATA_PREFIX.len() {
            tokenizer.tokenize_state.size = 0;
            // Do not form containers.
            tokenizer.concrete = true;
            State::Next(StateName::HtmlFlowContinuation)
        } else {
            State::Next(StateName::HtmlFlowCdataOpenInside)
        }
    } else {
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.size = 0;
        State::Nok
    }
}
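
/*
The CDATA state above matches a fixed prefix one byte at a time, with a
counter as the only state. A minimal standalone sketch of that idea follows;
`CDATA_PREFIX` and `feed` are hypothetical names, and the prefix value is an
assumption taken from the doc comment above (`CDATA[`).

```rust
/// Bytes expected after `<![` (assumed to mirror `HTML_CDATA_PREFIX`).
const CDATA_PREFIX: [u8; 6] = *b"CDATA[";

/// Feed one byte; `size` counts how many prefix bytes have matched so far.
///
/// Returns `Some(true)` once the whole prefix matched, `Some(false)` when
/// more bytes are needed, and `None` on a mismatch. The counter is reset
/// whenever matching ends, so the same counter can be reused afterwards.
fn feed(size: &mut usize, byte: u8) -> Option<bool> {
    if byte == CDATA_PREFIX[*size] {
        *size += 1;
        if *size == CDATA_PREFIX.len() {
            *size = 0;
            Some(true)
        } else {
            Some(false)
        }
    } else {
        *size = 0;
        None
    }
}
```

Resetting the counter on both success and failure matters because the
tokenizer state is shared: a stale counter would index past the prefix the
next time it is used.
*/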

/// After `</`, in closing tag, at tag name.
///
/// ```markdown
/// > | </x>
///       ^
/// ```
pub fn tag_close_start(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'A'..=b'Z' | b'a'..=b'z') = tokenizer.current {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowTagName)
    } else {
        tokenizer.tokenize_state.seen = false;
        tokenizer.tokenize_state.start = 0;
        State::Nok
    }
}

/// In tag name.
///
/// ```markdown
/// > | <ab>
///      ^^
/// > | </ab>
///       ^^
/// ```
pub fn tag_name(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'/' | b'>') => {
            let closing_tag = tokenizer.tokenize_state.seen;
            let slash = matches!(tokenizer.current, Some(b'/'));
            // Guaranteed to be valid ASCII bytes.
            let slice = Slice::from_indices(
                tokenizer.parse_state.bytes,
                tokenizer.tokenize_state.start,
                tokenizer.point.index,
            );
            let name = slice
                .as_str()
                // The line ending case might result in a `\r` that is already accounted for.
                .trim()
                .to_ascii_lowercase();
            tokenizer.tokenize_state.seen = false;
            tokenizer.tokenize_state.start = 0;

            if !slash && !closing_tag && HTML_RAW_NAMES.contains(&name.as_str()) {
                tokenizer.tokenize_state.marker = RAW;
                // Do not form containers.
                tokenizer.concrete = true;
                State::Retry(StateName::HtmlFlowContinuation)
            } else if HTML_BLOCK_NAMES.contains(&name.as_str()) {
                tokenizer.tokenize_state.marker = BASIC;

                if slash {
                    tokenizer.consume();
                    State::Next(StateName::HtmlFlowBasicSelfClosing)
                } else {
                    // Do not form containers.
                    tokenizer.concrete = true;
                    State::Retry(StateName::HtmlFlowContinuation)
                }
            } else {
                tokenizer.tokenize_state.marker = COMPLETE;

                // Do not support complete HTML when interrupting.
                if tokenizer.interrupt && !tokenizer.lazy {
                    tokenizer.tokenize_state.marker = 0;
                    State::Nok
                } else if closing_tag {
                    State::Retry(StateName::HtmlFlowCompleteClosingTagAfter)
                } else {
                    State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore)
                }
            }
        }
        // ASCII alphanumerical and `-`.
        Some(b'-' | b'0'..=b'9' | b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowTagName)
        }
        Some(_) => {
            tokenizer.tokenize_state.seen = false;
            State::Nok
        }
    }
}

/// After closing slash of a basic tag name.
///
/// ```markdown
/// > | <div/>
///          ^
/// ```
pub fn basic_self_closing(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'>') = tokenizer.current {
        tokenizer.consume();
        // Do not form containers.
        tokenizer.concrete = true;
        State::Next(StateName::HtmlFlowContinuation)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After the tag name of a complete closing tag, at optional whitespace or
/// `>`.
///
/// ```markdown
/// > | </x >
///        ^
/// ```
pub fn complete_closing_tag_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteClosingTagAfter)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteEnd),
    }
}

/// At an attribute name.
///
/// At first, this state is used after a complete tag name, after whitespace,
/// where it expects optional attributes or the end of the tag.
/// It is also reused after attributes, when expecting more optional
/// attributes.
///
/// ```markdown
/// > | <a />
///        ^
/// > | <a :b>
///        ^
/// > | <a _b>
///        ^
/// > | <a b>
///        ^
/// > | <a >
///        ^
/// ```
pub fn complete_attribute_name_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeNameBefore)
        }
        Some(b'/') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteEnd)
        }
        // ASCII alphanumerical and `:` and `_`.
        Some(b'0'..=b'9' | b':' | b'A'..=b'Z' | b'_' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeName)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteEnd),
    }
}

/// In attribute name.
///
/// ```markdown
/// > | <a :b>
///         ^
/// > | <a _b>
///         ^
/// > | <a b>
///         ^
/// ```
pub fn complete_attribute_name(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // ASCII alphanumerical and `-`, `.`, `:`, and `_`.
        Some(b'-' | b'.' | b'0'..=b'9' | b':' | b'A'..=b'Z' | b'_' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeName)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeNameAfter),
    }
}

/// After attribute name, at an optional initializer, the end of the tag, or
/// whitespace.
///
/// ```markdown
/// > | <a b>
///         ^
/// > | <a b=c>
///         ^
/// ```
pub fn complete_attribute_name_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeNameAfter)
        }
        Some(b'=') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueBefore)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore),
    }
}

/// Before unquoted, double quoted, or single quoted attribute value, allowing
/// whitespace.
///
/// ```markdown
/// > | <a b=c>
///          ^
/// > | <a b="c">
///          ^
/// ```
pub fn complete_attribute_value_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'<' | b'=' | b'>' | b'`') => {
            tokenizer.tokenize_state.marker = 0;
            State::Nok
        }
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueBefore)
        }
        Some(b'"' | b'\'') => {
            tokenizer.tokenize_state.marker_b = tokenizer.current.unwrap();
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueQuoted)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeValueUnquoted),
    }
}

/// In double or single quoted attribute value.
///
/// ```markdown
/// > | <a b="c">
///           ^
/// > | <a b='c'>
///           ^
/// ```
pub fn complete_attribute_value_quoted(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker_b) {
        tokenizer.consume();
        tokenizer.tokenize_state.marker_b = 0;
        State::Next(StateName::HtmlFlowCompleteAttributeValueQuotedAfter)
    } else if matches!(tokenizer.current, None | Some(b'\n')) {
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.marker_b = 0;
        State::Nok
    } else {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowCompleteAttributeValueQuoted)
    }
}

/// In unquoted attribute value.
///
/// ```markdown
/// > | <a b=c>
///          ^
/// ```
pub fn complete_attribute_value_unquoted(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'"' | b'\'' | b'/' | b'<' | b'=' | b'>' | b'`') => {
            State::Retry(StateName::HtmlFlowCompleteAttributeNameAfter)
        }
        Some(_) => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueUnquoted)
        }
    }
}

/// After double or single quoted attribute value, before whitespace or the
/// end of the tag.
///
/// ```markdown
/// > | <a b="c">
///            ^
/// ```
pub fn complete_attribute_value_quoted_after(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'\t' | b' ' | b'/' | b'>') = tokenizer.current {
        State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// In certain circumstances of a complete tag where only a `>` is allowed.
///
/// ```markdown
/// > | <a b="c">
///             ^
/// ```
pub fn complete_end(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'>') = tokenizer.current {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowCompleteAfter)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After `>` in a complete tag.
///
/// ```markdown
/// > | <x>
///        ^
/// ```
pub fn complete_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            // Do not form containers.
            tokenizer.concrete = true;
            State::Retry(StateName::HtmlFlowContinuation)
        }
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAfter)
        }
        Some(_) => {
            tokenizer.tokenize_state.marker = 0;
            State::Nok
        }
    }
}

/// In continuation of any HTML kind.
///
/// ```markdown
/// > | <!--xxx-->
///          ^
/// ```
pub fn continuation(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.tokenize_state.marker == COMMENT && tokenizer.current == Some(b'-') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationCommentInside)
    } else if tokenizer.tokenize_state.marker == RAW && tokenizer.current == Some(b'<') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationRawTagOpen)
    } else if tokenizer.tokenize_state.marker == DECLARATION && tokenizer.current == Some(b'>') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationClose)
    } else if tokenizer.tokenize_state.marker == INSTRUCTION && tokenizer.current == Some(b'?') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else if tokenizer.tokenize_state.marker == CDATA && tokenizer.current == Some(b']') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationCdataInside)
    } else if matches!(tokenizer.tokenize_state.marker, BASIC | COMPLETE)
        && tokenizer.current == Some(b'\n')
    {
        tokenizer.exit(Name::HtmlFlowData);
        tokenizer.check(
            State::Next(StateName::HtmlFlowContinuationAfter),
            State::Next(StateName::HtmlFlowContinuationStart),
        );
        State::Retry(StateName::HtmlFlowBlankLineBefore)
    } else if matches!(tokenizer.current, None | Some(b'\n')) {
        tokenizer.exit(Name::HtmlFlowData);
        State::Retry(StateName::HtmlFlowContinuationStart)
    } else {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuation)
    }
}

/// In continuation, at eol.
///
/// ```markdown
/// > | <x>
///        ^
///   | asd
/// ```
pub fn continuation_start(tokenizer: &mut Tokenizer) -> State {
    tokenizer.check(
        State::Next(StateName::HtmlFlowContinuationStartNonLazy),
        State::Next(StateName::HtmlFlowContinuationAfter),
    );
    State::Retry(StateName::NonLazyContinuationStart)
}

/// In continuation, at eol, before non-lazy content.
///
/// ```markdown
/// > | <x>
///        ^
///   | asd
/// ```
pub fn continuation_start_non_lazy(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\n') => {
            tokenizer.enter(Name::LineEnding);
            tokenizer.consume();
            tokenizer.exit(Name::LineEnding);
            State::Next(StateName::HtmlFlowContinuationBefore)
        }
        _ => unreachable!("expected eol"),
    }
}

/// In continuation, before non-lazy content.
///
/// ```markdown
///   | <x>
/// > | asd
///     ^
/// ```
pub fn continuation_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => State::Retry(StateName::HtmlFlowContinuationStart),
        _ => {
            tokenizer.enter(Name::HtmlFlowData);
            State::Retry(StateName::HtmlFlowContinuation)
        }
    }
}

/// In comment continuation, after one `-`, expecting another.
///
/// ```markdown
/// > | <!--xxx-->
///             ^
/// ```
pub fn continuation_comment_inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In raw continuation, after `<`, at `/`.
///
/// ```markdown
/// > | <script>console.log(1)</script>
///                            ^
/// ```
pub fn continuation_raw_tag_open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'/') => {
            tokenizer.consume();
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Next(StateName::HtmlFlowContinuationRawEndTag)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In raw continuation, after `</`, in a raw tag name.
///
/// ```markdown
/// > | <script>console.log(1)</script>
///                             ^^^^^^
/// ```
pub fn continuation_raw_end_tag(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'>') => {
            // Guaranteed to be valid ASCII bytes.
            let slice = Slice::from_indices(
                tokenizer.parse_state.bytes,
                tokenizer.tokenize_state.start,
                tokenizer.point.index,
            );
            let name = slice.as_str().to_ascii_lowercase();

            tokenizer.tokenize_state.start = 0;

            if HTML_RAW_NAMES.contains(&name.as_str()) {
                tokenizer.consume();
                State::Next(StateName::HtmlFlowContinuationClose)
            } else {
                State::Retry(StateName::HtmlFlowContinuation)
            }
        }
        Some(b'A'..=b'Z' | b'a'..=b'z')
            if tokenizer.point.index - tokenizer.tokenize_state.start < HTML_RAW_SIZE_MAX =>
        {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationRawEndTag)
        }
        _ => {
            tokenizer.tokenize_state.start = 0;
            State::Retry(StateName::HtmlFlowContinuation)
        }
    }
}
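
/*
The two raw states above look for a closing tag whose name is any raw tag
name, compared case-insensitively; it does not have to match the opening
name. A minimal standalone sketch of that check on a whole line follows.
`RAW_NAMES` and `has_raw_end_tag` are hypothetical names, and the sketch
omits the length cap (`HTML_RAW_SIZE_MAX`) because names that are not in the
list are rejected anyway.

```rust
// Raw tag names as listed in the grammar of the module docs.
const RAW_NAMES: [&str; 4] = ["pre", "script", "style", "textarea"];

/// Whether `line` contains `</name>` where `name` is a raw tag name.
fn has_raw_end_tag(line: &str) -> bool {
    let mut rest = line;
    // Try every `</` in the line as a possible start of a closing tag.
    while let Some(index) = rest.find("</") {
        rest = &rest[index + 2..];
        if let Some(end) = rest.find('>') {
            // No whitespace is allowed between the name and `>`.
            let name = rest[..end].to_ascii_lowercase();
            if RAW_NAMES.contains(&name.as_str()) {
                return true;
            }
        }
    }
    false
}
```

For instance, `has_raw_end_tag("console.log(1)</SCRIPT>")` holds, whereas a
line with only `</script >` or `</div>` does not end the raw block.
*/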

/// In cdata continuation, after `]`, expecting `]>`.
///
/// ```markdown
/// > | <![CDATA[>&<]]>
///                  ^
/// ```
pub fn continuation_cdata_inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b']') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In declaration or instruction continuation, at `>`.
///
/// ```markdown
/// > | <!-->
///         ^
/// > | <?>
///       ^
/// > | <!q>
///        ^
/// > | <!--ab-->
///             ^
/// > | <![CDATA[>&<]]>
///                   ^
/// ```
pub fn continuation_declaration_inside(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.tokenize_state.marker == COMMENT && tokenizer.current == Some(b'-') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else if tokenizer.current == Some(b'>') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationClose)
    } else {
        State::Retry(StateName::HtmlFlowContinuation)
    }
}
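
/*
The continuation states above all funnel into one question: does the current
line contain the closing sequence for the kind of HTML that was opened? A
minimal standalone sketch of that question follows, using the closing
sequences from the grammar in the module docs. `Kind`, `closing_sequence`,
and `closes` are hypothetical names. The sketch uses a plain substring
search, so it may differ from the byte-by-byte states on edge cases such as
longer runs of `]`.

```rust
/// Kinds of HTML (flow) that close on a fixed character sequence.
#[derive(Clone, Copy)]
enum Kind {
    Comment,
    Instruction,
    Declaration,
    Cdata,
}

/// Closing sequence of each kind, as listed in the grammar.
fn closing_sequence(kind: Kind) -> &'static str {
    match kind {
        Kind::Comment => "-->",
        Kind::Instruction => "?>",
        Kind::Declaration => ">",
        Kind::Cdata => "]]>",
    }
}

/// Whether `line` contains the closing sequence of `kind`.
fn closes(kind: Kind, line: &str) -> bool {
    line.contains(closing_sequence(kind))
}
```

Once a line closes the construct, the rest of that line still belongs to it,
which is what `continuation_close` handles.
*/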

/// In closed continuation: everything we get until the eol/eof is part of it.
///
/// ```markdown
/// > | <!doctype>
///               ^
/// ```
pub fn continuation_close(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            tokenizer.exit(Name::HtmlFlowData);
            State::Retry(StateName::HtmlFlowContinuationAfter)
        }
        _ => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationClose)
        }
    }
}

/// Done.
///
/// ```markdown
/// > | <!doctype>
///               ^
/// ```
pub fn continuation_after(tokenizer: &mut Tokenizer) -> State {
    tokenizer.exit(Name::HtmlFlow);
    tokenizer.tokenize_state.marker = 0;
    // Feel free to interrupt.
    tokenizer.interrupt = false;
    // No longer concrete.
    tokenizer.concrete = false;
    State::Ok
}

/// Before eol, expecting blank line.
///
/// ```markdown
/// > | <div>
///          ^
///   |
/// ```
pub fn blank_line_before(tokenizer: &mut Tokenizer) -> State {
    tokenizer.enter(Name::LineEnding);
    tokenizer.consume();
    tokenizer.exit(Name::LineEnding);
    State::Next(StateName::BlankLineStart)
}
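
/*
The blank line check above is what lets the basic and complete productions
stop at a blank line, which in turn allows markdown and HTML to be
interleaved. A minimal standalone sketch of that rule on already split lines
follows; `is_blank` and `basic_or_complete_len` are hypothetical names, and
container and lazy-line handling are ignored.

```rust
/// Whether a line holds only spaces and tabs (or nothing at all).
fn is_blank(line: &str) -> bool {
    line.chars().all(|c| c == ' ' || c == '\t')
}

/// Number of leading lines that belong to a basic or complete HTML block:
/// everything up to, but not including, the first blank line.
fn basic_or_complete_len(lines: &[&str]) -> usize {
    lines
        .iter()
        .position(|line| is_blank(line))
        .unwrap_or(lines.len())
}
```

With the lines `<div>`, `*not emphasis*`, a blank line, and `*emphasis*`,
the HTML block spans the first two lines only, so the text after the blank
line is parsed as markdown again.
*/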