Markdown parser fork with extended syntax for personal use.
//! HTML (flow) occurs in the [flow][] content type.
//!
//! ## Grammar
//!
//! HTML (flow) forms with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! html_flow ::= raw | comment | instruction | declaration | cdata | basic | complete
//!
//! ; Note: closing tag name does not need to match opening tag name.
//! raw ::= '<' raw_tag_name [[space_or_tab *line | '>' *line] eol] *(*line eol) ['</' raw_tag_name *line]
//! comment ::= '<!--' [*'-' '>' *line | *line *(eol *line) ['-->' *line]]
//! instruction ::= '<?' ['>' *line | *line *(eol *line) ['?>' *line]]
//! declaration ::= '<!' ascii_alphabetic *line *(eol *line) ['>' *line]
//! cdata ::= '<![CDATA[' *line *(eol *line) [']]>' *line]
//! basic ::= '<' ['/'] basic_tag_name [['/'] '>' *line *(eol 1*line)]
//! complete ::= (opening_tag | closing_tag) [*space_or_tab *(eol 1*line)]
//!
//! raw_tag_name ::= 'pre' | 'script' | 'style' | 'textarea' ; Note: case-insensitive.
//! basic_tag_name ::= 'address' | 'article' | 'aside' | ... ; See `constant.rs`, and note: case-insensitive.
//! opening_tag ::= '<' tag_name *(1*space_or_tab attribute) [*space_or_tab '/'] *space_or_tab '>'
//! closing_tag ::= '</' tag_name *space_or_tab '>'
//! tag_name ::= ascii_alphabetic *('-' | ascii_alphanumeric)
//! attribute ::= attribute_name [*space_or_tab '=' *space_or_tab attribute_value]
//! attribute_name ::= (':' | '_' | ascii_alphabetic) *('-' | '.' | ':' | '_' | ascii_alphanumeric)
//! attribute_value ::= '"' *(line - '"') '"' | "'" *(line - "'") "'" | 1*(text - '"' - "'" - '/' - '<' - '=' - '>' - '`')
//! ```
//!
//! As this construct occurs in flow, like all flow constructs, it must be
//! followed by an eol (line ending) or eof (end of file).
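/*!

The way the start of a line selects one of these productions could be
sketched as follows.
This is only an illustration under stated assumptions: `Kind`, `classify`,
`RAW_NAMES`, and `BLOCK_NAMES` are hypothetical names that are not part of this
module, the name lists are abbreviated, and the **complete** production is
only reported as a candidate because attributes and the closing `>` are not
validated here.

```rust
/// Kinds of HTML (flow), in the order of the grammar above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Kind {
    Raw,
    Comment,
    Instruction,
    Declaration,
    Cdata,
    Basic,
    /// Only a candidate: the rest of the tag is not validated in this sketch.
    CompleteCandidate,
}

// Hypothetical, abbreviated name lists (assumption for illustration).
const RAW_NAMES: [&str; 4] = ["pre", "script", "style", "textarea"];
const BLOCK_NAMES: [&str; 3] = ["address", "article", "aside"];

/// Classify the start of a line, after optional indentation was removed.
pub fn classify(line: &str) -> Option<Kind> {
    let rest = line.strip_prefix('<')?;
    if rest.starts_with("!--") {
        return Some(Kind::Comment);
    }
    if rest.starts_with("![CDATA[") {
        return Some(Kind::Cdata);
    }
    if rest.starts_with('?') {
        return Some(Kind::Instruction);
    }
    if let Some(after) = rest.strip_prefix('!') {
        let alphabetic = after.starts_with(|c: char| c.is_ascii_alphabetic());
        return alphabetic.then_some(Kind::Declaration);
    }
    let (closing, rest) = match rest.strip_prefix('/') {
        Some(after) => (true, after),
        None => (false, rest),
    };
    // Tag name: an ASCII letter, then ASCII alphanumerics or `-`.
    if !rest.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return None;
    }
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .unwrap_or(rest.len());
    let (name, tail) = rest.split_at(end);
    // The name must be followed by whitespace, `/`, `>`, or the end of the line.
    let boundary = |c: char| matches!(c, ' ' | '\t' | '/' | '>');
    if !(tail.is_empty() || tail.starts_with(boundary)) {
        return None;
    }
    let name = name.to_ascii_lowercase();
    let slash = tail.starts_with('/');
    if !closing && !slash && RAW_NAMES.contains(&name.as_str()) {
        Some(Kind::Raw)
    } else if BLOCK_NAMES.contains(&name.as_str()) {
        // A slash after a basic tag name must be directly followed by `>`.
        if slash && !tail.starts_with("/>") {
            None
        } else {
            Some(Kind::Basic)
        }
    } else {
        Some(Kind::CompleteCandidate)
    }
}
```

With these abbreviated lists, `classify("<Article>")` gives
`Some(Kind::Basic)`, while `classify("</script>")` is only a
`CompleteCandidate`, because the raw production needs an opening tag.

*/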
//!
//! The grammar for HTML in markdown does not follow the rules of parsing
//! HTML according to the [*§ 13.2 Parsing HTML documents* in the HTML
//! spec][html_parsing].
//! As such, HTML in markdown *resembles* HTML, but is instead a (naïve?)
//! attempt to parse an XML-like language.
//! By extension, another notable property of the grammar is that it can
//! result in invalid HTML, in that it allows things that wouldn’t work or
//! wouldn’t work well in HTML, such as mismatched tags.
//!
//! Interestingly, most of the productions above have a clear opening and
//! closing condition (raw, comment, instruction, declaration, cdata), but the
//! closing condition does not need to be satisfied.
//! As a result, the parser never has to backtrack for these productions.
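/*!

A minimal sketch of that property, assuming the end conditions 1 to 5 of
`CommonMark` (checked per line with plain substring search) rather than the
state machine in this module.
`Kind`, `closes`, and `span` are hypothetical names used for illustration
only.

```rust
/// The productions that have an explicit closing condition.
#[derive(Debug, Clone, Copy)]
pub enum Kind {
    Raw,
    Comment,
    Instruction,
    Declaration,
    Cdata,
}

/// Whether `line` satisfies the closing condition of `kind`.
pub fn closes(kind: Kind, line: &str) -> bool {
    match kind {
        Kind::Raw => {
            // Raw end tags are matched case-insensitively.
            let lower = line.to_ascii_lowercase();
            ["</pre>", "</script>", "</style>", "</textarea>"]
                .iter()
                .any(|end| lower.contains(*end))
        }
        Kind::Comment => line.contains("-->"),
        Kind::Instruction => line.contains("?>"),
        Kind::Declaration => line.contains('>'),
        Kind::Cdata => line.contains("]]>"),
    }
}

/// Number of lines the construct spans, where `lines[0]` is its first line.
///
/// If no line closes the construct, it simply runs to the end of the input:
/// nothing has to be undone.
pub fn span(kind: Kind, lines: &[&str]) -> usize {
    lines
        .iter()
        .position(|line| closes(kind, line))
        .map_or(lines.len(), |index| index + 1)
}
```

For example, `span(Kind::Comment, &["<!-- a", "b", "c"])` covers all three
lines even though `-->` never occurs.

*/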
//!
//! Because the **basic** and **complete** productions in the grammar form with
//! a tag, followed by more stuff, and stop at a blank line, it is possible to
//! interleave (a word for switching between languages) markdown and HTML
//! together, by placing the opening and closing tags on their own lines,
//! with blank lines between them and markdown.
//! For example:
//!
//! ```markdown
//! <div>This is <code>code</code> but this is not *emphasis*.</div>
//!
//! <div>
//!
//! This is a paragraph in a `div` and with `code` and *emphasis*.
//!
//! </div>
//! ```
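/*!

Why this works can be sketched with a tiny helper: a basic or complete
construct takes every line up to, but not including, the first blank line.
`is_blank` and `html_span` are hypothetical names for illustration, and lazy
lines and containers are ignored in this sketch.

```rust
/// A blank line contains only spaces and tabs.
pub fn is_blank(line: &str) -> bool {
    line.chars().all(|c| c == ' ' || c == '\t')
}

/// Number of lines taken by a basic or complete HTML (flow) construct,
/// assuming `lines[0]` already matched as its start.
pub fn html_span(lines: &[&str]) -> usize {
    lines
        .iter()
        .position(|line| is_blank(line))
        .unwrap_or(lines.len())
}
```

In the example above, `<div>` on its own line followed by a blank line gives a
span of one line, so the paragraph after it is parsed as markdown again.

*/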
//!
//! The **complete** production of HTML (flow) is not allowed to interrupt
//! content.
//! That means that a blank line is needed between a [paragraph][] and it.
//! However, [HTML (text)][html_text] has a similar production, which will
//! typically kick in instead.
//!
//! The list of tag names allowed in the **raw** production is defined in
//! [`HTML_RAW_NAMES`][].
//! This production exists because there are a few cases where markdown
//! *inside* some elements, and hence interleaving, does not make sense.
//!
//! The list of tag names allowed in the **basic** production is defined in
//! [`HTML_BLOCK_NAMES`][].
//! This production exists because there are a few cases where we can decide
//! early that something is going to be a flow (block) element instead of a
//! phrasing (inline) element.
//! We *can* interrupt and don’t have to care too much about it being
//! well-formed.
//!
//! ## Tokens
//!
//! * [`HtmlFlow`][Name::HtmlFlow]
//! * [`HtmlFlowData`][Name::HtmlFlowData]
//! * [`LineEnding`][Name::LineEnding]
//!
//! ## References
//!
//! * [`html-flow.js` in `micromark`](https://github.com/micromark/micromark/blob/main/packages/micromark-core-commonmark/dev/lib/html-flow.js)
//! * [*§ 4.6 HTML blocks* in `CommonMark`](https://spec.commonmark.org/0.31/#html-blocks)
//!
//! [flow]: crate::construct::flow
//! [html_text]: crate::construct::html_text
//! [paragraph]: crate::construct::paragraph
//! [html_raw_names]: crate::util::constant::HTML_RAW_NAMES
//! [html_block_names]: crate::util::constant::HTML_BLOCK_NAMES
//! [html_parsing]: https://html.spec.whatwg.org/multipage/parsing.html#parsing

use crate::construct::partial_space_or_tab::{
    space_or_tab_with_options, Options as SpaceOrTabOptions,
};
use crate::event::Name;
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;
use crate::util::{
    constant::{HTML_BLOCK_NAMES, HTML_CDATA_PREFIX, HTML_RAW_NAMES, HTML_RAW_SIZE_MAX, TAB_SIZE},
    slice::Slice,
};

/// Symbol for `<script>` (condition 1).
const RAW: u8 = 1;
/// Symbol for `<!---->` (condition 2).
const COMMENT: u8 = 2;
/// Symbol for `<?php?>` (condition 3).
const INSTRUCTION: u8 = 3;
/// Symbol for `<!doctype>` (condition 4).
const DECLARATION: u8 = 4;
/// Symbol for `<![CDATA[]]>` (condition 5).
const CDATA: u8 = 5;
/// Symbol for `<div` (condition 6).
const BASIC: u8 = 6;
/// Symbol for `<x>` (condition 7).
const COMPLETE: u8 = 7;

/// Start of HTML (flow).
///
/// ```markdown
/// > | <x />
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.parse_state.options.constructs.html_flow {
        tokenizer.enter(Name::HtmlFlow);

        if matches!(tokenizer.current, Some(b'\t' | b' ')) {
            tokenizer.attempt(State::Next(StateName::HtmlFlowBefore), State::Nok);
            State::Retry(space_or_tab_with_options(
                tokenizer,
                SpaceOrTabOptions {
                    kind: Name::HtmlFlowData,
                    min: 0,
                    max: if tokenizer.parse_state.options.constructs.code_indented {
                        TAB_SIZE - 1
                    } else {
                        usize::MAX
                    },
                    connect: false,
                    content: None,
                },
            ))
        } else {
            State::Retry(StateName::HtmlFlowBefore)
        }
    } else {
        State::Nok
    }
}

/// At `<`, after optional whitespace.
///
/// ```markdown
/// > | <x />
///     ^
/// ```
pub fn before(tokenizer: &mut Tokenizer) -> State {
    if Some(b'<') == tokenizer.current {
        tokenizer.enter(Name::HtmlFlowData);
        tokenizer.consume();
        State::Next(StateName::HtmlFlowOpen)
    } else {
        State::Nok
    }
}

/// After `<`, at tag name or other stuff.
///
/// ```markdown
/// > | <x />
///      ^
/// > | <!doctype>
///      ^
/// > | <!--xxx-->
///      ^
/// ```
pub fn open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'!') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowDeclarationOpen)
        }
        Some(b'/') => {
            tokenizer.consume();
            tokenizer.tokenize_state.seen = true;
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Next(StateName::HtmlFlowTagCloseStart)
        }
        Some(b'?') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = INSTRUCTION;
            // Do not form containers.
            tokenizer.concrete = true;
            // While we’re in an instruction instead of a declaration, we’re on a `?`
            // right now, so we do need to search for `>`, similar to declarations.
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        // ASCII alphabetical.
        Some(b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Retry(StateName::HtmlFlowTagName)
        }
        _ => State::Nok,
    }
}

/// After `<!`, at declaration, comment, or CDATA.
///
/// ```markdown
/// > | <!doctype>
///       ^
/// > | <!--xxx-->
///       ^
/// > | <![CDATA[>&<]]>
///       ^
/// ```
pub fn declaration_open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = COMMENT;
            State::Next(StateName::HtmlFlowCommentOpenInside)
        }
        Some(b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = DECLARATION;
            // Do not form containers.
            tokenizer.concrete = true;
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        Some(b'[') => {
            tokenizer.consume();
            tokenizer.tokenize_state.marker = CDATA;
            State::Next(StateName::HtmlFlowCdataOpenInside)
        }
        _ => State::Nok,
    }
}

/// After `<!-`, inside a comment, at another `-`.
///
/// ```markdown
/// > | <!--xxx-->
///        ^
/// ```
pub fn comment_open_inside(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'-') = tokenizer.current {
        tokenizer.consume();
        // Do not form containers.
        tokenizer.concrete = true;
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After `<![`, inside CDATA, expecting `CDATA[`.
///
/// ```markdown
/// > | <![CDATA[>&<]]>
///        ^^^^^^
/// ```
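/**

The prefix is matched one byte per call, with a counter that remembers how
many bytes matched so far.
A standalone sketch of that technique, assuming the prefix is `CDATA[`:
`CDATA_PREFIX`, `Progress`, `step`, and `matches_prefix` are hypothetical
names for illustration, not part of this module.

```rust
/// What is expected after `<![` (assumed to mirror `HTML_CDATA_PREFIX`).
const CDATA_PREFIX: [u8; 6] = *b"CDATA[";

/// Outcome of feeding one byte.
#[derive(Debug, PartialEq, Eq)]
pub enum Progress {
    /// Matched so far; the payload is the new counter.
    More(usize),
    /// The whole prefix matched.
    Done,
    /// The byte (or the end of input) did not match.
    Mismatch,
}

/// Feed one byte; `size` is how many prefix bytes matched already and must
/// be smaller than the prefix length.
pub fn step(size: usize, byte: Option<u8>) -> Progress {
    if byte == Some(CDATA_PREFIX[size]) {
        if size + 1 == CDATA_PREFIX.len() {
            Progress::Done
        } else {
            Progress::More(size + 1)
        }
    } else {
        Progress::Mismatch
    }
}

/// Drive `step` over a slice, the way a tokenizer would call it per byte.
pub fn matches_prefix(bytes: &[u8]) -> bool {
    let mut size = 0;
    let mut iter = bytes.iter().copied();
    loop {
        match step(size, iter.next()) {
            Progress::More(next) => size = next,
            Progress::Done => return true,
            Progress::Mismatch => return false,
        }
    }
}
```

The match is case-sensitive, so `matches_prefix(b"cdata[")` is `false`.

*/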
pub fn cdata_open_inside(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(HTML_CDATA_PREFIX[tokenizer.tokenize_state.size]) {
        tokenizer.consume();
        tokenizer.tokenize_state.size += 1;

        if tokenizer.tokenize_state.size == HTML_CDATA_PREFIX.len() {
            tokenizer.tokenize_state.size = 0;
            // Do not form containers.
            tokenizer.concrete = true;
            State::Next(StateName::HtmlFlowContinuation)
        } else {
            State::Next(StateName::HtmlFlowCdataOpenInside)
        }
    } else {
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.size = 0;
        State::Nok
    }
}

/// After `</`, in closing tag, at tag name.
///
/// ```markdown
/// > | </x>
///       ^
/// ```
pub fn tag_close_start(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'A'..=b'Z' | b'a'..=b'z') = tokenizer.current {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowTagName)
    } else {
        tokenizer.tokenize_state.seen = false;
        tokenizer.tokenize_state.start = 0;
        State::Nok
    }
}

/// In tag name.
///
/// ```markdown
/// > | <ab>
///      ^^
/// > | </ab>
///       ^^
/// ```
pub fn tag_name(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'/' | b'>') => {
            let closing_tag = tokenizer.tokenize_state.seen;
            let slash = matches!(tokenizer.current, Some(b'/'));
            // Guaranteed to be valid ASCII bytes.
            let slice = Slice::from_indices(
                tokenizer.parse_state.bytes,
                tokenizer.tokenize_state.start,
                tokenizer.point.index,
            );
            let name = slice
                .as_str()
                // The line ending case might result in a `\r` that is already accounted for.
                .trim()
                .to_ascii_lowercase();
            tokenizer.tokenize_state.seen = false;
            tokenizer.tokenize_state.start = 0;

            if !slash && !closing_tag && HTML_RAW_NAMES.contains(&name.as_str()) {
                tokenizer.tokenize_state.marker = RAW;
                // Do not form containers.
                tokenizer.concrete = true;
                State::Retry(StateName::HtmlFlowContinuation)
            } else if HTML_BLOCK_NAMES.contains(&name.as_str()) {
                tokenizer.tokenize_state.marker = BASIC;

                if slash {
                    tokenizer.consume();
                    State::Next(StateName::HtmlFlowBasicSelfClosing)
                } else {
                    // Do not form containers.
                    tokenizer.concrete = true;
                    State::Retry(StateName::HtmlFlowContinuation)
                }
            } else {
                tokenizer.tokenize_state.marker = COMPLETE;

                // Do not support complete HTML when interrupting.
                if tokenizer.interrupt && !tokenizer.lazy {
                    tokenizer.tokenize_state.marker = 0;
                    State::Nok
                } else if closing_tag {
                    State::Retry(StateName::HtmlFlowCompleteClosingTagAfter)
                } else {
                    State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore)
                }
            }
        }
        // ASCII alphanumerical and `-`.
        Some(b'-' | b'0'..=b'9' | b'A'..=b'Z' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowTagName)
        }
        Some(_) => {
            tokenizer.tokenize_state.seen = false;
            State::Nok
        }
    }
}

/// After closing slash of a basic tag name.
///
/// ```markdown
/// > | <div/>
///          ^
/// ```
pub fn basic_self_closing(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'>') = tokenizer.current {
        tokenizer.consume();
        // Do not form containers.
        tokenizer.concrete = true;
        State::Next(StateName::HtmlFlowContinuation)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After the tag name of a complete closing tag, at optional whitespace or `>`.
///
/// ```markdown
/// > | </x>
///        ^
/// ```
pub fn complete_closing_tag_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteClosingTagAfter)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteEnd),
    }
}

/// At an attribute name.
///
/// At first, this state is used after a complete tag name, after whitespace,
/// where it expects optional attributes or the end of the tag.
/// It is also reused after attributes, when expecting more optional
/// attributes.
///
/// ```markdown
/// > | <a />
///        ^
/// > | <a :b>
///        ^
/// > | <a _b>
///        ^
/// > | <a b>
///        ^
/// > | <a >
///        ^
/// ```
pub fn complete_attribute_name_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeNameBefore)
        }
        Some(b'/') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteEnd)
        }
        // ASCII alphanumerical and `:` and `_`.
        Some(b'0'..=b'9' | b':' | b'A'..=b'Z' | b'_' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeName)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteEnd),
    }
}

/// In attribute name.
///
/// ```markdown
/// > | <a :b>
///         ^
/// > | <a _b>
///         ^
/// > | <a b>
///         ^
/// ```
pub fn complete_attribute_name(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        // ASCII alphanumerical and `-`, `.`, `:`, and `_`.
        Some(b'-' | b'.' | b'0'..=b'9' | b':' | b'A'..=b'Z' | b'_' | b'a'..=b'z') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeName)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeNameAfter),
    }
}

/// After attribute name, at an optional initializer, the end of the tag, or
/// whitespace.
///
/// ```markdown
/// > | <a b>
///         ^
/// > | <a b=c>
///         ^
/// ```
pub fn complete_attribute_name_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeNameAfter)
        }
        Some(b'=') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueBefore)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore),
    }
}

/// Before unquoted, double quoted, or single quoted attribute value, allowing
/// whitespace.
///
/// ```markdown
/// > | <a b=c>
///          ^
/// > | <a b="c">
///          ^
/// ```
pub fn complete_attribute_value_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'<' | b'=' | b'>' | b'`') => {
            tokenizer.tokenize_state.marker = 0;
            State::Nok
        }
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueBefore)
        }
        Some(b'"' | b'\'') => {
            tokenizer.tokenize_state.marker_b = tokenizer.current.unwrap();
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueQuoted)
        }
        _ => State::Retry(StateName::HtmlFlowCompleteAttributeValueUnquoted),
    }
}

/// In double or single quoted attribute value.
///
/// ```markdown
/// > | <a b="c">
///           ^
/// > | <a b='c'>
///           ^
/// ```
pub fn complete_attribute_value_quoted(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(tokenizer.tokenize_state.marker_b) {
        tokenizer.consume();
        tokenizer.tokenize_state.marker_b = 0;
        State::Next(StateName::HtmlFlowCompleteAttributeValueQuotedAfter)
    } else if matches!(tokenizer.current, None | Some(b'\n')) {
        tokenizer.tokenize_state.marker = 0;
        tokenizer.tokenize_state.marker_b = 0;
        State::Nok
    } else {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowCompleteAttributeValueQuoted)
    }
}

/// In unquoted attribute value.
///
/// ```markdown
/// > | <a b=c>
///          ^
/// ```
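/**

Which bytes end an unquoted value can be sketched in isolation.
`unquoted_value_len` is a hypothetical helper for illustration, not part of
this module; note that the grammar asks for at least one byte, so a length of
zero means there is no valid unquoted value.

```rust
/// Number of leading bytes of `bytes` that can be part of an unquoted
/// attribute value.
pub fn unquoted_value_len(bytes: &[u8]) -> usize {
    bytes
        .iter()
        .take_while(|&&byte| {
            // Whitespace, quotes, `/`, `<`, `=`, `>`, and a grave accent
            // all end the value.
            !matches!(
                byte,
                b'\t' | b'\n' | b' ' | b'"' | b'\'' | b'/' | b'<' | b'=' | b'>' | b'`'
            )
        })
        .count()
}
```

For `<a b=c>`, calling it on the bytes after `=` (`c>`) gives `1`.

*/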
pub fn complete_attribute_value_unquoted(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\t' | b'\n' | b' ' | b'"' | b'\'' | b'/' | b'<' | b'=' | b'>' | b'`') => {
            State::Retry(StateName::HtmlFlowCompleteAttributeNameAfter)
        }
        Some(_) => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAttributeValueUnquoted)
        }
    }
}

/// After double or single quoted attribute value, before whitespace or the
/// end of the tag.
///
/// ```markdown
/// > | <a b="c">
///             ^
/// ```
pub fn complete_attribute_value_quoted_after(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'\t' | b' ' | b'/' | b'>') = tokenizer.current {
        State::Retry(StateName::HtmlFlowCompleteAttributeNameBefore)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// In certain circumstances of a complete tag where only an `>` is allowed.
///
/// ```markdown
/// > | <a b="c">
///             ^
/// ```
pub fn complete_end(tokenizer: &mut Tokenizer) -> State {
    if let Some(b'>') = tokenizer.current {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowCompleteAfter)
    } else {
        tokenizer.tokenize_state.marker = 0;
        State::Nok
    }
}

/// After `>` in a complete tag.
///
/// ```markdown
/// > | <x>
///        ^
/// ```
pub fn complete_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            // Do not form containers.
            tokenizer.concrete = true;
            State::Retry(StateName::HtmlFlowContinuation)
        }
        Some(b'\t' | b' ') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowCompleteAfter)
        }
        Some(_) => {
            tokenizer.tokenize_state.marker = 0;
            State::Nok
        }
    }
}

/// In continuation of any HTML kind.
///
/// ```markdown
/// > | <!--xxx-->
///         ^
/// ```
pub fn continuation(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.tokenize_state.marker == COMMENT && tokenizer.current == Some(b'-') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationCommentInside)
    } else if tokenizer.tokenize_state.marker == RAW && tokenizer.current == Some(b'<') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationRawTagOpen)
    } else if tokenizer.tokenize_state.marker == DECLARATION && tokenizer.current == Some(b'>') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationClose)
    } else if tokenizer.tokenize_state.marker == INSTRUCTION && tokenizer.current == Some(b'?') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else if tokenizer.tokenize_state.marker == CDATA && tokenizer.current == Some(b']') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationCdataInside)
    } else if matches!(tokenizer.tokenize_state.marker, BASIC | COMPLETE)
        && tokenizer.current == Some(b'\n')
    {
        tokenizer.exit(Name::HtmlFlowData);
        tokenizer.check(
            State::Next(StateName::HtmlFlowContinuationAfter),
            State::Next(StateName::HtmlFlowContinuationStart),
        );
        State::Retry(StateName::HtmlFlowBlankLineBefore)
    } else if matches!(tokenizer.current, None | Some(b'\n')) {
        tokenizer.exit(Name::HtmlFlowData);
        State::Retry(StateName::HtmlFlowContinuationStart)
    } else {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuation)
    }
}

/// In continuation, at eol.
///
/// ```markdown
/// > | <x>
///        ^
///   | asd
/// ```
pub fn continuation_start(tokenizer: &mut Tokenizer) -> State {
    tokenizer.check(
        State::Next(StateName::HtmlFlowContinuationStartNonLazy),
        State::Next(StateName::HtmlFlowContinuationAfter),
    );
    State::Retry(StateName::NonLazyContinuationStart)
}

/// In continuation, at eol, before non-lazy content.
///
/// ```markdown
/// > | <x>
///        ^
///   | asd
/// ```
pub fn continuation_start_non_lazy(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'\n') => {
            tokenizer.enter(Name::LineEnding);
            tokenizer.consume();
            tokenizer.exit(Name::LineEnding);
            State::Next(StateName::HtmlFlowContinuationBefore)
        }
        _ => unreachable!("expected eol"),
    }
}

/// In continuation, before non-lazy content.
///
/// ```markdown
///   | <x>
/// > | asd
///     ^
/// ```
pub fn continuation_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => State::Retry(StateName::HtmlFlowContinuationStart),
        _ => {
            tokenizer.enter(Name::HtmlFlowData);
            State::Retry(StateName::HtmlFlowContinuation)
        }
    }
}

/// In comment continuation, after one `-`, expecting another.
///
/// ```markdown
/// > | <!--xxx-->
///             ^
/// ```
pub fn continuation_comment_inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'-') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In raw continuation, after `<`, at `/`.
///
/// ```markdown
/// > | <script>console.log(1)</script>
///                            ^
/// ```
pub fn continuation_raw_tag_open(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'/') => {
            tokenizer.consume();
            tokenizer.tokenize_state.start = tokenizer.point.index;
            State::Next(StateName::HtmlFlowContinuationRawEndTag)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In raw continuation, after `</`, in a raw tag name.
///
/// ```markdown
/// > | <script>console.log(1)</script>
///                             ^^^^^^
/// ```
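/**

The bounded name match can be sketched on a plain byte slice.
`RAW_NAMES`, `RAW_SIZE_MAX`, and `is_raw_end_tag` are hypothetical names for
illustration; in particular, `RAW_SIZE_MAX` is assumed to equal the length of
the longest raw tag name (`textarea`, 8 bytes).

```rust
const RAW_NAMES: [&str; 4] = ["pre", "script", "style", "textarea"];
/// Assumption: the longest raw tag name is `textarea`.
const RAW_SIZE_MAX: usize = 8;

/// Whether `rest`, the bytes right after `</`, starts with a raw tag name
/// that is directly followed by `>`.
pub fn is_raw_end_tag(rest: &[u8]) -> bool {
    for (len, byte) in rest.iter().copied().enumerate() {
        match byte {
            // At `>`: compare what was seen so far, ignoring ASCII case.
            b'>' => {
                return RAW_NAMES
                    .iter()
                    .any(|name| name.as_bytes().eq_ignore_ascii_case(&rest[..len]));
            }
            // Keep going on ASCII letters, but never past the longest name.
            b'A'..=b'Z' | b'a'..=b'z' if len < RAW_SIZE_MAX => {}
            _ => return false,
        }
    }
    false
}
```

A `false` here does not reject anything in the tokenizer: it just means the
bytes are ordinary content and the search for a closing tag goes on.

*/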
pub fn continuation_raw_end_tag(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'>') => {
            // Guaranteed to be valid ASCII bytes.
            let slice = Slice::from_indices(
                tokenizer.parse_state.bytes,
                tokenizer.tokenize_state.start,
                tokenizer.point.index,
            );
            let name = slice.as_str().to_ascii_lowercase();

            tokenizer.tokenize_state.start = 0;

            if HTML_RAW_NAMES.contains(&name.as_str()) {
                tokenizer.consume();
                State::Next(StateName::HtmlFlowContinuationClose)
            } else {
                State::Retry(StateName::HtmlFlowContinuation)
            }
        }
        Some(b'A'..=b'Z' | b'a'..=b'z')
            if tokenizer.point.index - tokenizer.tokenize_state.start < HTML_RAW_SIZE_MAX =>
        {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationRawEndTag)
        }
        _ => {
            tokenizer.tokenize_state.start = 0;
            State::Retry(StateName::HtmlFlowContinuation)
        }
    }
}

/// In cdata continuation, after `]`, expecting `]>`.
///
/// ```markdown
/// > | <![CDATA[>&<]]>
///                  ^
/// ```
pub fn continuation_cdata_inside(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b']') => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationDeclarationInside)
        }
        _ => State::Retry(StateName::HtmlFlowContinuation),
    }
}

/// In declaration or instruction continuation, at `>`.
///
/// ```markdown
/// > | <!-->
///         ^
/// > | <?>
///       ^
/// > | <!q>
///        ^
/// > | <!--ab-->
///             ^
/// > | <![CDATA[>&<]]>
///                   ^
/// ```
pub fn continuation_declaration_inside(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.tokenize_state.marker == COMMENT && tokenizer.current == Some(b'-') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationDeclarationInside)
    } else if tokenizer.current == Some(b'>') {
        tokenizer.consume();
        State::Next(StateName::HtmlFlowContinuationClose)
    } else {
        State::Retry(StateName::HtmlFlowContinuation)
    }
}

/// In closed continuation: everything we get until the eol/eof is part of it.
///
/// ```markdown
/// > | <!doctype>
///               ^
/// ```
pub fn continuation_close(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        None | Some(b'\n') => {
            tokenizer.exit(Name::HtmlFlowData);
            State::Retry(StateName::HtmlFlowContinuationAfter)
        }
        _ => {
            tokenizer.consume();
            State::Next(StateName::HtmlFlowContinuationClose)
        }
    }
}

/// Done.
///
/// ```markdown
/// > | <!doctype>
///               ^
/// ```
pub fn continuation_after(tokenizer: &mut Tokenizer) -> State {
    tokenizer.exit(Name::HtmlFlow);
    tokenizer.tokenize_state.marker = 0;
    // Feel free to interrupt.
    tokenizer.interrupt = false;
    // No longer concrete.
    tokenizer.concrete = false;
    State::Ok
}

/// Before eol, expecting blank line.
///
/// ```markdown
/// > | <div>
///          ^
///   |
/// ```
pub fn blank_line_before(tokenizer: &mut Tokenizer) -> State {
    tokenizer.enter(Name::LineEnding);
    tokenizer.consume();
    tokenizer.exit(Name::LineEnding);
    State::Next(StateName::BlankLineStart)
}
873}