Markdown parser fork with extended syntax for personal use.
//! GFM: Footnote definition occurs in the [document][] content type.
//!
//! ## Grammar
//!
//! Footnote definitions form with the following BNF
//! (<small>see [construct][crate::construct] for character groups</small>):
//!
//! ```bnf
//! ; Restriction: `label` must start with `^` (and not be empty after it).
//! ; See the `label` construct for the BNF of that part.
//! gfm_footnote_definition_start ::= label ':' *space_or_tab
//!
//! ; Restriction: blank line allowed.
//! gfm_footnote_definition_cont ::= 4(space_or_tab)
//! ```
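//!
//! As an illustration only, the start rule can be checked on a single line
//! with a standalone function.
//! This is a simplified sketch, not this crate’s tokenizer: `parse_start` is
//! a hypothetical name, indentation is limited to 3 spaces (as when indented
//! code is enabled; tabs are not handled), and the 999-byte label limit is an
//! assumed stand-in for `LINK_REFERENCE_SIZE_MAX`.
//!
//! ```rust
//! /// Hypothetical helper: if `line` opens a footnote definition, return the
//! /// raw label (between `[^` and `]`) and the content after `:`.
//! fn parse_start(line: &str) -> Option<(&str, &str)> {
//!     // Assumed cap on label size, standing in for `LINK_REFERENCE_SIZE_MAX`.
//!     const LABEL_MAX: usize = 999;
//!
//!     // Up to 3 spaces of indentation; 4 would form indented code.
//!     let indent = line.bytes().take_while(|&b| b == b' ').count();
//!     if indent > 3 {
//!         return None;
//!     }
//!     let rest = line[indent..].strip_prefix("[^")?;
//!     let bytes = rest.as_bytes();
//!     let mut index = 0;
//!     while index < bytes.len() {
//!         match bytes[index] {
//!             // Whitespace and `[` are not allowed in the label.
//!             b' ' | b'\t' | b'\n' | b'[' => return None,
//!             b']' => break,
//!             // A backslash before `[`, `\`, or `]` escapes it.
//!             b'\\' if matches!(bytes.get(index + 1).copied(), Some(b'[' | b'\\' | b']')) => {
//!                 index += 2
//!             }
//!             _ => index += 1,
//!         }
//!         if index > LABEL_MAX {
//!             return None;
//!         }
//!     }
//!     // Require a non-empty label, a closing `]`, and `:` right after it.
//!     if index == 0 || index >= bytes.len() {
//!         return None;
//!     }
//!     let after = rest[index + 1..].strip_prefix(':')?;
//!     // Whitespace after the `:` is not part of the content.
//!     Some((&rest[..index], after.trim_start_matches(|c: char| c == ' ' || c == '\t')))
//! }
//! ```
//!
//! For instance, `parse_start("[^a]: b")` gives `Some(("a", "b"))`, while
//! `parse_start("[^a b]: c")` gives `None`.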
//!
//! Further lines that are not prefixed with `gfm_footnote_definition_cont`
//! cause the footnote definition to be exited, except when those lines are
//! lazy continuation or blank.
//! Like so many things in markdown, footnote definitions are complex too.
//! See [*§ Phase 1: block structure* in `CommonMark`][commonmark_block] for
//! more on parsing details.
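//!
//! The per-line decision inside an open definition can be sketched as
//! follows.
//! `Line` and `classify` are hypothetical, and this simplified model assumes
//! tab stops of 4 columns (like `TAB_SIZE`); lazy continuation depends on
//! whether a paragraph is open, so it is left to the caller.
//!
//! ```rust
//! /// Hypothetical classification of a line inside an open footnote definition.
//! #[derive(Debug, PartialEq)]
//! enum Line<'a> {
//!     /// Only spaces or tabs: the definition stays open.
//!     Blank,
//!     /// Indented by at least 4 columns: the rest belongs to the definition.
//!     Continuation(&'a str),
//!     /// Anything else: the definition is exited, unless the caller treats
//!     /// the line as lazy paragraph continuation.
//!     Other,
//! }
//!
//! fn classify(line: &str) -> Line<'_> {
//!     if line.bytes().all(|b| b == b' ' || b == b'\t') {
//!         return Line::Blank;
//!     }
//!     let mut column = 0;
//!     for (index, byte) in line.bytes().enumerate() {
//!         match byte {
//!             b' ' => column += 1,
//!             // Assumed tab stop of 4: advance to the next multiple of 4.
//!             b'\t' => column += 4 - column % 4,
//!             _ => return Line::Other,
//!         }
//!         if column >= 4 {
//!             return Line::Continuation(&line[index + 1..]);
//!         }
//!     }
//!     Line::Other
//! }
//! ```
//!
//! For instance, `classify("    c")` is `Line::Continuation("c")` and
//! `classify("  c")` is `Line::Other`.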
//!
//! See [`label`][label] for grammar, notes, and recommendations on that part.
//!
//! The `label` part is interpreted as the [string][] content type.
//! That means that [character escapes][character_escape] and
//! [character references][character_reference] are allowed.
//!
//! Definitions match to calls through identifiers.
//! To match, both labels must be equal after normalizing with
//! [`normalize_identifier`][].
//! One definition can match multiple calls.
//! When multiple definitions share the same normalized identifier, only the
//! first is used and the others are ignored.
//! To illustrate, the definition with the content of `x` wins:
//!
//! ```markdown
//! [^a]: x
//! [^a]: y
//!
//! [^a]
//! ```
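//!
//! A minimal sketch of “first definition wins”, assuming a simplified,
//! hypothetical `normalize` (collapse whitespace, trim, then
//! `to_lowercase().to_uppercase()`), which approximates but may not equal
//! `normalize_identifier`.
//! The map only stores a definition when its key is still vacant:
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for identifier normalization.
//! fn normalize(label: &str) -> String {
//!     label
//!         .split_whitespace()
//!         .collect::<Vec<_>>()
//!         .join(" ")
//!         .to_lowercase()
//!         .to_uppercase()
//! }
//!
//! /// Map normalized identifiers to content, keeping the first definition.
//! fn collect_definitions<'a>(definitions: &[(&str, &'a str)]) -> HashMap<String, &'a str> {
//!     let mut map = HashMap::new();
//!     for &(label, content) in definitions {
//!         // `or_insert` leaves an existing entry untouched.
//!         map.entry(normalize(label)).or_insert(content);
//!     }
//!     map
//! }
//! ```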
//!
//! Importantly, while labels *can* include [string][] content (character
//! escapes and character references), these are not considered when matching.
//! To illustrate, neither definition matches the call:
//!
//! ```markdown
//! [^a&amp;b]: x
//! [^a\&b]: y
//!
//! [^a&b]
//! ```
//!
//! Because footnote definitions are containers (like block quotes and list
//! items), they can contain more footnote definitions, and they can include
//! calls to themselves.
//!
//! ## HTML
//!
//! GFM footnote definitions do not, on their own, relate to anything in HTML.
//! When matched with a [label end][label_end], which in turn matches to a
//! [GFM label start (footnote)][gfm_label_start_footnote], the definition
//! relates to several elements in HTML.
//!
//! When one or more definitions are called, a footnote section is generated
//! at the end of the document, using `<section>`, `<h2>`, and `<ol>` elements:
//!
//! ```html
//! <section data-footnotes="" class="footnotes"><h2 id="footnote-label" class="sr-only">Footnotes</h2>
//! <ol>…</ol>
//! </section>
//! ```
//!
//! Each definition is generated as a `<li>` in the `<ol>`, in the order they
//! were first called:
//!
//! ```html
//! <li id="user-content-fn-1">…</li>
//! ```
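//!
//! The ordering can be sketched as a de-duplication that keeps each
//! identifier at the position of its first call.
//! `footnote_order` is a hypothetical helper that assumes calls were already
//! normalized and matched to definitions:
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! /// Hypothetical helper: identifiers in the order of their first call.
//! fn footnote_order<'a>(calls: &[&'a str]) -> Vec<&'a str> {
//!     let mut seen = HashSet::new();
//!     // `insert` returns `false` for identifiers seen before, so only the
//!     // first call of each identifier is kept.
//!     calls.iter().copied().filter(|id| seen.insert(*id)).collect()
//! }
//! ```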
//!
//! Backreferences are injected at the end of the first paragraph, or, when
//! there is no paragraph, at the end of the definition.
//! When a definition is called multiple times, multiple backreferences are
//! generated.
//! Further backreferences use an extra counter in the `href` attribute and
//! visually in a `<sup>` after `↩`.
//!
//! ```html
//! <a href="#user-content-fnref-1" data-footnote-backref="" class="data-footnote-backref" aria-label="Back to content">↩</a> <a href="#user-content-fnref-1-2" data-footnote-backref="" class="data-footnote-backref" aria-label="Back to content">↩<sup>2</sup></a>
//! ```
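//!
//! A sketch of building those links for a definition called `count` times.
//! `backreferences` is a hypothetical helper; the attribute values are copied
//! from the example above, and `id` is assumed to already be safe for use in
//! an `href`:
//!
//! ```rust
//! /// Hypothetical helper: backreference links for a definition called `count` times.
//! fn backreferences(id: &str, count: usize) -> String {
//!     (1..=count)
//!         .map(|index| {
//!             // The first backreference has no counter; later ones add `-N`
//!             // to the `href` and show `N` in a `<sup>`.
//!             let (suffix, label) = if index == 1 {
//!                 (String::new(), String::from("↩"))
//!             } else {
//!                 (format!("-{index}"), format!("↩<sup>{index}</sup>"))
//!             };
//!             format!(
//!                 "<a href=\"#user-content-fnref-{id}{suffix}\" data-footnote-backref=\"\" class=\"data-footnote-backref\" aria-label=\"Back to content\">{label}</a>"
//!             )
//!         })
//!         .collect::<Vec<_>>()
//!         .join(" ")
//! }
//! ```
//!
//! With `backreferences("1", 2)`, this produces the two links shown above.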
//!
//! See
//! [*§ 4.5.1 The `a` element*][html_a],
//! [*§ 4.3.6 The `h1`, `h2`, `h3`, `h4`, `h5`, and `h6` elements*][html_h],
//! [*§ 4.4.8 The `li` element*][html_li],
//! [*§ 4.4.5 The `ol` element*][html_ol],
//! [*§ 4.4.1 The `p` element*][html_p],
//! [*§ 4.3.3 The `section` element*][html_section], and
//! [*§ 4.5.19 The `sub` and `sup` elements*][html_sup]
//! in the HTML spec for more info.
//!
//! ## Recommendation
//!
//! When authoring markdown with footnotes, it’s recommended to use words
//! instead of numbers (or letters or anything with an order) as labels.
//! That makes it easier to reuse and reorder footnotes.
//!
//! It’s recommended to place footnote definitions at the bottom of the document.
//!
//! ## Bugs
//!
//! GitHub’s own algorithm to parse footnote definitions contains several bugs.
//! These are not present in this project.
//! The issues relating to footnote definitions are:
//!
//! * [Footnote reference call identifiers are trimmed, but definition identifiers aren’t](https://github.com/github/cmark-gfm/issues/237)\
//!   — initial and final whitespace in labels causes them not to match
//! * [Footnotes are matched case-insensitive, but links keep their casing, breaking them](https://github.com/github/cmark-gfm/issues/239)\
//!   — using uppercase (or any character that will be percent encoded) in identifiers breaks links
//! * [Colons in footnotes generate links w/o `href`](https://github.com/github/cmark-gfm/issues/250)\
//!   — colons in identifiers generate broken links
//! * [Character escape of `]` does not work in footnote identifiers](https://github.com/github/cmark-gfm/issues/240)\
//!   — some character escapes don’t work
//! * [Footnotes in links are broken](https://github.com/github/cmark-gfm/issues/249)\
//!   — while `CommonMark` prevents links in links, GitHub does not prevent footnotes (which turn into links) in links
//! * [Footnote-like brackets around image, break that image](https://github.com/github/cmark-gfm/issues/275)\
//!   — images can’t be used in what looks like a footnote call
//! * [GFM footnotes: line ending in footnote definition label causes text to disappear](https://github.com/github/cmark-gfm/issues/282)\
//!   — line endings in footnote definitions cause text to disappear
//!
//! ## Tokens
//!
//! * [`DefinitionMarker`][Name::DefinitionMarker]
//! * [`GfmFootnoteDefinition`][Name::GfmFootnoteDefinition]
//! * [`GfmFootnoteDefinitionLabel`][Name::GfmFootnoteDefinitionLabel]
//! * [`GfmFootnoteDefinitionLabelMarker`][Name::GfmFootnoteDefinitionLabelMarker]
//! * [`GfmFootnoteDefinitionLabelString`][Name::GfmFootnoteDefinitionLabelString]
//! * [`GfmFootnoteDefinitionMarker`][Name::GfmFootnoteDefinitionMarker]
//! * [`GfmFootnoteDefinitionPrefix`][Name::GfmFootnoteDefinitionPrefix]
//! * [`SpaceOrTab`][Name::SpaceOrTab]
//!
//! ## References
//!
//! * [`micromark-extension-gfm-footnote`](https://github.com/micromark/micromark-extension-gfm-footnote)
//!
//! > 👉 **Note**: Footnotes are not specified in GFM yet.
//! > See [`github/cmark-gfm#270`](https://github.com/github/cmark-gfm/issues/270)
//! > for the related issue.
//!
//! [document]: crate::construct::document
//! [string]: crate::construct::string
//! [character_reference]: crate::construct::character_reference
//! [character_escape]: crate::construct::character_escape
//! [label]: crate::construct::partial_label
//! [label_end]: crate::construct::label_end
//! [gfm_label_start_footnote]: crate::construct::gfm_label_start_footnote
//! [commonmark_block]: https://spec.commonmark.org/0.31/#phase-1-block-structure
//! [html_a]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element
//! [html_h]: https://html.spec.whatwg.org/multipage/sections.html#the-h1,-h2,-h3,-h4,-h5,-and-h6-elements
//! [html_li]: https://html.spec.whatwg.org/multipage/grouping-content.html#the-li-element
//! [html_ol]: https://html.spec.whatwg.org/multipage/grouping-content.html#the-ol-element
//! [html_p]: https://html.spec.whatwg.org/multipage/grouping-content.html#the-p-element
//! [html_section]: https://html.spec.whatwg.org/multipage/sections.html#the-section-element
//! [html_sup]: https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-sub-and-sup-elements

use crate::construct::partial_space_or_tab::space_or_tab_min_max;
use crate::event::{Content, Link, Name};
use crate::state::{Name as StateName, State};
use crate::tokenizer::Tokenizer;
use crate::util::{
    constant::{LINK_REFERENCE_SIZE_MAX, TAB_SIZE},
    normalize_identifier::normalize_identifier,
    skip,
    slice::{Position, Slice},
};

/// Start of GFM footnote definition.
///
/// ```markdown
/// > | [^a]: b
///     ^
/// ```
pub fn start(tokenizer: &mut Tokenizer) -> State {
    if tokenizer
        .parse_state
        .options
        .constructs
        .gfm_footnote_definition
    {
        tokenizer.enter(Name::GfmFootnoteDefinition);

        if matches!(tokenizer.current, Some(b'\t' | b' ')) {
            tokenizer.attempt(
                State::Next(StateName::GfmFootnoteDefinitionLabelBefore),
                State::Nok,
            );
            State::Retry(space_or_tab_min_max(
                tokenizer,
                1,
                if tokenizer.parse_state.options.constructs.code_indented {
                    TAB_SIZE - 1
                } else {
                    usize::MAX
                },
            ))
        } else {
            State::Retry(StateName::GfmFootnoteDefinitionLabelBefore)
        }
    } else {
        State::Nok
    }
}

/// Before definition label (after optional whitespace).
///
/// ```markdown
/// > | [^a]: b
///     ^
/// ```
pub fn label_before(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'[') => {
            tokenizer.enter(Name::GfmFootnoteDefinitionPrefix);
            tokenizer.enter(Name::GfmFootnoteDefinitionLabel);
            tokenizer.enter(Name::GfmFootnoteDefinitionLabelMarker);
            tokenizer.consume();
            tokenizer.exit(Name::GfmFootnoteDefinitionLabelMarker);
            State::Next(StateName::GfmFootnoteDefinitionLabelAtMarker)
        }
        _ => State::Nok,
    }
}

/// In label, at caret.
///
/// ```markdown
/// > | [^a]: b
///      ^
/// ```
pub fn label_at_marker(tokenizer: &mut Tokenizer) -> State {
    if tokenizer.current == Some(b'^') {
        tokenizer.enter(Name::GfmFootnoteDefinitionMarker);
        tokenizer.consume();
        tokenizer.exit(Name::GfmFootnoteDefinitionMarker);
        tokenizer.enter(Name::GfmFootnoteDefinitionLabelString);
        tokenizer.enter_link(
            Name::Data,
            Link {
                previous: None,
                next: None,
                content: Content::String,
            },
        );
        State::Next(StateName::GfmFootnoteDefinitionLabelInside)
    } else {
        State::Nok
    }
}

/// In label.
///
/// > 👉 **Note**: `cmark-gfm` prevents whitespace from occurring in footnote
/// > definition labels.
///
/// ```markdown
/// > | [^a]: b
///       ^
/// ```
pub fn label_inside(tokenizer: &mut Tokenizer) -> State {
    // Too long.
    if tokenizer.tokenize_state.size > LINK_REFERENCE_SIZE_MAX
        // Space or tab is not supported by GFM for some reason (`\n` and
        // `[` make sense).
        || matches!(tokenizer.current, None | Some(b'\t' | b'\n' | b' ' | b'['))
        // Closing bracket without a label.
        || (matches!(tokenizer.current, Some(b']')) && tokenizer.tokenize_state.size == 0)
    {
        tokenizer.tokenize_state.size = 0;
        State::Nok
    } else if matches!(tokenizer.current, Some(b']')) {
        tokenizer.tokenize_state.size = 0;
        tokenizer.exit(Name::Data);
        tokenizer.exit(Name::GfmFootnoteDefinitionLabelString);
        tokenizer.enter(Name::GfmFootnoteDefinitionLabelMarker);
        tokenizer.consume();
        tokenizer.exit(Name::GfmFootnoteDefinitionLabelMarker);
        tokenizer.exit(Name::GfmFootnoteDefinitionLabel);
        State::Next(StateName::GfmFootnoteDefinitionLabelAfter)
    } else {
        let next = if tokenizer.current == Some(b'\\') {
            StateName::GfmFootnoteDefinitionLabelEscape
        } else {
            StateName::GfmFootnoteDefinitionLabelInside
        };
        tokenizer.consume();
        tokenizer.tokenize_state.size += 1;
        State::Next(next)
    }
}

/// After `\`, at a special character.
///
/// > 👉 **Note**: `cmark-gfm` currently does not support escaped brackets:
/// > <https://github.com/github/cmark-gfm/issues/240>
///
/// ```markdown
/// > | [^a\*b]: c
///         ^
/// ```
pub fn label_escape(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b'[' | b'\\' | b']') => {
            tokenizer.tokenize_state.size += 1;
            tokenizer.consume();
            State::Next(StateName::GfmFootnoteDefinitionLabelInside)
        }
        _ => State::Retry(StateName::GfmFootnoteDefinitionLabelInside),
    }
}

/// After definition label.
///
/// ```markdown
/// > | [^a]: b
///         ^
/// ```
pub fn label_after(tokenizer: &mut Tokenizer) -> State {
    match tokenizer.current {
        Some(b':') => {
            let end = skip::to_back(
                &tokenizer.events,
                tokenizer.events.len() - 1,
                &[Name::GfmFootnoteDefinitionLabelString],
            );

            // Note: we don’t care about virtual spaces, so `as_str` is fine.
            let id = normalize_identifier(
                Slice::from_position(
                    tokenizer.parse_state.bytes,
                    &Position::from_exit_event(&tokenizer.events, end),
                )
                .as_str(),
            );

            // Note: we don’t check for uniqueness.
            // Duplicates are likely rare, so checking for them would likely
            // cost more time than it saves.
            tokenizer.tokenize_state.gfm_footnote_definitions.push(id);

            tokenizer.enter(Name::DefinitionMarker);
            tokenizer.consume();
            tokenizer.exit(Name::DefinitionMarker);
            tokenizer.attempt(
                State::Next(StateName::GfmFootnoteDefinitionWhitespaceAfter),
                State::Nok,
            );
            // Any whitespace after the marker is eaten, so forming indented
            // code is not possible.
            // No space is also fine, just like a block quote marker.
            State::Next(space_or_tab_min_max(tokenizer, 0, usize::MAX))
        }
        _ => State::Nok,
    }
}

/// After definition prefix.
///
/// ```markdown
/// > | [^a]: b
///           ^
/// ```
pub fn whitespace_after(tokenizer: &mut Tokenizer) -> State {
    tokenizer.exit(Name::GfmFootnoteDefinitionPrefix);
    State::Ok
}

/// Start of footnote definition continuation.
///
/// ```markdown
///   | [^a]: b
/// > |     c
///     ^
/// ```
pub fn cont_start(tokenizer: &mut Tokenizer) -> State {
    tokenizer.check(
        State::Next(StateName::GfmFootnoteDefinitionContBlank),
        State::Next(StateName::GfmFootnoteDefinitionContFilled),
    );
    State::Retry(StateName::BlankLineStart)
}

/// Start of footnote definition continuation, at a blank line.
///
/// ```markdown
///   | [^a]: b
/// > | ␠␠␊
///     ^
/// ```
pub fn cont_blank(tokenizer: &mut Tokenizer) -> State {
    if matches!(tokenizer.current, Some(b'\t' | b' ')) {
        State::Retry(space_or_tab_min_max(tokenizer, 0, TAB_SIZE))
    } else {
        State::Ok
    }
}

/// Start of footnote definition continuation, at a filled line.
///
/// ```markdown
///   | [^a]: b
/// > |     c
///     ^
/// ```
pub fn cont_filled(tokenizer: &mut Tokenizer) -> State {
    if matches!(tokenizer.current, Some(b'\t' | b' ')) {
        // Consume exactly `TAB_SIZE`.
        State::Retry(space_or_tab_min_max(tokenizer, TAB_SIZE, TAB_SIZE))
    } else {
        State::Nok
    }
}