Markdown parser fork with extended syntax for personal use.
<p align="center">
  <br>
  <img width="192" src="media/logo-chromatic.svg" alt="">
  <br>
  <br>
  <br>
</p>

# markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions.

## Feature highlights

* [x] **[compliant][commonmark]**
  (100% to CommonMark)
* [x] **[extensions][]**
  (100% GFM, 100% MDX, frontmatter, math)
* [x] **[safe][security]**
  (100% safe Rust, also 100% safe HTML by default)
* [x] **[robust][test]**
  (2300+ tests, 100% coverage, fuzz testing)
* [x] **[ast][mdast]**
  (mdast)

## Links

* [GitHub: `wooorm/markdown-rs`][repo]
* [`crates.io`: `markdown`][crate]
* [`docs.rs`: `markdown`][docs]

## When should I use this?

* if you *just* want to turn markdown into HTML (with maybe a few extensions)
* if you want to do *really complex things* with markdown

## What is this?

`markdown-rs` is an open source markdown parser written in Rust.
It’s implemented as a state machine (`#![no_std]` + `alloc`) that emits
concrete tokens,
so that every byte is accounted for,
with positional info.
The API then exposes this information as an AST,
which is easier to work with,
or it compiles directly to HTML.

While most markdown parsers work towards compliance with CommonMark (or GFM),
this project goes further by following how the reference parsers (`cmark`,
`cmark-gfm`) work,
which is confirmed with thousands of extra tests.

Other than CommonMark and GFM,
this project also supports common extensions to markdown such as
MDX, math, and frontmatter.

This Rust crate has a sibling project in JavaScript:
[`micromark`][micromark]
(and [`mdast-util-from-markdown`][mdast-util-from-markdown] for the AST).

P.S. if you want to *compile* MDX,
use [`mdxjs-rs`][mdxjs-rs].

## Questions

* to learn markdown,
  see this [cheatsheet and tutorial][cheat]
* for the API,
  see the [crate docs][docs]
* for questions,
  see [Discussions][]
* to help,
  see [contribute][] or [sponsor][] below

## Contents

* [Install](#install)
* [Use](#use)
* [API](#api)
* [Extensions](#extensions)
* [Project](#project)
  * [Overview](#overview)
  * [File structure](#file-structure)
  * [Test](#test)
  * [Version](#version)
  * [Security](#security)
  * [Contribute](#contribute)
  * [Sponsor](#sponsor)
  * [Thanks](#thanks)
* [Related](#related)
* [License](#license)

## Install

With [Rust][]
(rust edition 2018+, ±version 1.56+),
install with `cargo`:

```sh
cargo add markdown
```

## Use

```rs
fn main() {
    println!("{}", markdown::to_html("## Hi, *Saturn*! 🪐"));
}
```

Yields:

```html
<h2>Hi, <em>Saturn</em>! 🪐</h2>
```

Extensions (in this case GFM):

```rs
fn main() -> Result<(), markdown::message::Message> {
    println!(
        "{}",
        markdown::to_html_with_options(
            "* [x] contact ~Mercury~Venus at hi@venus.com!",
            &markdown::Options::gfm()
        )?
    );

    Ok(())
}
```

Yields:

```html
<ul>
  <li>
    <input checked="" disabled="" type="checkbox" />
    contact <del>Mercury</del>Venus at <a href="mailto:hi@venus.com">hi@venus.com</a>!
  </li>
</ul>
```

Syntax tree ([mdast][]):

```rs
fn main() -> Result<(), markdown::message::Message> {
    println!(
        "{:?}",
        markdown::to_mdast("# Hi *Earth*!", &markdown::ParseOptions::default())?
    );

    Ok(())
}
```

Yields:

```text
Root { children: [Heading { children: [Text { value: "Hi ", position: Some(1:3-1:6 (2-5)) }, Emphasis { children: [Text { value: "Earth", position: Some(1:7-1:12 (6-11)) }], position: Some(1:6-1:13 (5-12)) }, Text { value: "!", position: Some(1:13-1:14 (12-13)) }], position: Some(1:1-1:14 (0-13)), depth: 1 }], position: Some(1:1-1:14 (0-13)) }
```

## API

`markdown-rs` exposes
[`to_html`](https://docs.rs/markdown/latest/markdown/fn.to_html.html),
[`to_html_with_options`](https://docs.rs/markdown/latest/markdown/fn.to_html_with_options.html),
[`to_mdast`](https://docs.rs/markdown/latest/markdown/fn.to_mdast.html),
[`Options`](https://docs.rs/markdown/latest/markdown/struct.Options.html),
and a few other structs and enums.

See the [crate docs][docs] for more info.

## Extensions

`markdown-rs` supports extensions to `CommonMark`.
These extensions are maintained in this project.
They are not enabled by default but can be turned on with options.

* GFM
  * autolink literal
  * footnote
  * strikethrough
  * table
  * tagfilter
  * task list item
* MDX
  * ESM
  * expressions
  * JSX
* frontmatter
* math

It is not a goal of this project to support lots of different extensions.
It’s instead a goal to support very common and mostly standardized extensions.

## Project

`markdown-rs` is maintained as a single monolithic crate.

### Overview

The process to parse markdown looks like this:

```txt
                    markdown-rs
+-------------------------------------------------+
|            +-------+         +---------+--html- |
| -markdown->+ parse +-events->+ compile +        |
|            +-------+         +---------+-mdast- |
+-------------------------------------------------+
```
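
To make the two stages concrete, here is a toy pipeline with the same shape: one `parse` step that produces events, and two compilers that consume them.
It is an illustration only and not how `markdown-rs` is implemented; every type and function in it (`Kind`, `Event`, `Node`, `parse`, `to_html`, `to_tree`) is made up for this sketch, and it only understands `*emphasis*`.

```rs
/// Kind of a toy event. Illustrative only, not the crate's real types.
#[derive(Debug, PartialEq)]
enum Kind {
    Text,
    Emphasis,
}

/// A span of the input, as byte offsets, with its kind.
#[derive(Debug, PartialEq)]
struct Event {
    kind: Kind,
    start: usize,
    end: usize,
}

/// Toy tree node, standing in for an mdast node.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Emphasis(String),
}

/// Stage 1: turn the input into events that together cover every byte.
fn parse(input: &str) -> Vec<Event> {
    let mut events = Vec::new();
    let mut start = 0;
    while start < input.len() {
        let rest = &input[start..];
        // `*...*` with a closing marker becomes one emphasis event.
        let closing = if rest.starts_with('*') { rest[1..].find('*') } else { None };
        let (kind, len) = match closing {
            Some(offset) => (Kind::Emphasis, offset + 2),
            // Otherwise text runs up to the next `*`, or to the end of the input.
            None => {
                let next = rest.char_indices().skip(1).find(|(_, c)| *c == '*');
                (Kind::Text, next.map_or(rest.len(), |(index, _)| index))
            }
        };
        events.push(Event { kind, start, end: start + len });
        start += len;
    }
    events
}

/// Escape the characters that are unsafe in HTML text.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Stage 2a: compile the events to HTML.
fn to_html(input: &str, events: &[Event]) -> String {
    events
        .iter()
        .map(|event| match event.kind {
            Kind::Text => escape(&input[event.start..event.end]),
            Kind::Emphasis => {
                format!("<em>{}</em>", escape(&input[event.start + 1..event.end - 1]))
            }
        })
        .collect()
}

/// Stage 2b: compile the same events to a tree.
fn to_tree(input: &str, events: &[Event]) -> Vec<Node> {
    events
        .iter()
        .map(|event| match event.kind {
            Kind::Text => Node::Text(input[event.start..event.end].to_string()),
            Kind::Emphasis => {
                Node::Emphasis(input[event.start + 1..event.end - 1].to_string())
            }
        })
        .collect()
}

fn main() {
    let input = "Hi *Earth*!";
    let events = parse(input);
    println!("{}", to_html(input, &events));
    println!("{:?}", to_tree(input, &events));
}
```

Because the events carry offsets into the original input, both compilers can recover positional info without parsing again.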

### File structure

The files in `src/` are as follows:

* `construct/*.rs`:
  CommonMark, GFM, and other extension constructs used in markdown
* `util/*.rs`:
  helpers often needed when parsing markdown
* `event.rs`:
  things with meaning happening somewhere
* `lib.rs`:
  public API
* `mdast.rs`:
  syntax tree
* `parser.rs`:
  turn a string of markdown into events
* `resolve.rs`:
  steps to process events
* `state.rs`:
  steps of the state machine
* `subtokenize.rs`:
  handle content in other content
* `to_html.rs`:
  turns events into a string of HTML
* `to_mdast.rs`:
  turns events into a syntax tree
* `tokenizer.rs`:
  glue the states of the state machine together
* `unist.rs`:
  point and position, used in mdast

### Test

`markdown-rs` is tested with the \~650 CommonMark tests and more than 1k extra
tests confirmed with CM reference parsers.
There are even more tests for GFM and other extensions.
These tests reach all branches in the code,
which means that this project has 100% code coverage.
Fuzz testing is used to check for things that might fall through coverage.

The following shell commands are useful when working on this project:

* generate code (latest CM tests and Unicode info):
  ```sh
  cargo run --manifest-path generate/Cargo.toml
  ```
* run examples:
  ```sh
  RUST_BACKTRACE=1 RUST_LOG=trace cargo run --example lib --features log
  ```
* format:
  ```sh
  cargo fmt && cargo fix --all-features --all-targets --workspace
  ```
* lint:
  ```sh
  cargo fmt --check && cargo clippy --all-features --all-targets --workspace
  ```
* test:
  ```sh
  RUST_BACKTRACE=1 cargo test --all-features --workspace
  ```
* docs:
  ```sh
  cargo doc --document-private-items --examples --workspace
  ```
* fuzz:
  ```sh
  cargo install cargo-fuzz
  cargo install honggfuzz
  cargo +nightly fuzz run markdown_libfuzz
  cargo hfuzz run markdown_honggfuzz
  ```

### Version

`markdown-rs` follows [SemVer](https://semver.org).

### Security

The typical security aspect discussed for markdown is [cross-site scripting
(XSS)][xss] attacks.
Markdown itself is safe if it does not include embedded HTML or dangerous
protocols in links/images (such as `javascript:`).
`markdown-rs` makes any markdown safe by default,
even if HTML is embedded or dangerous protocols are used,
as it encodes or drops them.

Turning on the `allow_dangerous_html` or `allow_dangerous_protocol` options for
user-provided markdown opens you up to XSS attacks.

Additionally,
you should be able to set `allow_any_img_src` safely.
The default is to allow only `http:`, `https:`, and relative images,
which is what GitHub does.
But it should be safe to allow any value on `src`.

The [HTML specification][whatwg-html-image] prohibits dangerous scripts in
images and all modern browsers respect this and are thus safe.
Opera 12 (from 2012) is a notable browser that did not respect this.

An aspect related to XSS for security is syntax errors:
markdown itself has no syntax errors.
Some syntax extensions
(specifically, only MDX)
do include syntax errors.
For that reason,
`to_html_with_options` returns `Result<String, Message>`,
of which the error is a struct indicating where the problem happened,
what occurred,
and what was expected instead.
Make sure to handle your errors when using MDX.

Another security aspect is denial-of-service (DoS) attacks.
For example,
an attacker could throw a 100 MB file at `markdown-rs`,
in which case it’s going to take a long while to finish.
It is also possible to crash `markdown-rs` with smaller payloads,
notably when thousands of
links, images, emphasis, or strong
are opened but not closed.
It is wise to cap the accepted size of input (500 KB can hold a big book) and to
process content in a different thread so that it can be stopped when needed.
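
A size cap and a worker thread can be combined in a small wrapper.
The sketch below uses only the standard library and takes the renderer as a closure, so it makes no assumptions about the `markdown-rs` API.
The names `render_guarded` and `RenderError`, and the limits, are hypothetical and should be tuned to your application.
Note that a `std` thread cannot be killed from outside: on timeout this sketch stops waiting and returns, while the worker keeps running until it finishes.

```rs
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical limits; tune them for your application.
const MAX_INPUT_BYTES: usize = 500 * 1024;
const TIMEOUT: Duration = Duration::from_secs(2);

#[derive(Debug, PartialEq)]
enum RenderError {
    /// The input was larger than `MAX_INPUT_BYTES`.
    TooLarge,
    /// The worker did not answer in time.
    TimedOut,
    /// The worker went away without answering (for example, it panicked).
    Crashed,
}

/// Run `render` on a worker thread, rejecting oversized input up front and
/// giving up on the result once `timeout` has passed.
fn render_guarded<F>(input: String, timeout: Duration, render: F) -> Result<String, RenderError>
where
    F: FnOnce(&str) -> String + Send + 'static,
{
    if input.len() > MAX_INPUT_BYTES {
        return Err(RenderError::TooLarge);
    }

    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone if the caller timed out; ignore that error.
        let _ = sender.send(render(&input));
    });

    receiver.recv_timeout(timeout).map_err(|error| match error {
        RecvTimeoutError::Timeout => RenderError::TimedOut,
        RecvTimeoutError::Disconnected => RenderError::Crashed,
    })
}

fn main() {
    // Stand-in renderer; pass a closure that calls your markdown renderer instead.
    let result = render_guarded("# Hi".to_string(), TIMEOUT, |text| text.to_uppercase());
    println!("{:?}", result);
}
```

If abandoned workers piling up is a concern, running the renderer in a separate process that can be killed is the stricter option.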

For more information on markdown sanitation,
see
[`improper-markup-sanitization.md`][improper] by [**@chalker**][chalker].

### Contribute

See [`contributing.md`][contributing] for ways to help.
See [`support.md`][support] for ways to get help.
See [`code-of-conduct.md`][coc] for how to communicate in and around this
project.

### Sponsor

Support this effort and give back by sponsoring:

* [GitHub Sponsors](https://github.com/sponsors/wooorm)
  (personal; monthly or one-time)
* [OpenCollective](https://opencollective.com/unified) or
  [GitHub Sponsors](https://github.com/sponsors/unifiedjs)
  (unified; monthly or one-time)

### Thanks

Special thanks go out to:

* [Vercel][] for funding the initial development
* [**@Murderlon**][murderlon] for the design of the logo
* [**@johannhof**][johannhof] for the crate name

## Related

* [`micromark`][micromark]:
  same as `markdown-rs` but in JavaScript
* [`mdxjs-rs`][mdxjs-rs]:
  wraps `markdown-rs` to *compile* MDX to JavaScript

## License

Original library license and copyright: [MIT][LICENSE.MIT] © [Titus Wormer][author].

The [GPL v3][LICENSE.GPL-3.0] license applies only to my own changes [**@crashkeys.dev**][crashkeys].

[badge-build-image]: https://github.com/wooorm/markdown-rs/workflows/main/badge.svg

[badge-build-url]: https://github.com/wooorm/markdown-rs/actions

[badge-coverage-image]: https://img.shields.io/codecov/c/github/wooorm/markdown-rs.svg

[badge-coverage-url]: https://codecov.io/github/wooorm/markdown-rs

[docs]: https://docs.rs/markdown/latest/markdown/

[crate]: https://crates.io/crates/markdown

[repo]: https://github.com/wooorm/markdown-rs

[discussions]: https://github.com/wooorm/markdown-rs/discussions

[commonmark]: https://spec.commonmark.org

[cheat]: https://commonmark.org/help/

[rust]: https://www.rust-lang.org

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[improper]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md

[chalker]: https://github.com/ChALkeR

[LICENSE.MIT]: LICENSE.MIT

[LICENSE.GPL-3.0]: LICENSE.GPL-3.0

[author]: https://wooorm.com

[mdast]: https://github.com/syntax-tree/mdast

[micromark]: https://github.com/micromark/micromark

[mdxjs-rs]: https://github.com/wooorm/mdxjs-rs

[mdast-util-from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown

[vercel]: https://vercel.com

[murderlon]: https://github.com/murderlon

[johannhof]: https://github.com/johannhof

[crashkeys]: https://tangled.org/did:plc:uuxsjzxf7wokfhwettyljhsa

[contribute]: #contribute

[sponsor]: #sponsor

[extensions]: #extensions

[security]: #security

[test]: #test

[contributing]: .github/contribute.md

[support]: .github/support.md

[coc]: .github/code-of-conduct.md

[whatwg-html-image]: https://html.spec.whatwg.org/multipage/images.html#images-processing-model