Markdown parser fork with extended syntax for personal use.
<p align="center">
  <br>
  <img width="192" src="media/logo-chromatic.svg" alt="">
  <br>
  <br>
  <br>
</p>

# markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions.

## Feature highlights

* [x] **[compliant][commonmark]**
  (100% to CommonMark)
* [x] **[extensions][]**
  (100% GFM, 100% MDX, frontmatter, math)
* [x] **[safe][security]**
  (100% safe Rust, also 100% safe HTML by default)
* [x] **[robust][test]**
  (2300+ tests, 100% coverage, fuzz testing)
* [x] **[ast][mdast]**
  (mdast)

## Links

* [GitHub: `wooorm/markdown-rs`][repo]
* [`crates.io`: `markdown`][crate]
* [`docs.rs`: `markdown`][docs]

## When should I use this?

* if you *just* want to turn markdown into HTML (with maybe a few extensions)
* if you want to do *really complex things* with markdown

## What is this?

`markdown-rs` is an open source markdown parser written in Rust.
It’s implemented as a state machine (`#![no_std]` + `alloc`) that emits
concrete tokens,
so that every byte is accounted for,
with positional info.
The API then exposes this information as an AST,
which is easier to work with,
or it compiles directly to HTML.

While most markdown parsers work towards compliancy with CommonMark (or GFM),
this project goes further by following how the reference parsers (`cmark`,
`cmark-gfm`) work,
which is confirmed with thousands of extra tests.

Other than CommonMark and GFM,
this project also supports common extensions to markdown such as
MDX, math, and frontmatter.

This Rust crate has a sibling project in JavaScript:
[`micromark`][micromark]
(and [`mdast-util-from-markdown`][mdast-util-from-markdown] for the AST).

P.S. if you want to *compile* MDX,
use [`mdxjs-rs`][mdxjs-rs].

## Questions

* to learn markdown,
  see this [cheatsheet and tutorial][cheat]
* for the API,
  see the [crate docs][docs]
* for questions,
  see [Discussions][]
* to help,
  see [contribute][] or [sponsor][] below

## Contents

* [Install](#install)
* [Use](#use)
* [API](#api)
* [Extensions](#extensions)
* [Project](#project)
  * [Overview](#overview)
  * [File structure](#file-structure)
  * [Test](#test)
  * [Version](#version)
  * [Security](#security)
  * [Contribute](#contribute)
  * [Sponsor](#sponsor)
  * [Thanks](#thanks)
* [Related](#related)
* [License](#license)

## Install

With [Rust][]
(rust edition 2018+, ±version 1.56+),
install with `cargo`:

```sh
cargo add markdown
```

## Use

```rs
fn main() {
    println!("{}", markdown::to_html("## Hi, *Saturn*! 🪐"));
}
```

Yields:

```html
<h2>Hi, <em>Saturn</em>! 🪐</h2>
```

Extensions (in this case GFM):

```rs
fn main() -> Result<(), markdown::message::Message> {
    println!(
        "{}",
        markdown::to_html_with_options(
            "* [x] contact ~Mercury~Venus at hi@venus.com!",
            &markdown::Options::gfm()
        )?
    );

    Ok(())
}
```

Yields:

```html
<ul>
  <li>
    <input checked="" disabled="" type="checkbox" />
    contact <del>Mercury</del>Venus at <a href="mailto:hi@venus.com">hi@venus.com</a>!
  </li>
</ul>
```

Syntax tree ([mdast][]):

```rs
fn main() -> Result<(), markdown::message::Message> {
    println!(
        "{:?}",
        markdown::to_mdast("# Hi *Earth*!", &markdown::ParseOptions::default())?
    );

    Ok(())
}
```

Yields:

```text
Root { children: [Heading { children: [Text { value: "Hi ", position: Some(1:3-1:6 (2-5)) }, Emphasis { children: [Text { value: "Earth", position: Some(1:7-1:12 (6-11)) }], position: Some(1:6-1:13 (5-12)) }, Text { value: "!", position: Some(1:13-1:14 (12-13)) }], position: Some(1:1-1:14 (0-13)), depth: 1 }], position: Some(1:1-1:14 (0-13)) }
```

## API

`markdown-rs` exposes
[`to_html`](https://docs.rs/markdown/latest/markdown/fn.to_html.html),
[`to_html_with_options`](https://docs.rs/markdown/latest/markdown/fn.to_html_with_options.html),
[`to_mdast`](https://docs.rs/markdown/latest/markdown/fn.to_mdast.html),
[`Options`](https://docs.rs/markdown/latest/markdown/struct.Options.html),
and a few other structs and enums.

See the [crate docs][docs] for more info.

## Extensions

`markdown-rs` supports extensions to `CommonMark`.
These extensions are maintained in this project.
They are not enabled by default but can be turned on with options.

* GFM
  * autolink literal
  * footnote
  * strikethrough
  * table
  * tagfilter
  * task list item
* MDX
  * ESM
  * expressions
  * JSX
* frontmatter
* math

It is not a goal of this project to support lots of different extensions.
It’s instead a goal to support very common and mostly standardized extensions.

## Project

`markdown-rs` is maintained as a single monolithic crate.

### Overview

The process to parse markdown looks like this:

```txt
                    markdown-rs
+-------------------------------------------------+
|            +-------+         +---------+--html- |
| -markdown->+ parse +-events->+ compile +        |
|            +-------+         +---------+-mdast- |
+-------------------------------------------------+
```

### File structure

The files in `src/` are as follows:

* `construct/*.rs`
  — CommonMark, GFM, and other extension constructs used in markdown
* `util/*.rs`
  — helpers often needed when parsing markdown
* `event.rs`
  — things with meaning happening somewhere
* `lib.rs`
  — public API
* `mdast.rs`
  — syntax tree
* `parser.rs`
  — turn a string of markdown into events
* `resolve.rs`
  — steps to process events
* `state.rs`
  — steps of the state machine
* `subtokenize.rs`
  — handle content in other content
* `to_html.rs`
  — turns events into a string of HTML
* `to_mdast.rs`
  — turns events into a syntax tree
* `tokenizer.rs`
  — glue the states of the state machine together
* `unist.rs`
  — point and position, used in mdast

### Test

`markdown-rs` is tested with the \~650 CommonMark tests and more than 1k extra
tests confirmed with CM reference parsers.
Then there are even more tests for GFM and other extensions.
These tests reach all branches in the code,
which means that this project has 100% code coverage.
Fuzz testing is used to check for things that might fall through coverage.

The following bash scripts are useful when working on this project:

* generate code (latest CM tests and Unicode info):
  ```sh
  cargo run --manifest-path generate/Cargo.toml
  ```
* run examples:
  ```sh
  RUST_BACKTRACE=1 RUST_LOG=trace cargo run --example lib --features log
  ```
* format:
  ```sh
  cargo fmt && cargo fix --all-features --all-targets --workspace
  ```
* lint:
  ```sh
  cargo fmt --check && cargo clippy --all-features --all-targets --workspace
  ```
* test:
  ```sh
  RUST_BACKTRACE=1 cargo test --all-features --workspace
  ```
* docs:
  ```sh
  cargo doc --document-private-items --examples --workspace
  ```
* fuzz:
  ```sh
  cargo install cargo-fuzz
  cargo install honggfuzz
  cargo +nightly fuzz run markdown_libfuzz
  cargo hfuzz run markdown_honggfuzz
  ```

### Version

`markdown-rs` follows [SemVer](https://semver.org).

### Security

The typical security aspect discussed for markdown is [cross-site scripting
(XSS)][xss] attacks.
Markdown itself is safe if it does not include embedded HTML or dangerous
protocols in links/images (such as `javascript:`).
`markdown-rs` makes any markdown safe by default,
even if HTML is embedded or dangerous protocols are used,
as it encodes or drops them.

Turning on the `allow_dangerous_html` or `allow_dangerous_protocol` options for
user-provided markdown opens you up to XSS attacks.

Additionally,
you should be able to set `allow_any_img_src` safely.
The default is to allow only `http:`, `https:`, and relative images,
which is what GitHub does.
But it should be safe to allow any value on `src`.

The [HTML specification][whatwg-html-image] prohibits dangerous scripts in
images and all modern browsers respect this and are thus safe.
Opera 12 (from 2012) is a notable browser that did not respect this.
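
As a rough illustration of the default rule described above (only `http:`,
`https:`, and relative image sources pass), the check might be sketched as
follows.
The names `is_allowed_img_src` and `SAFE_IMG_PROTOCOLS` are hypothetical and
made up for this sketch;
they are not part of the `markdown` crate API,
and the crate’s actual implementation may differ.

```rust
/// Protocols accepted for image sources in this sketch (an assumption based
/// on the default described in the text, not the crate's own constant).
const SAFE_IMG_PROTOCOLS: [&str; 2] = ["http", "https"];

/// Hypothetical helper: returns `true` when `src` is a relative URL or uses
/// one of the allowed protocols.
fn is_allowed_img_src(src: &str) -> bool {
    // A colon only marks a protocol when it comes before any `/`, `?`, or `#`.
    let colon = src.find(':');
    let delimiter = src.find(|ch: char| matches!(ch, '/' | '?' | '#'));

    match (colon, delimiter) {
        // No colon at all: a relative URL such as `media/logo.svg`.
        (None, _) => true,
        // Colon after the first delimiter: still relative, e.g. `a/b:c.png`.
        (Some(c), Some(d)) if d < c => true,
        // Otherwise the text before the colon is the protocol;
        // compare it case-insensitively against the allowlist.
        (Some(c), _) => SAFE_IMG_PROTOCOLS
            .iter()
            .any(|protocol| protocol.eq_ignore_ascii_case(&src[..c])),
    }
}

fn main() {
    for src in ["https://example.com/a.png", "media/logo.svg", "javascript:alert(1)"] {
        println!("{src}: {}", is_allowed_img_src(src));
    }
}
```

An allowlist is used here rather than a blocklist,
so that unknown protocols are rejected unless they are explicitly trusted.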

An aspect related to XSS for security is syntax errors:
markdown itself has no syntax errors.
Some syntax extensions
(specifically, only MDX)
do include syntax errors.
For that reason,
`to_html_with_options` returns `Result<String, Message>`,
of which the error is a struct indicating where the problem happened,
what occurred,
and what was expected instead.
Make sure to handle your errors when using MDX.

Another security aspect is DDoS attacks.
For example,
an attacker could throw a 100 MB file at `markdown-rs`,
in which case it’s going to take a long while to finish.
It is also possible to crash `markdown-rs` with smaller payloads,
notably when thousands of
links, images, emphasis, or strong
are opened but not closed.
It is wise to cap the accepted size of input (500 kB can hold a big book) and to
process content in a different thread so that it can be stopped when needed.

For more information on markdown sanitation,
see
[`improper-markup-sanitization.md`][improper] by [**@chalker**][chalker].

### Contribute

See [`contributing.md`][contributing] for ways to help.
See [`support.md`][support] for ways to get help.
See [`code-of-conduct.md`][coc] for how to communicate in and around this
project.

### Sponsor

Support this effort and give back by sponsoring:

* [GitHub Sponsors](https://github.com/sponsors/wooorm)
  (personal; monthly or one-time)
* [OpenCollective](https://opencollective.com/unified) or
  [GitHub Sponsors](https://github.com/sponsors/unifiedjs)
  (unified; monthly or one-time)

### Thanks

Special thanks go out to:

* [Vercel][] for funding the initial development
* [**@Murderlon**][murderlon] for the design of the logo
* [**@johannhof**][johannhof] for the crate name

## Related

* [`micromark`][micromark]
  — same as `markdown-rs` but in JavaScript
* [`mdxjs-rs`][mdxjs-rs]
  — wraps `markdown-rs` to *compile* MDX to JavaScript

## License

Original library license and copyright: [MIT][LICENSE.MIT] © [Titus Wormer][author].

The [GPL v3][LICENSE.GPL-3.0] license applies only to my own changes [**@crashkeys.dev**][crashkeys].

[badge-build-image]: https://github.com/wooorm/markdown-rs/workflows/main/badge.svg

[badge-build-url]: https://github.com/wooorm/markdown-rs/actions

[badge-coverage-image]: https://img.shields.io/codecov/c/github/wooorm/markdown-rs.svg

[badge-coverage-url]: https://codecov.io/github/wooorm/markdown-rs

[docs]: https://docs.rs/markdown/latest/markdown/

[crate]: https://crates.io/crates/markdown

[repo]: https://github.com/wooorm/markdown-rs

[discussions]: https://github.com/wooorm/markdown-rs/discussions

[commonmark]: https://spec.commonmark.org

[cheat]: https://commonmark.org/help/

[rust]: https://www.rust-lang.org

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[improper]: https://github.com/ChALkeR/notes/blob/master/Improper-markup-sanitization.md

[chalker]: https://github.com/ChALkeR

[LICENSE.MIT]: LICENSE.MIT

[LICENSE.GPL-3.0]: LICENSE.GPL-3.0

[author]: https://wooorm.com

[mdast]: https://github.com/syntax-tree/mdast

[micromark]: https://github.com/micromark/micromark

[mdxjs-rs]: https://github.com/wooorm/mdxjs-rs

[mdast-util-from-markdown]: https://github.com/syntax-tree/mdast-util-from-markdown

[vercel]: https://vercel.com

[murderlon]: https://github.com/murderlon

[johannhof]: https://github.com/johannhof

[crashkeys]: https://tangled.org/did:plc:uuxsjzxf7wokfhwettyljhsa

[contribute]: #contribute

[sponsor]: #sponsor

[extensions]: #extensions

[security]: #security

[test]: #test

[contributing]: .github/contribute.md

[support]: .github/support.md

[coc]: .github/code-of-conduct.md

[whatwg-html-image]: https://html.spec.whatwg.org/multipage/images.html#images-processing-model