several issues with current markup situation
Related Issues:
- #197 - org-mode support
- #304 - image blob usage in markdown text
- #361 - issue/pr cross-ref syntax
- #382 - markdown commit reference
I recommend reading all related issues before discussing.
markup format is not extensible from lexicon
The current lexicon definition doesn't specify the markup format. Right now, we only support a blessed, tangled-specific markdown variant. In the future, we want to support custom syntaxes such as org-mode, as requested in #197.
markdown facets
It is pretty common to reference objects like a user, issue, pull, repository, blob, or even a git commit via markdown. And if someone references something, we want that reference to be permanent.
For example, if alice referenced bob as @bob.tngl.sh and bob later changed their handle to something else, @bob.tngl.sh should still point to the same user (bob). We currently include mentioned/referenced identities in the record to invalidate the legacy link, but this isn't enough. bluesky uses app.bsky.richtext.facet to embed resolved metadata in rich text, but it's hard to adopt the same solution because we would need to apply byte-wise facets to a markup language. Byte-wise facets are quite doable for markdown variants or djot, but I assume not every markup language/parser will allow this.
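To make the byte-wise problem concrete, here is a minimal sketch of building a bluesky-style mention facet over raw markdown source. The `index`/`features` shape follows app.bsky.richtext.facet; the `mention_facets` helper, the handle regex, and the DID value are assumptions made up for illustration, not anything tangled does today.

```python
import re

# Hypothetical DID for bob; in practice it would come from handle resolution.
BOB_DID = "did:plc:example-bob"

# Simplified handle pattern (assumption): "@" followed by dot-separated labels.
MENTION_RE = re.compile(r"@([a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+)")


def mention_facets(text, resolved):
    """Build mention facets whose offsets are UTF-8 byte positions in `text`."""
    facets = []
    for m in MENTION_RE.finditer(text):
        did = resolved.get(m.group(1))
        if did is None:
            continue  # unresolved handles get no facet
        # Facets index bytes, not characters, so convert the character offsets.
        byte_start = len(text[:m.start()].encode("utf-8"))
        byte_end = len(text[:m.end()].encode("utf-8"))
        facets.append({
            "index": {"byteStart": byte_start, "byteEnd": byte_end},
            "features": [
                {"$type": "app.bsky.richtext.facet#mention", "did": did},
            ],
        })
    return facets


text = "héllo @bob.tngl.sh, see **this**"
facets = mention_facets(text, {"bob.tngl.sh": BOB_DID})
```

The offsets here point into the raw markup source, not the rendered output. That is why this only works when the markup parser can report source positions for every node, which is the part I doubt all parsers support.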
Proposal
Introduce sh.tangled.markup lexicon.
sh.tangled.markup#markdown
Represents the title/body text of an issue/pull/comment.[1]
Both lexicons have two fields:
- text (raw text)
- refMap (uri -> item map)
refMap maps every uri used in text to a resolved identifier such as a DID, an at-uri, or a blob.
For maximum extensibility, it would be better to make the key (uri) extensible too.
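As a rough sketch of what such a record could look like, assuming a plain URL as the refMap key and a `{"did": ...}` object as the item (both shapes are hypothetical, as are the DID value and the `resolve_ref` helper):

```python
# Hypothetical record following the proposed sh.tangled.markup#markdown shape.
record = {
    "$type": "sh.tangled.markup#markdown",
    # Raw markdown source, stored untouched.
    "text": "thanks [@bob.tngl.sh](https://tangled.sh/@bob.tngl.sh) for the fix",
    # uri -> resolved item; the item shape here is an assumption.
    "refMap": {
        "https://tangled.sh/@bob.tngl.sh": {"did": "did:plc:example-bob"},
    },
}


def resolve_ref(record, uri):
    """Return the resolved item for a uri used in text, or None if unmapped."""
    return record.get("refMap", {}).get(uri)


bob = resolve_ref(record, "https://tangled.sh/@bob.tngl.sh")
missing = resolve_ref(record, "https://example.com/unmapped")
```

A renderer would call something like `resolve_ref` for each link target it meets while rendering. If bob later changes handle, the stored DID still identifies the same user, and unmapped uris simply fall back to ordinary links.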
Honestly I'm not satisfied with my own solution, but I think we do need some kind of dedicated lexicon to represent the markup content instead of using raw string type.
I'm open to more thoughts.
[1]: Title might use sh.tangled.markup#markdown_inline instead to be more specific
i am open to the idea of defining a rich, markdown facet-y lexicon for our use case. it is quite an undertaking to represent a markdown AST as a lexicon, and the usefulness is questionable, given that other implementors would need to be able to lower a markdown AST into the lexicon AST. on the upside, we could be sure that issues/comments render identically on all tangled appviews.
one reason to prefer raw string markdown might be: other rendered content such as README files is plain text, so any alternate appview would need to understand how to render plaintext markdown anyway.