Monorepo for Tangled tangled.org

proposal/discussion: extensible markup lexicon #383

open opened by boltless.me edited

several issues with current markup situation

Related Issues:

I recommend reading all related issues before discussing.

markup format is not extensible from lexicon

The current lexicon definition doesn't specify the markup format. Right now, we only support a blessed, tangled-specific markdown variant. But in the future, we want to support other syntaxes such as org-mode, as requested in #197.

markdown facets

It is pretty common to reference objects like a user, issue, pull, repository, blob, or even a git commit via markdown. And if someone references something, we want that reference to be permanent.

For example, if alice referenced bob as @bob.tngl.sh and bob later changed their handle to something else, @bob.tngl.sh should still point to the same user (bob). We currently include mentioned/referenced identities in the record to invalidate the legacy link, but this isn't enough. Bluesky uses app.bsky.richtext.facet to embed resolved metadata in rich text, but it's hard to adopt the same solution because we would need to apply byte-wise facets to a markup language. Byte-wise facets are quite doable for markdown variants or djot, but I assume not every markup language/parser will allow this.
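To make the byte-wise facet problem concrete, here is a minimal Python sketch of how a Bluesky-style facet addresses a mention by UTF-8 byte offsets into the raw text. The `index` (`byteStart`/`byteEnd`) and `app.bsky.richtext.facet#mention` shapes follow the Bluesky facet lexicon; the helper name `mention_facets`, the simplified mention regex, and the handle table with its placeholder DID are assumptions made for illustration.

```python
import re

# Hypothetical handle -> DID table with a placeholder DID;
# a real appview would resolve handles over the network.
KNOWN_HANDLES = {"bob.tngl.sh": "did:plc:bob"}

# Simplified mention pattern (assumption): "@" followed by a dotted handle.
# It is a bytes pattern so that match positions are byte offsets.
MENTION_RE = re.compile(rb"@([a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+)")


def mention_facets(text: str) -> list[dict]:
    """Return Bluesky-style mention facets indexed by UTF-8 byte offsets."""
    raw = text.encode("utf-8")
    facets = []
    for m in MENTION_RE.finditer(raw):
        did = KNOWN_HANDLES.get(m.group(1).decode("ascii"))
        if did is None:
            continue  # unresolved handles get no facet
        facets.append({
            "index": {"byteStart": m.start(), "byteEnd": m.end()},
            "features": [{"$type": "app.bsky.richtext.facet#mention", "did": did}],
        })
    return facets


text = "héllo @bob.tngl.sh, see *this*"
facets = mention_facets(text)
# The "@" is character 6 but byte 7, because "é" takes two bytes in UTF-8.
```

The offsets point into the raw markup source, so a renderer has to map its parsed nodes back to source byte ranges to apply them. That mapping is the part not every markup parser exposes.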

Proposal

Introduce sh.tangled.markup lexicon.

sh.tangled.markup#markdown

Represents the title/body text of an issue/pull/comment.[1]

Both lexicons have two fields:

  • text (raw text)
  • refMap (uri -> item map)

refMap maps any uri used in text to a resolved identifier such as a did, at-uri or blob. For maximum extensibility, it would be better to make the key (uri) extensible too.
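As a rough illustration of the shape described above, the sketch below models a `sh.tangled.markup#markdown` value as a Python dict with `text` and `refMap`. Only those two field names come from the proposal; the item layout (`type`/`value`), the use of the literal mention as the key, the placeholder DID, and the `resolve_ref` helper are all assumptions.

```python
from typing import Optional

# Hypothetical record value; only `text` and `refMap` are named in the proposal.
markup = {
    "$type": "sh.tangled.markup#markdown",
    "text": "thanks @bob.tngl.sh for the review",
    "refMap": {
        # key: the reference exactly as written in `text` (assumed key format)
        # value: what it resolved to when the text was written (assumed layout)
        "@bob.tngl.sh": {"type": "did", "value": "did:plc:bob"},
    },
}


def resolve_ref(markup: dict, uri: str) -> Optional[dict]:
    """Look up the identifier that `uri` pointed to when the text was written."""
    return markup.get("refMap", {}).get(uri)
```

With this layout, `resolve_ref(markup, "@bob.tngl.sh")` keeps returning bob's DID even after the handle changes, and references without an entry return `None` so the renderer can fall back to live resolution.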


Honestly I'm not satisfied with my own solution, but I think we do need some kind of dedicated lexicon to represent the markup content instead of using raw string type.

I'm open to more thoughts.

[1]: Title might use sh.tangled.markup#markdown_inline instead to be more specific

oppi.li

i am open to the idea of defining a rich markdown facet-y lexicon for our use case. it is quite an undertaking to represent a markdown AST as a lexicon, and the usefulness is questionable, given that other implementors would need to be able to lower their markdown AST into the lexicon AST. on the upside, we could be sure that issues/comments render identically on all tangled appviews.

one reason to prefer raw string markdown might be: other rendered content such as README files is plain text, so any alternate appview would need to understand how to render plaintext markdown anyway.

directxman12.dev

having only an ast instead of a raw string makes it difficult to fix parsing errors or retroactively choose to support a new syntax on existing issues/etc, which seems suboptimal (e.g. suppose u wanted to begin auto-linkifying issue references, and have that work for all existing issues that use a supported syntax)

directxman12.dev edited

one other thing to consider is post-processing -- instead of using facets to, say, convert an @-reference to a did, have it be converted on submission to a did-link -- e.g. writing @directxman12.dev might get autolinkified as [did:plc:xyz] on saving, and the renderers are expected to resolve that reference link back to @directxman12.dev or whatever my current handle is at the time. this accomplishes much the same thing as facets here (u already have a markup language, so u don't get the bsky "no need for a markup parser" benefit), but means that implementers don't need to xref markdown ast with byte offsets as they render (possible, but annoying in many markdown libraries), and instead embed all the needed data inline.
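A minimal Python sketch of that post-processing idea, under stated assumptions: `on_submit` rewrites resolvable @-mentions into `[did]` reference links before saving, and `on_render` turns them back into a link labelled with the current handle. The function names, both regexes, the lookup tables, and the `/{did}` link target are hypothetical; only the `@handle` to `[did:plc:xyz]` round trip comes from the comment above.

```python
import re

# Hypothetical lookup tables standing in for real handle/DID resolution.
HANDLE_TO_DID = {"directxman12.dev": "did:plc:xyz"}
DID_TO_HANDLE = {"did:plc:xyz": "directxman12.dev"}

# Simplified patterns (assumptions). The lookbehind skips e-mail-like text
# and mentions that are already inside square brackets.
MENTION_RE = re.compile(r"(?<![\w\[])@([a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+)")
DID_LINK_RE = re.compile(r"\[(did:[a-z]+:[a-zA-Z0-9._:%-]+)\]")


def on_submit(text: str) -> str:
    """Rewrite resolvable @handle mentions into [did] reference links."""
    def repl(m: re.Match) -> str:
        did = HANDLE_TO_DID.get(m.group(1))
        return f"[{did}]" if did else m.group(0)
    return MENTION_RE.sub(repl, text)


def on_render(text: str) -> str:
    """Expand [did] reference links into links labelled with the current handle."""
    def repl(m: re.Match) -> str:
        handle = DID_TO_HANDLE.get(m.group(1))
        return f"[@{handle}](/{m.group(1)})" if handle else m.group(0)
    return DID_LINK_RE.sub(repl, text)


stored = on_submit("cc @directxman12.dev")  # what gets saved in the record
```

Because the stored text carries the DID, a later handle change only affects the label produced by `on_render`. The regexes here would also rewrite mentions inside code spans, so a real implementation would more likely hook into the markdown parser than scan the raw string.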

for links, at least, this also doesn't require any markdown extensions, just reference link hooks, which is a feature already common in a number of parsers (e.g. pulldown-cmark calls this the "broken link callback")

generally, i do think being able to have different markup languages be supported would be nice, but i think instead of trying to do a 1-size-fits-all solution for things like x-refs, it's probably worth leaning on the language itself when u can. e.g. markdown has reference links, html has custom attributes, org-mode has link abbreviations, etc.

boltless.me (author)

@oppi.li to clarify, I'm not suggesting we use an AST here. I know that is, well, impossible. afaik the only way to fully support the markdown spec is to skip the AST and convert directly to HTML while parsing (I suppose the goldmark package is not fully CommonMark/GFM compatible).

What I'm suggesting here is:

  • use a raw string to store markdown content, but with a wrapper type to specify which markup language this raw string is using (to support other formats like org-mode or djot in the future)
  • embed the resolved uris to keep markup links consistent (mostly for user mentions)
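The first bullet can be sketched as a renderer that dispatches on the wrapper's `$type` and falls back to plain text for formats it does not know. This is only an illustration: the `RENDERERS` table, the stub renderers, and the fallback behaviour are assumptions, and `sh.tangled.markup#orgmode` is a made-up name used solely to show an unknown format.

```python
import html
from typing import Callable


# Stub renderers (assumptions); a real appview would call an actual
# markdown or org-mode parser here instead of just escaping the text.
def render_markdown(text: str) -> str:
    return f'<div class="markdown">{html.escape(text)}</div>'


def render_plain(text: str) -> str:
    return f"<pre>{html.escape(text)}</pre>"


# Wrapper type -> renderer. New markup languages are added by registering here.
RENDERERS: dict[str, Callable[[str], str]] = {
    "sh.tangled.markup#markdown": render_markdown,
}


def render(markup: dict) -> str:
    """Pick a renderer from the wrapper's $type; unknown types render as plain text."""
    renderer = RENDERERS.get(markup.get("$type", ""), render_plain)
    return renderer(markup["text"])
```

An appview that has never heard of a given format can still show the raw string, which is the main benefit of keeping the content as text inside a typed wrapper.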

boltless.me (author)

Or we can have a resolvedDids map. Issue/pull references aren't necessary once we atprotate the issue/pull ids, so the only thing we need to embed is the user did. The sh.tangled.markup#markdown object can hold a handle -> did map, which can be used to resolve a mentioned user from their old handle. When the appview parses the mentioned users from markdown content, it can prioritize these pre-resolved DIDs.
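A small Python sketch of that lookup order, assuming the record carries a `resolvedDids` dict: the pre-resolved DID wins, and live handle resolution is only a fallback. `CURRENT_HANDLES`, `resolve_live`, `resolve_mention`, and the placeholder DIDs are hypothetical stand-ins for whatever the appview actually uses.

```python
from typing import Optional

# Hypothetical live handle directory. Here bob's old handle has since been
# claimed by a different account, which is the case the map protects against.
CURRENT_HANDLES = {"bob.tngl.sh": "did:plc:someone-else"}


def resolve_live(handle: str) -> Optional[str]:
    """Stand-in for resolving a handle to a DID at render time."""
    return CURRENT_HANDLES.get(handle)


def resolve_mention(handle: str, resolved_dids: dict[str, str]) -> Optional[str]:
    """Prefer the DID stored with the record; fall back to live resolution."""
    did = resolved_dids.get(handle)
    if did is not None:
        return did
    return resolve_live(handle)


# The handle -> did map stored alongside the markdown text.
record_resolved_dids = {"bob.tngl.sh": "did:plc:bob"}
```

With this order, `resolve_mention("bob.tngl.sh", record_resolved_dids)` still returns the original bob, while a mention missing from the map is resolved against the current handle owner.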

AT URI
at://did:plc:xasnlahkri4ewmbuzly2rlc5/sh.tangled.repo.issue/3mctoic4vhe22