Monorepo for Tangled tangled.org

discussion: off-protocol records #356

open
opened by boltless.me

There are some floating ideas regarding the current limitations of atproto and the additional requirements of tangled. I think most of us can agree that atproto itself is not enough for tangled's needs. The easiest example would be shared-private records. Also, it's worth noting that we already have several off-protocol records in knotstream and spindlestream: sh.tangled.git.refUpdate, sh.tangled.pipeline, and sh.tangled.pipeline.status.

So I think it would be quite reasonable for tangled to define its own protocol for these off-protocol records (sorry for my limited wording here, hopefully you get the point.)

I don't have a single concrete idea, but let me list my atomic thoughts.

Git refs

Currently, when a git ref changes on a knot via push, the knot emits sh.tangled.git.refUpdate event records. First, this violates the lexicon style guidelines, so it should be sh.tangled.git.ref.update. Beyond that, it seems reasonable to just have a sh.tangled.git.ref record and stream create/update/delete events for it.

We don't need to store git refs in JSON; we can fetch them from the git object store on request.
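
For illustration, a minimal sh.tangled.git.ref record could carry little more than the ref name and the commit it points to, with everything else fetched from the git object store. Every field name below is an assumption for illustration, not a proposal:

// at://<repo-or-knot-did>/sh.tangled.git.ref/<rkey>  (hypothetical shape)
{
  "$type": "sh.tangled.git.ref",
  "ref": "refs/heads/main",
  // the commit the ref currently points to; the objects themselves stay in
  // the git object store and are fetched from the knot on request
  "commit": "3f785a2c0d9e4b1a8c6d5e7f9a0b1c2d3e4f5a6b"
}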

I'm not sure whether we could make an #atproto_pds-compatible server to serve these imaginary records... but do we even need to?

Pipeline events

A similar technique can be used for the sh.tangled.pipeline and sh.tangled.pipeline.status records.

  • On event trigger, we can create sh.tangled.pipeline.workflow records for each job. These will be owned by the spindle¹ and stored somewhere².
  • Each workflow will hold a log UUID, which can be used to request a real-time log stream from the spindle.
  • There won't be sh.tangled.pipeline.status records. sh.tangled.pipeline.workflow itself will hold its one and only state, and changes to it will be broadcast. No state prioritization is needed. This way, it is easier to garbage-collect broken workflow runs with a timeout state, and pipelines become Sync 1.1 compatible. We can even backfill old pipelines! (A rough sketch of such a record follows this list.)
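
A rough sketch of what such a workflow record might hold; all field names here are assumptions, not a spec:

// at://<spindle-did>/sh.tangled.pipeline.workflow/<rkey>  (hypothetical shape)
{
  "$type": "sh.tangled.pipeline.workflow",
  // the pipeline/trigger this workflow run belongs to
  "pipeline": "at://did:plc:example/sh.tangled.pipeline/<rkey>",
  // log UUID used to request the real-time log stream from the spindle
  "logId": "9f4c2c1e-6d1a-4b7e-9a7d-1c2e3f4a5b6c",
  // the one and only state, updated in place and broadcast on change
  "state": "running"
}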

Collaborative Records

Before talking about shared-private records, this one needs some love. Honestly it deserves its own thread, but I'm leaving it here.

In a git forge, it's pretty common for multiple people to modify the same object. That's the whole point of the platform: collaboration.

Examples of Collaborative Records:

  • author and maintainers can modify the issue title
  • author and maintainers can change the issue state
  • author and maintainers can modify the PR branch (when the author allows it)

To allow these operations, a record should be:

  • modifiable by an arbitrary user with the correct permissions
  • versioned and fully traceable (we can track atproto records with CIDs, but the Sync protocol doesn't support version history)

For issue/PR state, we already have a solution with standard atproto records: the sh.tangled.repo.{issue,pull}.state records. But the same technique cannot be applied to the title or other fields; that would be too complex. So ideally, all issue/PR state and content would be managed in-record, just like how I'm suggesting we remove sh.tangled.pipeline.status.
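
To make the in-record idea concrete, here is a purely illustrative shape where title, body, and state live on the issue record itself and are mutated in place (field names are guesses, not the actual lexicon):

// at://<owner-did>/sh.tangled.repo.issue/<rkey>  (illustrative shape only)
{
  "$type": "sh.tangled.repo.issue",
  "title": "off-protocol records",
  "body": "…",
  // state and content live in-record and are updated in place by anyone
  // with permission, instead of being derived from separate
  // sh.tangled.repo.issue.state records
  "state": "open"
}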

Access Control System

"modifiable by an arbitrary user with the correct permissions"

The obvious follow-up question is: "how can we check the permissions for an operation?"

Right now, we don't have much of a permission system. Only the author can modify things. Even repo collaborators have pretty limited capabilities, and we don't have a way to provide fine-grained control over each collaborator's permissions.

I can see two solutions for this:

  • The knot (or 'RepositoryServer' if we want to leave the knot as just a git server) will store all permissions, and all requests will be proxied through this server. For example, when a repo collaborator tries to trigger the pipeline, the request will pass through the 'RepositoryServer' to the spindle.
  • Introduce a publishable permission rule spec. We could publish casbin permissions in the atmosphere so that distributed services can follow them (a rough sketch is below). This reminds me of how Leaf just sends raw SQL queries between services.
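
As a sketch of what such publishable rules could look like, a record might simply wrap casbin-style (subject, object, action) policy tuples so that any service can load and enforce them; the lexicon name and fields below are invented for illustration:

// at://<repo-did>/sh.tangled.permission.policy/<rkey>  (invented name)
{
  "$type": "sh.tangled.permission.policy",
  "rules": [
    // casbin-style (subject, object, action) tuples
    { "sub": "did:plc:collaborator", "obj": "repo:pipeline", "act": "trigger" },
    { "sub": "did:plc:collaborator", "obj": "repo:issue", "act": "update" }
  ]
}

Services like the spindle could then evaluate these rules locally instead of proxying every request through the knot.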

The current rbac implementation is a bit incorrect and there is a fix for it. Please see this commit or the sl/spindle-rewrite branch for the correct implementation.

Conclusion

This situation makes me feel the need for one or two new protocols. I'm not sure whether we can keep them generic while implementing tangled-specific logic like access control.

Thankfully, the at-uri spec isn't restricted to the /nsid/rkey pattern. The at-uri format in lexicons is restricted, yes, but nothing is stopping us from defining a new URI scheme like git+at://.

As I said, I don't have a concrete idea yet, so I'm looking for others' thoughts.


  1. spindles already have a DID format (did:web:spindle.tangled.sh), but currently it isn't even a valid DID doc, and that should be fixed. ↩︎

  2. same for the knot. Technically, using an atproto PDS might be possible, but I'm not sure we even need to follow the PDS spec here. ↩︎

I think Repo-as-DID would allow these records to live on-protocol while still allowing for collaboration. A knot as a PDS is possible, and it doesn't need to fully comply with the standard PDS implementation. It can have its own xrpc endpoints; it just needs to implement enough to connect to relays and get/sync records and blobs.

Thank you for sharing your thoughts!

I personally would prefer not to make a "semi-compatible PDS" here.

It serves different needs, and if we populate the #atproto_pds field with it, we basically prevent repo-dids from using a standard atproto PDS in the future.

One interesting idea that came up on the Discord server when discussing the did-for-repo concept was being able to use the repository itself as a normal atproto identity (e.g. having a bsky account). If we embed custom logic beneath the AtprotoPersonalDataServer implementation, it can break compatibility with other services.

So imo these "special records" should be stored in a completely different atproto service:

{
  "did": "did:plc:repo-did",
  "services": {
    // custom atproto service
    "knot": {
      "type": "TangledKnot",
      "endpoint": "https://knot2.tangled.org"
    },
    // #atproto_pds can be any standard AtprotoPersonalDataServer
    "atproto_pds": {
      "type": "AtprotoPersonalDataServer",
      "endpoint": "https://woodear.us-west.host.bsky.network"
    }
  }
}

Are you saying you would have both #atproto_pds AND #tangled_knot service types?

This would be the correct way to do it if you have a custom xrpc/sh.tangled.knot/getGitRefs, for example. This also allows you to store data in a separate storage implementation without writing it to the PDS.
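
For illustration only, a response from such a hypothetical endpoint might look something like this (shape entirely made up):

// hypothetical response from xrpc/sh.tangled.knot/getGitRefs
{
  "refs": [
    { "name": "refs/heads/main", "commit": "<commit-sha>" },
    { "name": "refs/tags/v1.0.0", "commit": "<commit-sha>" }
  ]
}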

I think I care less about what a "knot as PDS" or a "knot as custom atproto service" looks like, provided the Repo becomes an atproto DID account.

It can be both, but some other PDS can also host the repo accounts. I just think a PDS within the knot binary makes more sense to me.

(this got really lengthy really fast 😅 sorry)

imo git refs COULD be fine as just subscribeRepos endpoints, but personally, if they're not accessible via normal atproto pds endpoints, they should probably just be an event stream. this would be a breaking change, but since it would move from /events to /xrpc/sh.tangled.git.subscribeRefs or something, the /events endpoint could be kept around for a while if anything depends on it (ik ive used it once but idk if anything else does? maybe spindles or other tangled infra, i havent looked too deep)
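
(purely illustrative: a frame on such a stream might carry something like the below; all names are made up, not a proposed lexicon)

// hypothetical message on /xrpc/sh.tangled.git.subscribeRefs
{
  "seq": 42,
  "repo": "at://did:plc:owner/sh.tangled.repo/<rkey>",
  "ref": "refs/heads/main",
  "old": "<previous-commit>",
  "new": "<new-commit>",
  "time": "2025-01-01T00:00:00Z"
}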

i havent messed w spindles much but an embedded pds could be useful since it would allow compatibility with tap but idk that this is needed

wrt private data i think it makes sense to wait to see how atproto private data forms and go from there

as far as shared data and access control, i dont think versioning is a hard requirement personally. it could be worth having some trusted third party store a list of cid history for a given aturi (but no data) and allow the op to store the data in their own repo. it would allow users to delete their data properly while avoiding malicious history edits (idk how clear that is lmk if i need to rephrase that)

i think collaborative data could be solved by did-for-repo (see the not-very-thought-out idea below)

  • the knot acts as an atproto pds with ownership of a did:plc for each repo (which contains an atproto pds field and an aka of at://did:owner/sh.tangled.repo/rkey, which is bidirectional like handles).
  • the pr/issue/etc flow could look like the below (at-uris are included to help follow along; real records would have actual tid rkeys)
  1. a user creates a pr/issue for the repo. the record could look like:
// at://did:user:a/sh.tangled.data/original
{
    "history": {
        // note that `cid1` is the cid of this field's value (type, prev, and data v1) and not of the whole record.
        "cid1": { 
            "$type": "sh.tangled.data",
            "prev": null,
            /* data v1 */ 
        }
    } 
}
  2. the knot sees this pr/issue (via the firehose) and creates a record in its pds, which could look like:
// at://did:knot:a/sh.tangled.knot.data/history
{ 
    "history": [
        { 
            "uri": "at://did:user:a/sh.tangled.data/original", 
            "cid": {"$link": "cid1"} 
        }
    ] 
}
  3. the original user now edits their record. this is performed by finding what fields changed between versions, stripping any fields that are the same, and updating their original record like so:
// at://did:user:a/sh.tangled.data/original
{
    "history": {
        "cid1": { 
            "$type": "sh.tangled.data",
            "prev": null,
            /* data v1 */ 
        },
        "cid2": {
            "$type": "sh.tangled.data.override",
            "prev": {
                "uri": "at://did:user:a/sh.tangled.data/original",
                "cid": {"$link": "cid1"}
            },
            /* data v2 */
        }
    }
}
  4. the knot receives the update event and edits the history array like so:
// at://did:knot:a/sh.tangled.knot.data/history
{ 
    "history": [
        { 
            "uri": "at://did:user:a/sh.tangled.data/original", 
            "cid": {"$link": "cid1"} 
        },
        { 
            "uri": "at://did:user:a/sh.tangled.data/original", 
            "cid": {"$link": "cid2"} 
        }
    ] 
}
  5. the repo owner edits the record. this creates a record in their own pds, following the same flow as the op editing the record:
// at://did:user:b/sh.tangled.data/override
{
    "history": {
        "cid3": {
            "$type": "sh.tangled.data.override",
            "prev": {
                "uri": "at://did:user:a/sh.tangled.data/original",
                "cid": {"$link": "cid2"}
            },
            /* data v3 */
        }
    }
}
  6. the knot receives the create event, checks whether the user has sufficient permissions, and updates the history array like so:
// at://did:knot:a/sh.tangled.knot.data/history
{ 
    "history": [
        { 
            "uri": "at://did:user:a/sh.tangled.data/original", 
            "cid": {"$link": "cid1"} 
        },
        { 
            "uri": "at://did:user:a/sh.tangled.data/original", 
            "cid": {"$link": "cid2"} 
        },
        { 
            "uri": "at://did:user:b/sh.tangled.data/override", 
            "cid": {"$link": "cid3"} 
        }
    ] 
}
  7. etc etc
  • The knot should reject overrides that cause a forked history, and appviews should ignore overrides or OPs not in the history array
  • If the OP is deleted, the knot should probably delete the history entry
  • If overrides are deleted, appviews could fall back to previous versions or just show [deleted]

Repo metadata could be a standard record in the knot which can be edited by users with sufficient permissions (using standard pds endpoints but with interservice auth)
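
(for illustration, such a metadata record could be as simple as the below; the collection name and fields are guesses, not the actual sh.tangled lexicon)

// at://did:knot:a/sh.tangled.repo.metadata/self  (invented collection name)
{
  "$type": "sh.tangled.repo.metadata",
  "name": "<repo-name>",
  "description": "<repo-description>",
  "defaultBranch": "main"
}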

alternate pr/issue/etc flow:

  1. User creates a pr in their repo
// at://did:user/sh.tangled.data/original
{
    "$type": "sh.tangled.data",
    /* data v1 */
}
  2. The knot replicates this like so:
// at://did:knot/sh.tangled.data/other
{
    "$type": "sh.tangled.data",
    "source": {
        "uri": "at://did:user/sh.tangled.data/original",
        "cid": {"$link": "original-cid"} // to detect if it was edited, not to break links
    },
    /* data v1 */
}
  3. The user edits the data in their repo. This change is copied into the knot's pds. Note that the source field remains unchanged.
  4. A repo admin edits the data. This is done via inter-service auth (like repo metadata). The source field is not changed.
  5. The user deletes their record. The knot updates the source field to be null but otherwise keeps the data (not sold on this one 🤷 but it's similar to how gh etc do things).

This makes user-owned data like prs and issues not really user-owned, so idk how i feel about that, but these are off-the-cuff thoughts so 🤷

also i dont really have strong feelings on permissions, except that if tangled keeps the current model they should probably just be records, and atproto oauth permission sets/scopes are probably a good place to look for inspo

Thank you for sharing your thoughts!

I can also see that this is technically possible with atproto by making an abstraction layer on top of those records; that was my original proposal, and it's similar to how the current sh.tangled.*.state solution works.

But we will need a custom PDS implementation for did-for-repo to handle the access control logic, and if so, do we even need to store these in atproto records at all?

Not being able to collaborate is clearly a current limitation of atproto. Yes, we can work around that by using CIDs and soft links as you mentioned, but I'd ask: why are we using atproto records in the first place if they are this limited and we could just make our own protocol that works alongside them?

What if the initial post title and description of an issue/pull aren't owned by the user that created it? The starting "thread" could be written to the did:knot, and the following comments made on it would then be owned by the users. I think that essentially solves the duplicate-records problem, with the same trade-off that you don't own the initial issue/pr you wrote.
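
To illustrate (record shapes and names below are guesses, not existing lexicons): the thread itself would live on the did:knot with the creator only referenced, while each comment stays on its author's PDS:

// at://did:knot:a/sh.tangled.thread/<rkey>  (hypothetical, owned by the knot)
{
  "$type": "sh.tangled.thread",
  // the creating user is credited but does not own the record
  "createdBy": "did:plc:user-a",
  "title": "<issue or pr title>",
  "body": "<initial description>"
}

// at://did:plc:user-b/sh.tangled.thread.comment/<rkey>  (hypothetical, user-owned)
{
  "$type": "sh.tangled.thread.comment",
  "thread": "at://did:knot:a/sh.tangled.thread/<rkey>",
  "body": "<comment text>"
}

Comments stay user-owned and user-deletable, while the initial thread record is owned by the knot, matching the trade-off described above.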

I had the same thought about looking at oauth permission sets, but couldn't figure out how that would work

@boltless.me

i think there are a few main benefits to mostly copying atproto records instead of some bespoke method, those being:

  • somewhat simpler mental model; once you learn how atproto records work its only a tweak at the auth layer and you understand how they work in knots. this isnt as big an issue if the knot data model is mostly the same tho
  • all the existing tooling; you get things like tap and atp libraries that work pretty much out of the box. if something needs auth you might need to fork it, but you can also just use service proxying if the library supports that
  • also we'd get a pretty much readymade private data implementation if atproto's private data works for tangled, whereas a custom impl might need some work to get it there (again, less of an issue if the data model is mostly the same)

im not entirely opposed to a bespoke tangled protocol but i dont think its worth it unless it allows for things that just cannot be done with atp or would have strong ux improvements. as it stands i personally just dont see it lol

@evan.jarrett.net

wrt writing the starting thread to the did:knot i think that could work, although id probably want a sort of stub record on the user's pds? like { "$type": "sh.tangled.data.stub", "rel": "at://did:knot/sh.tangled.data/other" }, which would then be needed to create the sh.tangled.data record on the pds (the request would be denied if it's not present). that stub record would be linked to prove they made the op AND would allow them to delete their ownership of the data, which the knot could listen for to null out the source field

wrt permissions, it could look something like having an sh.tangled.permission lexicon which then contains records whose rkeys map to a did (ex: at://did:knot/sh.tangled.permission/did:user:here). all did:plc and did:web dids can be represented as an rkey; if a new did method uses a disallowed character, simply use percent encoding. the contents of the record could look something like this:

{
  "$type": "sh.tangled.permission",
  "permissions": [
    "repo:sh.tangled.data?action=update", // create for this data is not needed; this user is not allowed to delete
    "repo:sh.tangled.repo.description?action=update",
    "repo:app.bsky.feed.post" // generalized usecase; may not be implemented by tangled but could be useful elsewhere
  ]
}

It would probably want to include repo, blob, and maybe identity. account and rpc seem like they wouldnt be very useful for knots, although if this is made to be generic (not just tangled knots) i dont see why we should spec them out. repo should disallow wildcard and sh.tangled.permission (since sh.tangled.permission is the same as wildcard as you can give yourself the permissions)

there should also be a transition:generic permission that gives full repo, blob, identity, and account permissions. not sure if this permission should be restricted to the current repo owner (since i assume repos would be referred to the same way they are now)

should this be in sh.tangled namespace? it feels generic enough it could/should live elsewhere, maybe community.lexicon or com.atproto? worth discussing i think

AT URI
at://did:plc:xasnlahkri4ewmbuzly2rlc5/sh.tangled.repo.issue/3mb2vimmzi222