proposal: move db logic to ingester #405

Status: open, opened by boltless.me

All DB operations should be done from the ingester, and handlers should only write to the PDS.

Problems

Duplicated write logic

Some handlers write to the PDS first, while others write to the DB first. On top of that, the ingester listens to Jetstream and writes to the DB again. This is not good, because we have to maintain two write paths. The service layer in PR#800 can help a bit by packaging some of the domain logic, but the write in an HTTP handler (DB+PDS) and the write in the ingester (DB) are still different operations.

We could add a writePDS flag to every service method to reuse the logic, though.
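
To make the writePDS idea concrete, here is a minimal sketch. Everything in it (the `Star` shape, the `PDSWriter` and `Index` interfaces, `StarService`, the in-memory fakes) is hypothetical and only for illustration; it is not the actual service layer from PR#800.

```go
package main

import (
	"errors"
	"fmt"
)

// Star is a simplified, hypothetical stand-in for a star record.
type Star struct {
	DID     string // who starred
	RepoURI string // at-uri of the starred repo
}

// PDSWriter and Index are hypothetical interfaces, not Tangled's real ones.
type PDSWriter interface {
	PutStar(s Star) error
}

type Index interface {
	InsertStar(s Star) error
}

type StarService struct {
	pds PDSWriter
	db  Index
}

// AddStar holds the shared domain logic. An HTTP handler would pass
// writePDS=true; the ingester would pass false, because the record it is
// processing already exists on the PDS.
func (s *StarService) AddStar(star Star, writePDS bool) error {
	if star.DID == "" || star.RepoURI == "" {
		return errors.New("star: missing did or repo uri")
	}
	if writePDS {
		if err := s.pds.PutStar(star); err != nil {
			return fmt.Errorf("pds write: %w", err)
		}
	}
	return s.db.InsertStar(star)
}

// In-memory fakes so the sketch runs on its own.
type fakePDS struct{ writes int }

func (f *fakePDS) PutStar(Star) error { f.writes++; return nil }

type fakeDB struct{ rows []Star }

func (f *fakeDB) InsertStar(s Star) error { f.rows = append(f.rows, s); return nil }

func main() {
	pds, db := &fakePDS{}, &fakeDB{}
	svc := &StarService{pds: pds, db: db}
	star := Star{DID: "did:example:alice", RepoURI: "at://did:example:bob/sh.tangled.repo/abc"}

	_ = svc.AddStar(star, true)  // handler path: PDS + DB
	_ = svc.AddStar(star, false) // ingester path: DB only

	fmt.Println("pds writes:", pds.writes, "db inserts:", len(db.rows))
}
```

Note that the flag only shares code. The handler path and the ingester path both still insert into the DB, so the DB write has to be idempotent, which is part of why the two remain different operations.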

DB-first write doesn't make much sense

Writing to the DB first doesn't help much when handling a user request.

  • The DB cannot validate the input (constraints like FK relationships don't work in atproto)
  • The DB operation is quicker than the PDS one, but the handler still has to wait for the PDS operation to succeed and roll back everything on failure

Appview DB is mainly for indexing and backlinking. It should not be the source of truth.

PDS-first write also doesn't make much sense

  • A PDS operation is harder to roll back. We can do it, but it is objectively more complicated than tx.Rollback().
  • We are basically duplicating the ingester's job.
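
To illustrate why undoing a PDS write is more involved than tx.Rollback(), here is a rough sketch of a PDS-first handler path. The `PDS` interface, `pdsFirstWrite`, `memPDS`, and `errDBDown` are all hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// PDS is a hypothetical, minimal view of the record operations a handler needs.
type PDS interface {
	CreateRecord(rkey string) error
	DeleteRecord(rkey string) error
}

// errDBDown simulates an index write failure.
var errDBDown = errors.New("db unavailable")

// pdsFirstWrite writes to the PDS and then to the index. If the index write
// fails, the only "rollback" is a second network call, which can fail too.
func pdsFirstWrite(pds PDS, insertDB func(rkey string) error, rkey string) error {
	if err := pds.CreateRecord(rkey); err != nil {
		return fmt.Errorf("pds create: %w", err)
	}
	if err := insertDB(rkey); err != nil {
		if delErr := pds.DeleteRecord(rkey); delErr != nil {
			// The record now exists on the PDS but this handler never indexed it.
			return errors.Join(err, fmt.Errorf("compensating delete: %w", delErr))
		}
		return fmt.Errorf("db insert (pds write undone): %w", err)
	}
	return nil
}

// memPDS is an in-memory fake so the sketch runs on its own.
type memPDS struct{ records map[string]bool }

func (m *memPDS) CreateRecord(rkey string) error { m.records[rkey] = true; return nil }
func (m *memPDS) DeleteRecord(rkey string) error { delete(m.records, rkey); return nil }

func main() {
	pds := &memPDS{records: map[string]bool{}}
	failingInsert := func(string) error { return errDBDown }

	err := pdsFirstWrite(pds, failingInsert, "3kexample")
	fmt.Println("error:", err)
	fmt.Println("records left on PDS:", len(pds.records))
}
```

Even in this toy version, the create and the compensating delete would both be emitted on the firehose, so the ingester sees the record appear and disappear anyway.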

We need CID for records

In some cases we need to store the CID returned by the PDS, especially when we use strongRef instead of at-uri as proposed in #387. Yes, we can pre-calculate the CID, but the PDS should still be the source of truth, so pre-calculating the CID would only add complexity.
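
As a sketch of what "take the CID from the PDS" could look like on the ingester side: the struct below mirrors the two fields of `com.atproto.repo.strongRef` (`uri` and `cid`), and the event shape is a trimmed-down assumption of a Jetstream commit event. `refFromEvent` is a hypothetical helper, and the CID in the sample is a placeholder, not a real one.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StrongRef mirrors the uri/cid pair of com.atproto.repo.strongRef.
type StrongRef struct {
	URI string `json:"uri"`
	CID string `json:"cid"`
}

// commitEvent keeps only the fields this sketch needs. The field names are
// an assumption about the Jetstream commit event shape.
type commitEvent struct {
	DID    string `json:"did"`
	Commit struct {
		Operation  string `json:"operation"`
		Collection string `json:"collection"`
		RKey       string `json:"rkey"`
		CID        string `json:"cid"`
	} `json:"commit"`
}

// refFromEvent builds a strong reference from an ingested commit, using the
// CID reported for the record instead of computing one locally.
func refFromEvent(raw []byte) (StrongRef, error) {
	var ev commitEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return StrongRef{}, err
	}
	if ev.Commit.CID == "" {
		return StrongRef{}, fmt.Errorf("event for %s carries no cid", ev.DID)
	}
	uri := fmt.Sprintf("at://%s/%s/%s", ev.DID, ev.Commit.Collection, ev.Commit.RKey)
	return StrongRef{URI: uri, CID: ev.Commit.CID}, nil
}

func main() {
	// Placeholder DID, rkey, and CID; only the collection name is from this issue.
	raw := []byte(`{"did":"did:example:alice","kind":"commit","commit":{"operation":"create","collection":"sh.tangled.repo.issue","rkey":"3kexample","cid":"placeholder-cid"}}`)

	ref, err := refFromEvent(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", ref)
}
```

With this approach a handler never needs to know the CID at all; whatever the ingester receives is what gets stored.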

Scalability

Ideally we want the Tangled AppView to scale. If a request can be handled entirely within the region that receives it, without dealing with distributed DB state, it will be even faster.

Proposed Behavior

  • All handlers will write only to the PDS and return immediately, without performing a DB write
  • DB writes will mostly happen in the ingester
  • A "delete" handler will delete all relevant records from the PDS (e.g. "unstar" will delete all duplicated star records pointing to the same repo)
  • A "delete" event in the ingester will remove only that specific record from the DB

The last two were already introduced in PR#993~#1000.
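
A rough sketch of the proposed split, using an in-memory PDS and index. All names here (`fakePDS`, `unstarHandler`, `ingest`, the `event` shape) are hypothetical and are not taken from the PRs mentioned above.

```go
package main

import "fmt"

// record is a simplified star record keyed by rkey in the user's repo.
type record struct {
	RKey    string
	Subject string // at-uri of the starred repo
}

// event is a simplified firehose event.
type event struct {
	Op      string // "create" or "delete"
	RKey    string
	Subject string // empty for deletes
}

// fakePDS stands in for the user's PDS; events models what the firehose emits.
type fakePDS struct {
	records map[string]record
	events  []event
}

func (p *fakePDS) create(r record) {
	p.records[r.RKey] = r
	p.events = append(p.events, event{Op: "create", RKey: r.RKey, Subject: r.Subject})
}

func (p *fakePDS) remove(rkey string) {
	delete(p.records, rkey)
	p.events = append(p.events, event{Op: "delete", RKey: rkey})
}

// unstarHandler touches only the PDS: it deletes every star record that
// points at subject, including duplicates. It performs no DB write.
func unstarHandler(p *fakePDS, subject string) {
	for rkey, r := range p.records {
		if r.Subject == subject {
			p.remove(rkey)
		}
	}
}

// ingest applies one event to the index (rkey -> subject). A delete event
// removes only the specific record it names.
func ingest(index map[string]string, ev event) {
	switch ev.Op {
	case "create":
		index[ev.RKey] = ev.Subject
	case "delete":
		delete(index, ev.RKey)
	}
}

func main() {
	pds := &fakePDS{records: map[string]record{}}
	pds.create(record{RKey: "a", Subject: "repo1"})
	pds.create(record{RKey: "b", Subject: "repo1"}) // duplicated star
	pds.create(record{RKey: "c", Subject: "repo2"})

	unstarHandler(pds, "repo1")

	index := map[string]string{}
	for _, ev := range pds.events {
		ingest(index, ev)
	}
	fmt.Println("indexed stars:", index)
}
```

The handler is responsible for cleaning up duplicates on the PDS, while the ingester stays a dumb one-event-one-row mirror.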


However, this cannot be done yet for records that are not fully decentralized:

  • sh.tangled.repo.issue (issue-id is not included)
  • sh.tangled.repo.pull (pull-id missing & 'round' concept is not yet atprotated)
  • sh.tangled.repo.pull.comment (pull-id missing & 'round' concept is not yet atprotated)

Fixing these three records is out of scope for this proposal (working on it!).

so the proposal is to remove optimistic updates? not having optimistic updates will lead to poorer UX ultimately. i agree re: cons of optimistic updates, but the solution is not to remove them entirely.

in your proposal above, the UX will be like so:

  • star a repo, the PDS write happens: htmx updates the page to show that it is starred
  • refresh the page: the star state has been reverted to unstarred!
  • wait for 1-3s and then refresh the page: the appview has caught up with the firehose, the repo is starred once more

some alternatives:

  • option 1:
    • client is a JS app that writes to PDS directly, and optimistically updates its state, can use websockets to share error states here
    • api.tangled.org ingests from firehose only. we would need Read-After-Write style "PDS plugins" that keep the api.tangled.org data up to date
  • option 2:
    • client writes to PDS and appview DB and updates its state (much like our HTMX frontend), we can kick off the PDS write in the background here (as opposed to awaiting its success)
    • api.tangled.org ingests from the firehose, and uses a TTL to prune data that was written optimistically but did not arrive within N seconds
    • showcasing the error state here is a bit complex, but we can use websockets and toasts to indicate the change in state
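
to make the TTL part of option 2 a bit more concrete, here is a minimal sketch. `Index`, `WriteOptimistic`, `Confirm`, `Prune` and the `at` helper are hypothetical names, the store is in-memory, and the clock is passed in explicitly so the behaviour is easy to follow.

```go
package main

import (
	"fmt"
	"time"
)

// row tracks whether an indexed record has been seen on the firehose yet.
type row struct {
	confirmed bool
	writtenAt time.Time
}

// Index is a hypothetical in-memory stand-in for the appview DB, keyed by at-uri.
type Index struct {
	ttl  time.Duration
	rows map[string]row
}

func NewIndex(ttlSeconds int) *Index {
	return &Index{ttl: time.Duration(ttlSeconds) * time.Second, rows: map[string]row{}}
}

// at converts seconds into a time.Time so callers can drive the clock explicitly.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// WriteOptimistic records a row before the firehose has confirmed it.
// A row that already exists (for example, already confirmed) is left alone.
func (i *Index) WriteOptimistic(uri string, now time.Time) {
	if _, ok := i.rows[uri]; !ok {
		i.rows[uri] = row{writtenAt: now}
	}
}

// Confirm marks a row as seen on the firehose, inserting it if needed.
func (i *Index) Confirm(uri string) {
	r := i.rows[uri]
	r.confirmed = true
	i.rows[uri] = r
}

// Prune drops unconfirmed rows older than the TTL and returns their URIs,
// so the caller could push an error state to the client.
func (i *Index) Prune(now time.Time) []string {
	var dropped []string
	for uri, r := range i.rows {
		if !r.confirmed && now.Sub(r.writtenAt) > i.ttl {
			delete(i.rows, uri)
			dropped = append(dropped, uri)
		}
	}
	return dropped
}

func main() {
	idx := NewIndex(3) // N = 3 seconds, an arbitrary choice for the demo

	// Illustrative URIs; "example.star" is not a real collection name.
	idx.WriteOptimistic("at://did:example:alice/example.star/a", at(0))
	idx.WriteOptimistic("at://did:example:alice/example.star/b", at(0))

	// Only record "a" arrives from the firehose.
	idx.Confirm("at://did:example:alice/example.star/a")

	fmt.Println("pruned:", idx.Prune(at(10)))
}
```

the list returned by `Prune` is where the websocket/toast notification would hook in.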

Participants 2
AT URI
at://did:plc:xasnlahkri4ewmbuzly2rlc5/sh.tangled.repo.issue/3mdy4uko2vk22