ian•13mo ago

Bandwidth usage for collaborative editing

I was thinking about delta-based protocols with our current subscription paradigm. If you have a sequence of edits that you apply to make a document, you could do something like:

1. Have multiple documents for the deltas - maybe one document per delta is ok, or maybe you batch them.
2. Have the query for the document be a paginated query, not a single query for the whole thing.
3. To render the document, the client fetches all pages, and automatically fetches new pages. The deltas -> doc transform can then happen on the client, and you can optimistically apply your own deltas without duplicating server logic.
4. As your document deltas grow, you can occasionally have a "snapshot" document that has the whole document as a single delta, so your "full doc deltas" query becomes an "all deltas since the last snapshot" query. You can make this snapshot after some threshold, when a user leaves the page, whatever heuristic makes sense.

If you want to keep discussing, maybe we grab a #support-community thread? I know @RJ has worked deeply with things like this for his Scroll app which uses tiptap iirc
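To make the delta-plus-snapshot idea in the numbered steps concrete, here is a rough client-side sketch. Everything in it is an assumption for illustration: the `Delta` shape, the `seq` field, and the function names are hypothetical, and the deltas are plain-text inserts and deletes rather than anything a real editor would emit. It only shows how a client might fold "all deltas since the last snapshot" into a document and layer its own pending deltas on top.

```typescript
// Hypothetical delta shape (not from any library): a delta either replaces
// the whole document (a snapshot) or inserts/deletes text at a position.
type Delta =
  | { seq: number; kind: "snapshot"; text: string }
  | { seq: number; kind: "insert"; pos: number; text: string }
  | { seq: number; kind: "delete"; pos: number; length: number };

// Apply a single delta to the current document text.
function applyDelta(doc: string, delta: Delta): string {
  switch (delta.kind) {
    case "snapshot":
      return delta.text;
    case "insert":
      return doc.slice(0, delta.pos) + delta.text + doc.slice(delta.pos);
    case "delete":
      return doc.slice(0, delta.pos) + doc.slice(delta.pos + delta.length);
  }
}

// "All deltas since the last snapshot": sort by sequence number and drop
// everything before the most recent snapshot, if there is one.
function deltasSinceLastSnapshot(log: Delta[]): Delta[] {
  const sorted = [...log].sort((a, b) => a.seq - b.seq);
  let start = 0;
  sorted.forEach((delta, index) => {
    if (delta.kind === "snapshot") start = index;
  });
  return sorted.slice(start);
}

// Render the document from the server's deltas, then apply the client's own
// not-yet-confirmed deltas on top as an optimistic update.
function renderDocument(log: Delta[], pending: Delta[] = []): string {
  return [...deltasSinceLastSnapshot(log), ...pending].reduce(applyDelta, "");
}
```

In this sketch the paginated query would supply `log` one page at a time, and the snapshot heuristic (a size threshold, a user leaving the page) only decides when a `snapshot` delta gets written.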

6 Replies

RJ•13mo ago

@dataloader, can you use OT instead of CRDTs? If so, I think you'll likely have a much easier time, and won't need to use anything other than Convex (as in, no need for a y-websocket server). In any case, Scroll (which uses OT) doesn't have the problem you're describing because the useQuery hook which watches for document changes only looks for the deltas (steps) since the latest authoritative version of the document known by the client, and then applies those steps client-side. If it's possible to accomplish the same thing with CRDTs, I imagine you'd want to store the deltas in their own table (this is what I did in Scroll), and then have a query which asks for "only the deltas I don't yet have". And when initially loading the document, you could just run a one-off (non-reactive) query. This is pretty much what Ian is suggesting above, I think.
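A minimal sketch of the "only the deltas I don't yet have" query, with the caveat that this is not Scroll's actual code: the `StoredStep` and `ClientDoc` shapes and both function names are made up for illustration, and the steps are simple text inserts. In a Convex app the filter would presumably live in a query function over the deltas table; here it is a plain function over an array so it runs on its own.

```typescript
// Hypothetical row in a "steps" table: each step carries the document
// version it produces and a simple text insert.
interface StoredStep {
  version: number;
  insertAt: number;
  text: string;
}

// What the client knows: the latest authoritative version and its text.
interface ClientDoc {
  version: number;
  text: string;
}

// Return only the steps newer than the client's known version, in order.
function stepsSince(table: StoredStep[], knownVersion: number): StoredStep[] {
  return table
    .filter((step) => step.version > knownVersion)
    .sort((a, b) => a.version - b.version);
}

// Apply the new steps client-side. Stop at the first gap so the client
// never skips a version it has not seen.
function catchUp(client: ClientDoc, newSteps: StoredStep[]): ClientDoc {
  let { version, text } = client;
  for (const step of newSteps) {
    if (step.version !== version + 1) break;
    text = text.slice(0, step.insertAt) + step.text + text.slice(step.insertAt);
    version = step.version;
  }
  return { version, text };
}
```

The bandwidth saving comes from the reactive query returning a few small steps instead of the whole document each time it changes.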

dataloader•13mo ago

@RJ do you handle offline editing?

RJ•13mo ago

No, but it would be pretty straightforward to, I think. Just keep the latest authoritative version of the document and the result of sendableSteps (https://prosemirror.net/docs/ref/#collab.sendableSteps), that is, the steps which have not yet been confirmed as applied by Convex, in IndexedDB or something. Try to load from there first if offline, and whenever you're online, sync the steps.
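Roughly, the offline flow described here could look like the sketch below. It is heavily simplified and full of assumptions: `KeyValueStore` and `MemoryStore` are hypothetical stand-ins for IndexedDB (whose real API is asynchronous), `Step` is a plain text insert rather than a ProseMirror step, and `send` stands in for whatever mutation submits steps to the server.

```typescript
// Hypothetical simplified step: insert `text` at position `insertAt`.
interface Step {
  insertAt: number;
  text: string;
}

// What gets persisted: the latest authoritative doc and version, plus the
// steps the server has not yet confirmed.
interface OfflineState {
  version: number;
  doc: string;
  unconfirmed: Step[];
}

// Synchronous stand-in for IndexedDB, used only to keep the sketch short.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

function saveOffline(store: KeyValueStore, docId: string, state: OfflineState): void {
  store.set(`doc:${docId}`, JSON.stringify(state));
}

function loadOffline(store: KeyValueStore, docId: string): OfflineState | undefined {
  const raw = store.get(`doc:${docId}`);
  return raw === undefined ? undefined : (JSON.parse(raw) as OfflineState);
}

// Hypothetical submit function: returns true if the server accepted the steps.
type SendSteps = (version: number, steps: Step[]) => boolean;

// Once online, try to sync the unconfirmed steps. On success they become
// part of the authoritative doc; on rejection the state is left untouched
// so the caller can fetch newer steps and rebase first.
function syncPending(state: OfflineState, send: SendSteps): OfflineState {
  if (state.unconfirmed.length === 0) return state;
  if (!send(state.version, state.unconfirmed)) return state;
  const doc = state.unconfirmed.reduce(
    (text, step) => text.slice(0, step.insertAt) + step.text + text.slice(step.insertAt),
    state.doc,
  );
  return { version: state.version + state.unconfirmed.length, doc, unconfirmed: [] };
}
```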

dataloader•13mo ago

I believe this is the point of CRDTs: to make this behavior well-defined. I believe what you suggested would require a hand-crafted mitigation strategy when two offline sessions collide.

RJ•13mo ago

As long as you have a central authority, OT will handle this situation just fine as well (no hand-crafted mitigation strategy required). Consider that Google Docs and Notion both use OT for offline and collaborative editing.

CRDTs have the advantage that they don't require a central authority, and the speed at which conflicts can be reconciled is not limited by the fact that every edit needs to run through a single server that determines their canonical ordering. But in practice, the number of concurrent edits required to seriously degrade OT performance is high enough that it hardly, if ever, happens (for most use cases, at least, like collaborative text editing). And also, in practice, you probably have a central server that you want edits/documents to be running through anyway.

And in my experience, at least, it was much easier to use and understand the ProseMirror OT library than it was Yjs. Back when I wrote Scroll, I initially tried to use Yjs instead and wanted to write a custom Convex provider, but couldn't find any good documentation on how to do so. All I found were some tales on the Discourse forum of people trying to build their own custom providers, running into esoteric issues because they were trying to implement an undocumented protocol, and asking for help. And over a year later, this doc looks exactly the same: https://docs.yjs.dev/tutorials/creating-a-custom-provider
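To illustrate why a central authority removes the need for a hand-crafted merge strategy, here is a toy sketch. It is not ProseMirror's implementation: the `Authority` class, the insert-only `Step` shape, and the `rebase` helper are all assumptions for illustration, and a real OT library maps positions across deletes and replacements too. The idea shown is that the server accepts steps only from a client that is at the current version, and a rejected client fetches what it missed, rebases, and resends.

```typescript
// Hypothetical simplified step: insert `text` at position `insertAt`.
interface Step {
  insertAt: number;
  text: string;
}

// The central authority decides the canonical order of steps.
class Authority {
  private steps: Step[] = [];
  doc = "";

  // The version is the number of steps accepted so far.
  get version(): number {
    return this.steps.length;
  }

  // Accept steps only if the client was up to date when it made them.
  receiveSteps(clientVersion: number, steps: Step[]): boolean {
    if (clientVersion !== this.version) return false;
    for (const step of steps) {
      this.doc = this.doc.slice(0, step.insertAt) + step.text + this.doc.slice(step.insertAt);
      this.steps.push(step);
    }
    return true;
  }

  // Steps a client at `version` has not seen yet.
  stepsSince(version: number): Step[] {
    return this.steps.slice(version);
  }
}

// Rebase a local insert over remote inserts the client had not seen:
// every remote insert at or before the local position shifts it right.
function rebase(local: Step, remote: Step[]): Step {
  let pos = local.insertAt;
  for (const r of remote) {
    if (r.insertAt <= pos) pos += r.text.length;
  }
  return { ...local, insertAt: pos };
}
```

Two sessions that edited the same version, online or offline, go through the same path: the first to arrive is accepted, and the second is rejected, rebased over the first, and accepted on retry.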

dataloader•13mo ago

Thanks for your thoughts. By the way, the Yjs code is really, really simple. It hardly needs docs. You could just dig in.
