Missing Primitives and Reactor architectural vision
Just a rundown as of $TODAYS DATE on where we are with regard to a few missing primitives and the overall architectural vision for Convex. Feel free to fire away with questions...
this is going to be advanced mode, under the covers stuff a bit. but we'll be sharing these things more and more with all our userbase moving forward for the curious and/or heavily invested
fundamentally, there is an underlying belief in convex that workflow is a key part of how you store and change data. server-side reactivity, in a way. when you start using convex, you can see these patterns start to emerge. and it's sort of a new way to think about generalizing backends. that's always been our intention. but we haven't quite given you all the pieces yet, just because of time and the urgency of making other systems better "in the real world", and because you can work around modeling certain problems (albeit inelegantly)
there are two primary things still missing from the convex reactor primitives to finish the swing on being able to model just about any system
1. as asked by @Espen in #general, predicate-based or data-based triggering of actions or mutations
right now you can use the scheduler to explicitly schedule the execution of computation in the background (right now, later, or on some regular schedule)
but in some ways the server side is less powerful than the client right now
because the client can react to query changes! and the server-side doesn't give you any way to do that
this means you cannot express "subscriptions" to run an action on every item, or every item that matches a predicate
and that limits your ability to model things like queuing, and for us to provide you (eventually) with nice higher-level libraries that do this
or what people would normally think of as traditional workflow, where there is a kind of handler responsible for moving any piece of data from state A to state B, and you have graphs of these you can compose
right now the workaround (which you see show up in Convex code a lot) is to have the mutator (the one setting state A, the one aware that there needs to be a queueing-type behavior) do a scheduler.runAfter to kick off the next link in the chain
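a rough sketch of that workaround, using a tiny in-memory stand-in for the scheduler rather than the real convex api. `FakeScheduler`, `setStateA`, and `moveToStateB` are made-up names for illustration; only the idea of the mutator calling runAfter itself comes from the description above.

```typescript
// Toy stand-in for a scheduler: NOT the Convex API, just enough to show the pattern.
type Job = { runAt: number; fn: () => void };

class FakeScheduler {
  private queue: Job[] = [];
  private now = 0;

  // Queue `fn` to run `delayMs` after the current (fake) time.
  runAfter(delayMs: number, fn: () => void): void {
    this.queue.push({ runAt: this.now + delayMs, fn });
  }

  // Advance the fake clock and run every job that has come due, earliest first.
  advance(ms: number): void {
    this.now += ms;
    for (;;) {
      const due = this.queue
        .filter((j) => j.runAt <= this.now)
        .sort((a, b) => a.runAt - b.runAt)[0];
      if (!due) break;
      this.queue.splice(this.queue.indexOf(due), 1);
      due.fn();
    }
  }
}

type Item = { id: string; state: "A" | "B" };
const items = new Map<string, Item>();
const scheduler = new FakeScheduler();

// The "next link in the chain": moves an item from state A to state B.
function moveToStateB(id: string): void {
  const item = items.get(id);
  if (item && item.state === "A") items.set(id, { ...item, state: "B" });
}

// The mutator that sets state A has to know about, and schedule, the follow-up itself.
function setStateA(id: string): void {
  items.set(id, { id, state: "A" });
  scheduler.runAfter(0, () => moveToStateB(id));
}
```

note that `setStateA` is the only place that knows `moveToStateB` exists. every extra consumer of "state A" means another edit to the mutator.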
however that means on the server we have something closer to e.g. jquery than react, where you're again "responsible" for tracking who cares about this data changing. eventually when you have a mutation which, say, three consumers care about, you have to sort of track and remember to update the mutator and so on and so on. this is a less powerful pattern than being able to just have the function that wants to react to the data changing locally express the subscription or predicate, and then convex calls it with the data
so this predicate/data based triggering is indeed on our roadmap
that's the first missing thing we haven't gotten around to building yet
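to make the contrast concrete, here is a hypothetical sketch of what predicate-based triggering could look like, modeled as a toy in-memory table. `ReactiveTable`, `onMatch`, and `write` are invented for illustration and are not convex apis. a real version would presumably run handlers in the background rather than inline.

```typescript
// Hypothetical sketch of predicate-based triggering; none of these names are Convex APIs.
type Doc = { id: string; status: string };
type Trigger = {
  predicate: (doc: Doc) => boolean;
  handler: (doc: Doc) => void;
};

class ReactiveTable {
  private docs = new Map<string, Doc>();
  private triggers: Trigger[] = [];

  // A consumer declares, next to its own code, which documents it cares about.
  onMatch(predicate: (doc: Doc) => boolean, handler: (doc: Doc) => void): void {
    this.triggers.push({ predicate, handler });
  }

  // The writer just writes; the table works out which handlers to call.
  write(doc: Doc): void {
    this.docs.set(doc.id, doc);
    for (const t of this.triggers) {
      if (t.predicate(doc)) t.handler(doc);
    }
  }
}

const table = new ReactiveTable();
const emailed: string[] = [];
const audited: string[] = [];

// Two independent consumers; the writer below knows about neither of them.
table.onMatch((d) => d.status === "pending", (d) => { emailed.push(d.id); });
table.onMatch(() => true, (d) => { audited.push(`${d.id}:${d.status}`); });

table.write({ id: "job1", status: "pending" });
table.write({ id: "job2", status: "done" });
```

adding a third consumer here is one more `onMatch` call, with no change to whoever calls `write`.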
2. scheduler returning some sort of id that represents the job in our system tables
right now, there is no way to get an id, probe the job's state, cancel the job, etc. after initiating something with the scheduler
that means you cannot build a library layer which allows you to change jobs to at-least-once, or create any other retry/timeout semantics to ensure a scheduled job actually runs when you're willing to ensure its idempotency
so practically speaking, at-most-once is the only really achievable behavior right now
and for things like migration libraries, easy queues, etc. that's too limited
you can work around this by essentially completely recreating the scheduler with your own system tables, but it's pretty janky since you cannot really reason about convex's intention to still run a background job and whether it's "done" or not. or whether it failed after partially running. and so on. it's very hard to actually work around this in a complete way without a little more help from our scheduler. but Ian does a good job of giving you some ideas for now here: https://stack.convex.dev/background-job-management
Background Job Management
Implement asynchronous job patterns using a table to track progress. Fire-and-forget, cancelation, timeouts, and more.
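a sketch of why the job id matters, assuming a hypothetical scheduler that returns one. `TrackedScheduler`, `status`, `cancel`, and `runAtLeastOnce` are all made up for illustration; the point is only that once you can probe a job's state, a retry layer becomes a few lines of library code.

```typescript
// Hypothetical scheduler that hands back a job id; all names here are invented
// for illustration and are not part of the Convex scheduler.
type JobStatus = "pending" | "success" | "failed" | "canceled";
type JobId = number;

class TrackedScheduler {
  private nextId = 1;
  private jobs = new Map<JobId, { fn: () => void; status: JobStatus }>();

  // Register a job and return an id the caller can use later.
  schedule(fn: () => void): JobId {
    const id = this.nextId++;
    this.jobs.set(id, { fn, status: "pending" });
    return id;
  }

  status(id: JobId): JobStatus | undefined {
    return this.jobs.get(id)?.status;
  }

  // Cancel only works while the job has not run yet.
  cancel(id: JobId): boolean {
    const job = this.jobs.get(id);
    if (!job || job.status !== "pending") return false;
    job.status = "canceled";
    return true;
  }

  // Run every pending job once; a throwing job is recorded as failed.
  runPending(): void {
    for (const job of Array.from(this.jobs.values())) {
      if (job.status !== "pending") continue;
      try {
        job.fn();
        job.status = "success";
      } catch {
        job.status = "failed";
      }
    }
  }
}

// Library-level at-least-once: reschedule until one run succeeds or attempts run out.
// Only safe if `fn` is idempotent, since it may partially run more than once.
function runAtLeastOnce(
  s: TrackedScheduler,
  fn: () => void,
  maxAttempts: number
): { ok: boolean; attempts: number } {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const id = s.schedule(fn);
    s.runPending();
    if (s.status(id) === "success") return { ok: true, attempts: attempt };
  }
  return { ok: false, attempts: maxAttempts };
}
```

without `status(id)` the retry loop has nothing to check, which is why at-most-once is all you get from fire-and-forget scheduling.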
anyway, with these two primitives written, the fundamentals would all be there. then, likely more and more of our team's work would focus on open-source higher-level libraries built on top of these primitives, in collaboration with you all in our userbase, as open community projects
we can start focusing on higher level ways to express queues, workflow, migrations, job management libraries with great apis, and so on
a lot to do!
re: priorities and timeline, adoption has picked up quite a bit in the last few months, and we've been focusing more on the more urgent needs of projects in production
anyway, if anyone has more questions on any of this, fire away!
Thanks for the detailed rundown. Could you go into a bit of detail on how you do this tracking for the frontend now? It might make it clearer how temporary solutions can be implemented for the backend.
Also, when you start figuring this out, it would be awesome to have support that plays well with rxjs and streaming data. Or simply rethink the apis a bit, for example by offering super easy apis for subscribing to a stream, subscribing to events on changes, or simply subscribing to complete values. For lack of a better example: when working with openai, it would be really easy to use one to get updates to all the text so far for display, one to trigger pulses on updates, and a last one that only triggers when the entire response is ready, for doing the next action. Simple apis. I'm sure there are things that could be done to make error handling even easier, as when sticking streaming state directly into the db, but I guess that might be outside the scope here. At the very least, effect-ts is worth a proper look for some inspiration.
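To illustrate the three subscription styles, here is a minimal sketch of the api shape I have in mind. Everything in it (`TextStream`, `onValue`, `onPulse`, `onComplete`) is hypothetical and does not come from Convex, rxjs, or effect-ts. It only shows one source feeding three kinds of listener.

```typescript
// Hypothetical API shape; these names do not come from Convex, rxjs, or effect-ts.
type Unsubscribe = () => void;

// Register a callback in a list and return a function that removes it again.
function addListener<T>(list: T[], cb: T): Unsubscribe {
  list.push(cb);
  return () => {
    const i = list.indexOf(cb);
    if (i >= 0) list.splice(i, 1);
  };
}

class TextStream {
  private text = "";
  private done = false;
  private valueListeners: Array<(textSoFar: string) => void> = [];
  private pulseListeners: Array<() => void> = [];
  private completeListeners: Array<(fullText: string) => void> = [];

  // 1. Called with all the text so far on every update (good for display).
  onValue(cb: (textSoFar: string) => void): Unsubscribe {
    return addListener(this.valueListeners, cb);
  }

  // 2. Called with no payload on every update (a "pulse").
  onPulse(cb: () => void): Unsubscribe {
    return addListener(this.pulseListeners, cb);
  }

  // 3. Called once, with the full text, when the stream finishes.
  onComplete(cb: (fullText: string) => void): Unsubscribe {
    return addListener(this.completeListeners, cb);
  }

  // Producer side: append a chunk and notify value and pulse listeners.
  append(chunk: string): void {
    if (this.done) throw new Error("stream already finished");
    this.text += chunk;
    for (const cb of this.valueListeners.slice()) cb(this.text);
    for (const cb of this.pulseListeners.slice()) cb();
  }

  // Producer side: mark the stream finished and notify completion listeners.
  finish(): void {
    if (this.done) return;
    this.done = true;
    for (const cb of this.completeListeners.slice()) cb(this.text);
  }
}
```

With something like this, the display code would use `onValue`, a progress indicator would use `onPulse`, and the "do the next action" step would hang off `onComplete`.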
the tracking is using "read sets" internally within the convex reactor -- @sujayakar is going to release a nice write up soon on how convex works
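until that write-up is out, here is the general idea of read-set tracking as a toy, with the loud caveat that this is a generic illustration and not how the convex reactor is actually implemented. `TrackedStore` and everything in it is made up: each query records the keys it read, and a write only re-runs queries whose read set contains the written key.

```typescript
// Generic illustration of read-set tracking; NOT Convex's actual implementation.
type Read = (key: string) => number;
type Query = (read: Read) => number;
type Subscription = {
  query: Query;
  onResult: (value: number) => void;
  readSet: Set<string>;
};

class TrackedStore {
  private data = new Map<string, number>();
  private subs: Subscription[] = [];

  // Run the query once, remembering every key it read.
  subscribe(query: Query, onResult: (value: number) => void): void {
    const sub: Subscription = { query, onResult, readSet: new Set<string>() };
    this.subs.push(sub);
    this.rerun(sub);
  }

  // A write only re-runs queries whose read set contains the written key.
  write(key: string, value: number): void {
    this.data.set(key, value);
    for (const sub of this.subs.slice()) {
      if (sub.readSet.has(key)) this.rerun(sub);
    }
  }

  // Re-execute the query with a tracking `read` and record the fresh read set.
  private rerun(sub: Subscription): void {
    const readSet = new Set<string>();
    const value = sub.query((key) => {
      readSet.add(key);
      return this.data.get(key) ?? 0; // missing keys read as 0 in this toy
    });
    sub.readSet = readSet;
    sub.onResult(value);
  }
}
```

a query that reads keys "a" and "b" gets re-run on a write to "a" but not on a write to "c", and nobody had to declare that dependency by hand.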