Curious if there are best practices for fanning out in Convex. As an example scenario, imagine you have a status update mutation that then needs to trigger a notification to all followers, which is an action. I could see the following approaches:
1) The status update mutation queries all followers (collects all or paginates; unsure if there's a benefit either way) and schedules the action for each follower.
2) The status update mutation schedules a separate mutation (to have access to ctx.scheduler, even though it's just querying) that does the follower collection and notification action scheduling. This would keep the status update mutation's transaction scoped to its area of concern, but I'm unsure if it's worth breaking out if the approach above is performant enough.
3) Have that separate mutation work in paginated batches and reschedule itself to page through the follower result set until exhausted.
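For concreteness, here's the rough shape I have in mind for (3). All the names are hypothetical: a followers table with a by_followee index, and a sendNotification internal action.

```ts
// convex/notifications.ts -- rough sketch; table, index, and function names are hypothetical
import { v } from "convex/values";
import { internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";

export const notifyFollowersPage = internalMutation({
  args: {
    statusId: v.id("statuses"),
    userId: v.id("users"),
    cursor: v.union(v.string(), v.null()),
  },
  handler: async (ctx, args) => {
    // Pull one page of followers for the user whose status changed.
    const { page, isDone, continueCursor } = await ctx.db
      .query("followers")
      .withIndex("by_followee", (q) => q.eq("followeeId", args.userId))
      .paginate({ cursor: args.cursor, numItems: 100 });

    // Schedule the notification action for each follower in this page.
    for (const follower of page) {
      await ctx.scheduler.runAfter(0, internal.notifications.sendNotification, {
        statusId: args.statusId,
        followerId: follower.followerId,
      });
    }

    // Reschedule ourselves with the cursor until the follower set is exhausted.
    if (!isDone) {
      await ctx.scheduler.runAfter(0, internal.notifications.notifyFollowersPage, {
        statusId: args.statusId,
        userId: args.userId,
        cursor: continueCursor,
      });
    }
  },
});
```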
I'm sure some of this comes down to "how many followers are we dealing with?", but with that being an unknown, which approach is advised? Or is there a completely different approach I'm not thinking of?
I'd typically go with #3 in a more traditional stack, but I'm wondering if it's over-engineered for Convex.
My approach would probably be something like (3) for a heavy-throughput application. If (1) means every action immediately needs to fetch data, it'd be more efficient to fetch it in batch: if you already know which followers you're notifying, you could schedule one action per batch of ~100.
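A rough sketch of what that batched action could look like. notifyBatch and sendOne are made-up names; sendOne stands in for whatever your notification provider's SDK actually exposes.

```ts
// convex/notifications.ts -- sketch of a batched notify action; names are hypothetical
import { v } from "convex/values";
import { internalAction } from "./_generated/server";
import { Id } from "./_generated/dataModel";

// Stand-in for your notification provider's API call.
async function sendOne(followerId: Id<"users">, statusId: Id<"statuses">) {
  // e.g. await pushProvider.send({ to: followerId, body: `New status: ${statusId}` });
}

export const notifyBatch = internalAction({
  args: {
    statusId: v.id("statuses"),
    followerIds: v.array(v.id("users")),
  },
  handler: async (_ctx, args) => {
    // The follower IDs were fetched in the scheduling mutation, so this action
    // only has to talk to the external notification service.
    for (const followerId of args.followerIds) {
      await sendOne(followerId, args.statusId);
    }
  },
});
```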
Adding tasks to the scheduler is more scalable than calling them directly.
For users with thousands of followers, take into account any rate limiting on the notification service. Centralizing the batching, and therefore the throughput rate, will help you there. You could add sleeps in a single action where necessary.
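A minimal sketch of that kind of throttled loop, callable from inside a single action. sendOne, the chunk size, and the delay are placeholders; tune them to your provider's real limits.

```ts
// Throttled send helper -- sketch only, names and limits are hypothetical.
import { Id } from "./_generated/dataModel";

async function sendOne(followerId: Id<"users">, statusId: Id<"statuses">) {
  // Call your real notification provider here.
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

export async function sendThrottled(
  followerIds: Id<"users">[],
  statusId: Id<"statuses">,
  chunkSize = 50,
  delayMs = 1000,
) {
  for (let i = 0; i < followerIds.length; i += chunkSize) {
    // Send one chunk in parallel, then pause to stay under the rate limit.
    const chunk = followerIds.slice(i, i + chunkSize);
    await Promise.all(chunk.map((id) => sendOne(id, statusId)));
    if (i + chunkSize < followerIds.length) {
      await sleep(delayMs);
    }
  }
}
```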
(1) will definitely need to paginate (each page in a new mutation) once there are ~10k+ followers.
(2) sounds the same as (1), except it fetches followers in a new transaction. That does reduce the read dependencies of (1), but if users aren't changing often, it's not a big concern.
(3), self-scheduling in batches, sounds good regardless if you're really looking at scale.
Pragmatically, I'd probably fetch the first 1k followers, then schedule a single action that notifies all 1k in batches. I'd print an error / notify a developer if it actually gets 1k back, so you know you need to scale. At that point, do the recursion trick, but keep a 1:1 ratio of action to mutation batches to control parallelism, and maybe even have the action schedule the next mutation batch, passing through the cursor, so it executes serially to buffer for rate limiting.
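A sketch of that pragmatic first pass, reusing the hypothetical notifyBatch action from above; the table, index, and field names are still assumptions.

```ts
// convex/statuses.ts -- sketch of the pragmatic fan-out; names are hypothetical
import { v } from "convex/values";
import { internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";

export const fanOutNotifications = internalMutation({
  args: { statusId: v.id("statuses"), userId: v.id("users") },
  handler: async (ctx, args) => {
    // Grab up to 1k followers in one shot.
    const followers = await ctx.db
      .query("followers")
      .withIndex("by_followee", (q) => q.eq("followeeId", args.userId))
      .take(1000);

    if (followers.length === 1000) {
      // Loud signal that this user has outgrown the simple path and it's time
      // to switch to the self-scheduling, cursor-passing fan-out.
      console.error(`User ${args.userId} hit the 1k follower cap during fan-out`);
    }

    // Hand the whole batch to a single action that notifies in chunks.
    await ctx.scheduler.runAfter(0, internal.notifications.notifyBatch, {
      statusId: args.statusId,
      followerIds: followers.map((f) => f.followerId),
    });
  },
});
```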