zid · 14mo ago

processing bulk/batch requests?

Setting up a daily cron job to update every document in a table. While I don't imagine this specific table to get beyond 1K documents, I'd still like to make this future-proof. Is there a time limit for a function call? Is there a way to process these in batches? What's the optimal approach here?
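For context, a daily cron in Convex is registered in a `crons.ts` file. This is a minimal sketch assuming an internal mutation at `internal.migrations.migrateBatch` (the name and file are illustrative, not from this thread):

```typescript
// crons.ts — hypothetical sketch of the daily job described above.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Run once a day at midnight UTC, starting the batch run from the
// beginning (a null cursor means "first page").
crons.daily(
  "daily document update",
  { hourUTC: 0, minuteUTC: 0 },
  internal.migrations.migrateBatch,
  { cursor: null }
);

export default crons;
```

The mutation it points at is where the batching logic discussed in the replies below would live.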
9 Replies
ian · 14mo ago
You can use some of my migration helpers, or just look at their code to work on it in batches: https://stack.convex.dev/migrating-data-with-mutations
Migrating Data With Mutations
Using mutations to migrate data in Convex.
ian · 14mo ago
The key is to use `db.query(...).paginate(...)` and keep track of the cursor to work on the next batch/page. You can even schedule a call recursively, passing in `null` for the cursor at the start, and going until it says there aren't any left.
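As a rough sketch of that pattern (assuming a hypothetical `documents` table and an illustrative per-document patch; adjust to your schema):

```typescript
// migrations.ts — sketch of a self-scheduling paginated mutation.
import { internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const migrateBatch = internalMutation({
  // null cursor = start from the first page.
  args: { cursor: v.union(v.string(), v.null()) },
  handler: async (ctx, { cursor }) => {
    // Fetch one page of documents; paginate() returns the page plus
    // a continuation cursor and an isDone flag.
    const { page, isDone, continueCursor } = await ctx.db
      .query("documents")
      .paginate({ numItems: 100, cursor });

    for (const doc of page) {
      // Placeholder update — whatever the daily job actually changes.
      await ctx.db.patch(doc._id, { updatedAt: Date.now() });
    }

    if (!isDone) {
      // Recursively schedule the next batch with the new cursor.
      await ctx.scheduler.runAfter(0, internal.migrations.migrateBatch, {
        cursor: continueCursor,
      });
    }
  },
});
```

Each invocation handles one bounded page, so no single mutation risks the read/write limits no matter how large the table grows.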
zid (OP) · 14mo ago
Hey Ian, thank you very much, I'll be sure to check out the article! Just had a chance to review it. Love it, but just to confirm: if something went wrong during a batch, I would either have to retry manually, retry the batch using my own counter, and/or send myself an email or similar so I can retry manually?
ian · 14mo ago
You could write the cursor at each batch to a table to keep track of its progress. I have it print the cursor out in the logs, so I can see where it failed and pass in the previous cursor to have it pick up where it left off.
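A minimal sketch of persisting that progress, assuming a hypothetical `migrationProgress` table in your schema (all identifiers here are illustrative):

```typescript
// Inside the batch mutation's handler, after paginate() returns:
// record where we are so a failed run can resume from this point.

// Hypothetical shape: { name: string, cursor: string | null }
const existing = await ctx.db
  .query("migrationProgress")
  .filter((q) => q.eq(q.field("name"), "dailyUpdate"))
  .unique();

if (existing) {
  await ctx.db.patch(existing._id, { cursor: continueCursor });
} else {
  await ctx.db.insert("migrationProgress", {
    name: "dailyUpdate",
    cursor: continueCursor,
  });
}

// Also log it, so the dashboard shows where each batch ended.
console.log("dailyUpdate checkpoint:", continueCursor);
```

On a retry you'd read that row first and pass its stored cursor instead of `null`.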
zid (OP) · 14mo ago
Hmm, what would be the reason(s) a batch would fail during, say, a daily scheduled execution? I assume all of the reasons would be due to something that happened inside of Convex, like an outage of some sort? I'm asking to better understand how effective a programmatic solution would be, where, for example, I would keep track of the cursor in a table and retry the batch a maximum of, say, 5 times. Is it the batch operation that tends to fail (whenever it does), or the wrapping function, in which case the logic inside would not matter as much? If a scheduled function fails, does Convex try it again? Do we have any control over this? From what I saw using a cron job, after it failed once, the next execution was at the next scheduled date.
lee · 14mo ago
On the contrary, scheduled mutations do not fail for reasons like outages. We have an exactly-once guarantee for scheduled mutations (but not for scheduled actions). The mutation may fail if you throw an exception in JS or perform an invalid operation, like writing a document that doesn't match your schema. In those cases you could say the migration is stuck on invalid data, and you might want to retry (after modifying data or code) from a cursor.
zid (OP) · 14mo ago
Hmmm, what about the scheduled function timing out? For example, what if we did no pagination during the scheduled mutation and instead looped over a million documents?
lee · 14mo ago
That would also qualify as a deterministic error, so it wouldn't be retried. Note you're more likely to get an explicit error about reading too many documents than an actual timeout. https://docs.convex.dev/functions/error-handling/#readwrite-limit-errors
Error Handling | Convex Developer Hub
zid (OP) · 14mo ago
Ah, got it, thank you!