Batch insertions
Hey guys, quick question - whats the best way to bulk insert documents? Im scheduling a function to run every morning that hits an api that grabs pricing on about 19k documents & have a table to store the price for that day (looking to make a historical pricing for my app)
Wondering tho what is the best way to go about it.. i have internal mutations that add one doc and currently looping over all the data batched at 250 each, and using the convex-helpers
asyncMap
like so:
Wondering how this all works when it comes to pricing if i need to call functions 19k times a day just for this function & if this is a good idea27 Replies
hey!
so this takes place in an action? fetching all the prices?
Wow Jamie! Ive been watching all your videos on youtube - new to convex just starting up a side project now to use convex! Im drinking all the koolaid from all your videos and so far have loved working with (albeit my first week lol)
but no not within one action.
So currently i have an
internalMutaion
setup to insert one doc that ive been testing through the dashboard & running the function.
now I have another internalAction
that im using to fetch through the api & then call that intermalMutation
The internalAction
is set to loop over & pagination the query per 250 items from the api.got it. yeah, in general, I'd do insertions in batches of 100-250 or whatever, depending on the document size
using
_.chunk
or whatever (from lodash)
and so the mutation would take a Doc<"table">[]
doc size is very small, only about 7-10kb in size
and then you can asyncMap inside the mutation
I wouldn't parallelize tons of mutations/promises within the action and have the mutations be single document. I think that would end up being slower and more expensive overall
have the mutation be a batch writer and keep the batches around low-hundreds
so this is what my current
internalAction
looks likegotcha. if the page size is like 250, and the documents are just a few kb, that's probably a reasonable batch size for the mutation as well
you could just send the whole batch to one mutation and do the asyncMap within it
is doing something like this daily expensive through Convex? in terms of the function calls - trying to grasp the pricing model & do's & don'ts for convex ya know
calling on a document by document basis from actions -> mutations creates a lot of isolates in your convex deployment and so is slower and more heavyweight
yeah, it is
b/c we spin up separate v8 isolates and database transactions for each document
it's okay to put the whole batch in one mutation
for a batch this size
hm, okay do you have any docs I can read through because to be honest im not sure i understand in one mutation if theres no like bulk insert
oh you know what, i miss spoke early too, so currently right now, with how this
internalAction
is setup, theres more like 500-750 documents to be inserted into the db per page from the query
which would happen within the asyncMap
thats calling the internalMutation
to insert one doc at a timeso, here's a thread about batch insertions: https://discord.com/channels/1019350475847499849/1329906799381577738/1329906799381577738
generally, if you just create a bunch of promises the convex runtime will use the appropriate amount of concurrency to speed things up w/out overloading the database
if anything is 500-750, that gets to be a little too much in my book. I'd prefer to keep the batches around 100-250. so you can slice those batches up using
.slice()
or _.chunk
as aboveYup yup for sure, will look to do that - im curious tho - how come no built in way for bulk insertions?
well, b/c convex mutations are almost like "database machine code", if we provided a bulk insertion function, it would probably just wrap the thing above. we might do that at some point, or convex-helpers might include one? but it's also no more complicated than the code I linked, so you could also just make one yourself if you want to eliminate code
the slightly longer answer is, in general the model isn't like a bunch of things are all written to the end of the file; documents may be spread out upon multiple machines/shards, etc--and the database will make the best decision it can about concurrency and placement and things like that as your project scales up, and convex adds more and more internal, automatic scaling
so batch insertions are really just a desire to run a bunch of individual insertion operations -- there's nothing different under the covers about the batch
the truth is all mutations write in "batch" right now due to the way mutations work
ah okay okay - so in terms of the pricing model - anytime a function is called thats how the pricing is computed right? (don't need exactly just a rough estimate)
So it would call the internalAction once (1 fn call) then call the internalMutation (take the higher end to be safe of 750 * 75(number of pages im going through the api) which is round up to 60k function calls.
Am i right in this in terms of how these function calls work?
internalAction = one function call, yeah. but then if you're passing arrays of 100 documents to the internal mutation, it would be ~600 function calls
here's some code where I did this recently: https://github.com/jamwt/fastest-roundest-pokemon/blob/14c0087c08aa057d0231d72659877036a4b5de35/convex/pokemon.ts#L116
GitHub
fastest-roundest-pokemon/convex/pokemon.ts at 14c0087c08aa057d0231d...
Making t3dotgg's roundest pokemon app go fast. Contribute to jamwt/fastest-roundest-pokemon development by creating an account on GitHub.
1 action passes batches of 100 to the mutation
omg get outta here this is for pokemon.. thats literally what im doing for my side project π
im working on my own historical pricing chart LOL so this is storing pricing on each card!
nice. π
but in short, the only change to your approach here in practice I'd recommend for both costs and performance would be change your mutation to do 100 at a time or something. otherwise, looks good
hm okay i think im grasping better now - so your saying in the
internalMutation
handle inserting an array of docs
yeah, basically just call this in a loop
over an array of docs
and cross the action -> mutation boundary 100x less times
so here im only inserting 1 each time, but if i bump this to say take in an array of 200 cards, then the call to this
internalMutation
would only count as one per 200 cards? even tho we are calling ctx.db.insert
for each single card (lets just say at the end of this function daily, i would be inserting 100k docs to keep numbers easySo itβd be around 1k calls
damn you a mad genius
thats much more palletable you think right to run each day. so the
ctx.db.insert
or ctx.db.query
doesn't count towards function calls monthly it sounds like? just the actual internalActions
or query
from the client side?
so within that insert function, im doing this to check if its already been added:
so this being within the internalMutation
would still count as only 1 fn call?only
ctx.runMutation
or runAction or etc are convex functions, they're the only "function calls" we charge you for. database operations ctx.db.X
are not function calls, yeahdude get outta here, your kidding.. this has got to be one of the best (and coolest) services ive used to date! LOVE how this is all in TS and how unbelievably cool the dashboard is when writing/testing this stuff!
glad to hear it!