mvols
mvolsβ€’3w ago

Batch insertions

Hey guys, quick question - whats the best way to bulk insert documents? Im scheduling a function to run every morning that hits an api that grabs pricing on about 19k documents & have a table to store the price for that day (looking to make a historical pricing for my app) Wondering tho what is the best way to go about it.. i have internal mutations that add one doc and currently looping over all the data batched at 250 each, and using the convex-helpers asyncMap like so:
await asyncMap(cardVariants, async (card) => {
await ctx.runMutation(
internal.cardPricing.mutations.internalInsertCardPriceDoc.default,
card
);
totalProcessed++;
});
await asyncMap(cardVariants, async (card) => {
await ctx.runMutation(
internal.cardPricing.mutations.internalInsertCardPriceDoc.default,
card
);
totalProcessed++;
});
Wondering how this all works when it comes to pricing if i need to call functions 19k times a day just for this function & if this is a good idea
27 Replies
jamwt
jamwtβ€’3w ago
hey! so this takes place in an action? fetching all the prices?
mvols
mvolsOPβ€’3w ago
Wow Jamie! Ive been watching all your videos on youtube - new to convex just starting up a side project now to use convex! Im drinking all the koolaid from all your videos and so far have loved working with (albeit my first week lol) but no not within one action. So currently i have an internalMutaion setup to insert one doc that ive been testing through the dashboard & running the function. now I have another internalAction that im using to fetch through the api & then call that intermalMutation The internalAction is set to loop over & pagination the query per 250 items from the api.
jamwt
jamwtβ€’3w ago
got it. yeah, in general, I'd do insertions in batches of 100-250 or whatever, depending on the document size using _.chunk or whatever (from lodash) and so the mutation would take a Doc<"table">[]
mvols
mvolsOPβ€’3w ago
doc size is very small, only about 7-10kb in size
jamwt
jamwtβ€’3w ago
and then you can asyncMap inside the mutation I wouldn't parallelize tons of mutations/promises within the action and have the mutations be single document. I think that would end up being slower and more expensive overall have the mutation be a batch writer and keep the batches around low-hundreds
mvols
mvolsOPβ€’3w ago
const trackCardPricing = internalAction({
handler: async (ctx) => {
try {
let page = 1;
let hasMorePages = true;
let totalCardVariants = 0;
let totalProcessed = 0;

console.log("starting to track card pricing...");

while (hasMorePages) {
const response = await fetchPokemontcgPrices(page);
const { data, page: currPage, count, totalCount } = response;

hasMorePages = currPage < totalCount / count;
page++;

const cardVariants: CardPriceDoc[] = [];

data.forEach((card) => {
const baseFields: Omit<CardPriceDoc, "variant"> = {
cardId: card.id,
cardName: card.name,
cardSetId: card.set.id,
isoSnapshotDate: transformApiDateToIsoDate(
card.tcgplayer.updatedAt
),
tcgplayerUrl: card.tcgplayer.url,
};

Object.entries(card.tcgplayer.prices).forEach(([variant, prices]) => {
cardVariants.push({
...baseFields,
variant,
...filterNullishValues(prices),
});
});
});

totalCardVariants += cardVariants.length;

await asyncMap(cardVariants, async (card) => {
await ctx.runMutation(
internal.cardPricing.mutations.internalInsertCardPriceDoc.default,
card
);
totalProcessed++;
});

console.log(
`Processed ${cardVariants.length} variants from page ${currPage} (Total processed: ${totalProcessed}/${totalCardVariants})`
);

await new Promise((resolve) => setTimeout(resolve, 100));
}

console.log(
`Finished tracking card pricing! Processed ${totalProcessed} total card variants`
);
} catch (error) {
console.error(`[trackCardPricing] ${error}`);
}
},
});
const trackCardPricing = internalAction({
handler: async (ctx) => {
try {
let page = 1;
let hasMorePages = true;
let totalCardVariants = 0;
let totalProcessed = 0;

console.log("starting to track card pricing...");

while (hasMorePages) {
const response = await fetchPokemontcgPrices(page);
const { data, page: currPage, count, totalCount } = response;

hasMorePages = currPage < totalCount / count;
page++;

const cardVariants: CardPriceDoc[] = [];

data.forEach((card) => {
const baseFields: Omit<CardPriceDoc, "variant"> = {
cardId: card.id,
cardName: card.name,
cardSetId: card.set.id,
isoSnapshotDate: transformApiDateToIsoDate(
card.tcgplayer.updatedAt
),
tcgplayerUrl: card.tcgplayer.url,
};

Object.entries(card.tcgplayer.prices).forEach(([variant, prices]) => {
cardVariants.push({
...baseFields,
variant,
...filterNullishValues(prices),
});
});
});

totalCardVariants += cardVariants.length;

await asyncMap(cardVariants, async (card) => {
await ctx.runMutation(
internal.cardPricing.mutations.internalInsertCardPriceDoc.default,
card
);
totalProcessed++;
});

console.log(
`Processed ${cardVariants.length} variants from page ${currPage} (Total processed: ${totalProcessed}/${totalCardVariants})`
);

await new Promise((resolve) => setTimeout(resolve, 100));
}

console.log(
`Finished tracking card pricing! Processed ${totalProcessed} total card variants`
);
} catch (error) {
console.error(`[trackCardPricing] ${error}`);
}
},
});
so this is what my current internalAction looks like
jamwt
jamwtβ€’3w ago
gotcha. if the page size is like 250, and the documents are just a few kb, that's probably a reasonable batch size for the mutation as well you could just send the whole batch to one mutation and do the asyncMap within it
mvols
mvolsOPβ€’3w ago
is doing something like this daily expensive through Convex? in terms of the function calls - trying to grasp the pricing model & do's & don'ts for convex ya know
jamwt
jamwtβ€’3w ago
calling on a document by document basis from actions -> mutations creates a lot of isolates in your convex deployment and so is slower and more heavyweight yeah, it is b/c we spin up separate v8 isolates and database transactions for each document it's okay to put the whole batch in one mutation for a batch this size
mvols
mvolsOPβ€’3w ago
hm, okay do you have any docs I can read through because to be honest im not sure i understand in one mutation if theres no like bulk insert oh you know what, i miss spoke early too, so currently right now, with how this internalAction is setup, theres more like 500-750 documents to be inserted into the db per page from the query which would happen within the asyncMap thats calling the internalMutation to insert one doc at a time
jamwt
jamwtβ€’3w ago
so, here's a thread about batch insertions: https://discord.com/channels/1019350475847499849/1329906799381577738/1329906799381577738 generally, if you just create a bunch of promises the convex runtime will use the appropriate amount of concurrency to speed things up w/out overloading the database if anything is 500-750, that gets to be a little too much in my book. I'd prefer to keep the batches around 100-250. so you can slice those batches up using .slice() or _.chunk as above
mvols
mvolsOPβ€’3w ago
Yup yup for sure, will look to do that - im curious tho - how come no built in way for bulk insertions?
jamwt
jamwtβ€’3w ago
well, b/c convex mutations are almost like "database machine code", if we provided a bulk insertion function, it would probably just wrap the thing above. we might do that at some point, or convex-helpers might include one? but it's also no more complicated than the code I linked, so you could also just make one yourself if you want to eliminate code the slightly longer answer is, in general the model isn't like a bunch of things are all written to the end of the file; documents may be spread out upon multiple machines/shards, etc--and the database will make the best decision it can about concurrency and placement and things like that as your project scales up, and convex adds more and more internal, automatic scaling so batch insertions are really just a desire to run a bunch of individual insertion operations -- there's nothing different under the covers about the batch the truth is all mutations write in "batch" right now due to the way mutations work
mvols
mvolsOPβ€’3w ago
ah okay okay - so in terms of the pricing model - anytime a function is called thats how the pricing is computed right? (don't need exactly just a rough estimate) So it would call the internalAction once (1 fn call) then call the internalMutation (take the higher end to be safe of 750 * 75(number of pages im going through the api) which is round up to 60k function calls. Am i right in this in terms of how these function calls work?
jamwt
jamwtβ€’3w ago
internalAction = one function call, yeah. but then if you're passing arrays of 100 documents to the internal mutation, it would be ~600 function calls
jamwt
jamwtβ€’3w ago
GitHub
fastest-roundest-pokemon/convex/pokemon.ts at 14c0087c08aa057d0231d...
Making t3dotgg's roundest pokemon app go fast. Contribute to jamwt/fastest-roundest-pokemon development by creating an account on GitHub.
jamwt
jamwtβ€’3w ago
1 action passes batches of 100 to the mutation
mvols
mvolsOPβ€’3w ago
omg get outta here this is for pokemon.. thats literally what im doing for my side project πŸ˜† im working on my own historical pricing chart LOL so this is storing pricing on each card!
jamwt
jamwtβ€’3w ago
nice. πŸ™‚ but in short, the only change to your approach here in practice I'd recommend for both costs and performance would be change your mutation to do 100 at a time or something. otherwise, looks good
mvols
mvolsOPβ€’3w ago
hm okay i think im grasping better now - so your saying in the internalMutation handle inserting an array of docs
/**
* Internal mutation to insert a card price document
* - runs validation checks on the input arguments
* - checks for uniqueness
*/
const internalInsertCardPriceDoc = internalMutation({
args: zodToConvex(cardPriceSchema),
handler: async (ctx, args) => {
const result = validateCardPriceArgs(args);
if (result.error) {
const errors = result.error.issues
.map((issue) => issue.message)
.join(", ");
throw new Error(
`[internalInsertCardPriceDoc] ${errors}, args:${JSON.stringify(args)}`
);
}

const existingCard = await ensureUniqueCardPriceDoc(ctx, args);
if (existingCard) {
throw new Error(
`[internalInsertCardPriceDoc] card pricing already exists, doc:${existingCard._id}, args:${JSON.stringify(
args
)}`
);
}

return ctx.db.insert("cardPricing", args);
},
});

export default internalInsertCardPriceDoc;
/**
* Internal mutation to insert a card price document
* - runs validation checks on the input arguments
* - checks for uniqueness
*/
const internalInsertCardPriceDoc = internalMutation({
args: zodToConvex(cardPriceSchema),
handler: async (ctx, args) => {
const result = validateCardPriceArgs(args);
if (result.error) {
const errors = result.error.issues
.map((issue) => issue.message)
.join(", ");
throw new Error(
`[internalInsertCardPriceDoc] ${errors}, args:${JSON.stringify(args)}`
);
}

const existingCard = await ensureUniqueCardPriceDoc(ctx, args);
if (existingCard) {
throw new Error(
`[internalInsertCardPriceDoc] card pricing already exists, doc:${existingCard._id}, args:${JSON.stringify(
args
)}`
);
}

return ctx.db.insert("cardPricing", args);
},
});

export default internalInsertCardPriceDoc;
jamwt
jamwtβ€’3w ago
yeah, basically just call this in a loop over an array of docs and cross the action -> mutation boundary 100x less times
mvols
mvolsOPβ€’3w ago
so here im only inserting 1 each time, but if i bump this to say take in an array of 200 cards, then the call to this internalMutation would only count as one per 200 cards? even tho we are calling ctx.db.insert for each single card (lets just say at the end of this function daily, i would be inserting 100k docs to keep numbers easy
jamwt
jamwtβ€’3w ago
So it’d be around 1k calls
mvols
mvolsOPβ€’3w ago
damn you a mad genius thats much more palletable you think right to run each day. so the ctx.db.insert or ctx.db.query doesn't count towards function calls monthly it sounds like? just the actual internalActions or query from the client side? so within that insert function, im doing this to check if its already been added:
export async function ensureUniqueCardPriceDoc(
ctx: QueryCtx,
args: CardPriceDoc
): Promise<Doc<"cardPricing"> | null> {
const { cardId, isoSnapshotDate, variant } = args;

return ctx.db
.query("cardPricing")
.withIndex("unique_card_id_and_snapshot_date_and_variant", (q) =>
q
.eq("cardId", cardId)
.eq("isoSnapshotDate", isoSnapshotDate)
.eq("variant", variant)
)
.unique();
}
export async function ensureUniqueCardPriceDoc(
ctx: QueryCtx,
args: CardPriceDoc
): Promise<Doc<"cardPricing"> | null> {
const { cardId, isoSnapshotDate, variant } = args;

return ctx.db
.query("cardPricing")
.withIndex("unique_card_id_and_snapshot_date_and_variant", (q) =>
q
.eq("cardId", cardId)
.eq("isoSnapshotDate", isoSnapshotDate)
.eq("variant", variant)
)
.unique();
}
so this being within the internalMutation would still count as only 1 fn call?
jamwt
jamwtβ€’3w ago
only ctx.runMutation or runAction or etc are convex functions, they're the only "function calls" we charge you for. database operations ctx.db.X are not function calls, yeah
mvols
mvolsOPβ€’3w ago
dude get outta here, your kidding.. this has got to be one of the best (and coolest) services ive used to date! LOVE how this is all in TS and how unbelievably cool the dashboard is when writing/testing this stuff!
jamwt
jamwtβ€’3w ago
glad to hear it!

Did you find this page helpful?