Nic
Nic3w ago

Really wish the database layer was a bit more intuitive/functional to use, simple things like count() etc. I was working on a project that had 20k rows in a table, just wanted to show the count of rows. You have to fetch the entire dataset then .length it? Well that causes it to yell at you saying that the query's too big, and somehow i used 16GB worth of database bandwidth (had to upgrade to clear the account lockdown because of exceeding limits).
20 Replies
erquhart
erquhart3w ago
Yeah there are some things that are usually batteries-included that don't "just work" in Convex. For count the best approach is to use the Aggregate component, so you can do:
aggregate.count(ctx)
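For context, a rough sketch of what that can look like with the @convex-dev/aggregate component. The "items" table, file paths, and the exact generic/constructor shape here are assumptions; check the component docs for the real setup, including registering it with app.use(aggregate) in convex.config.ts.

// convex/stats.ts (sketch) - assumes the aggregate component is registered
// in convex.config.ts via app.use(aggregate)
import { TableAggregate } from "@convex-dev/aggregate";
import { components } from "./_generated/api";
import { query } from "./_generated/server";
import { DataModel } from "./_generated/dataModel";

// Hypothetical "items" table; no sort key since we only want a count.
export const aggregate = new TableAggregate<{
  Key: null;
  DataModel: DataModel;
  TableName: "items";
}>(components.aggregate, {
  sortKey: () => null,
});

export const itemCount = query({
  args: {},
  handler: async (ctx) => {
    // Reads the aggregate's denormalized metadata instead of scanning rows.
    return await aggregate.count(ctx);
  },
});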
erquhart
erquhart3w ago
Convex Aggregate: Keep track of sums and counts in a denormalized and scalable way.
Nic
NicOP3w ago
select("column1", "column2") would have been great too, maybe im just misreading what convex databases are supposed to be. selecting certain columns would have cut down greatly on the database bandwidth used too
erquhart
erquhart3w ago
Not really - a SQL select query is still reading whole rows, mapping down to the fields you selected, and then returning them to you. Since your Convex functions are running "inside" the database, you're just doing manually what SQL select does when you map
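In other words, the "select" ends up as a map inside your query function. A minimal sketch, with made-up table and field names; note every matching document is still read, you're only trimming what goes back to the client:

import { query } from "./_generated/server";

export const listColumns = query({
  args: {},
  handler: async (ctx) => {
    // Every matching document is still read by the Convex runtime...
    const docs = await ctx.db.query("yourTableName").collect();
    // ...the projection is just a map over the fields you care about.
    return docs.map((doc) => ({
      column1: doc.column1,
      column2: doc.column2,
    }));
  },
});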
Nic
NicOP3w ago
i guess that occurs on the database server (mysql for ex) rather than being sent to the app to process?
erquhart
erquhart3w ago
right and your Convex code is running in the db
Nic
NicOP3w ago
interesting, but the table was iirc 200mb or so. just curious, if you do a collect() on the whole table does that add 200mb to your bandwidth used? just no idea how i got to 15gb. i deleted the project already because i didnt want any ghost processes to continue doing anything
erquhart
erquhart3w ago
Number of rows scanned depends on index use. Oh, you're talking about a specific op you did, missed that. So getting a count on the spot does involve getting all records, yeah. The aggregate component basically keeps metadata to avoid needing to do that.
Nic
NicOP3w ago
my function was to grab a csv i imported, check the ids in the csv for duplicates in the database (so we dont add twice)
erquhart
erquhart3w ago
It initially has to go over the table to get a current count, but from there it keeps metadata up to date, so it's not a big operation to get the count
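Concretely, "keeps metadata up to date" means your mutations also write to the aggregate. A sketch continuing the hypothetical stats.ts above; verify the exact write API (aggregate.insert and friends) against the component docs:

import { mutation } from "./_generated/server";
import { v } from "convex/values";
import { aggregate } from "./stats"; // the TableAggregate sketched earlier

export const addItem = mutation({
  args: { name: v.string() },
  handler: async (ctx, args) => {
    const id = await ctx.db.insert("items", { name: args.name });
    const doc = await ctx.db.get(id);
    // Update the aggregate alongside the table write so counts stay O(log n).
    if (doc) {
      await aggregate.insert(ctx, doc);
    }
    return id;
  },
});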
jamwt
jamwt3w ago
maybe needed an index?
Nic
NicOP3w ago
so normally i'd just select distinct on the id. with 20k items an index wouldnt do anything, since what im selecting would just return the whole database anyway (whole table, rather). so: do a collect() on the table, pull the unique ids that are also in the csv, process the csv to remove ones i already have saved, save the new results
jamwt
jamwt3w ago
so, why not just stream over the CSV and use an index to look up whether each record exists as you encounter it? especially once you have bigger datasets, you're not going to want to load the whole table. convex is a "fail fast" database, just b/c it's designed to run you up to big big scale. so it sets you up for using O(log n) operations and memory-efficient access out of the gate. aggregates = use the aggregate component. batch operations = use indexes and individual records
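That index lookup assumes the table declares an index on the CSV's id field in your schema, roughly like this (hypothetical table/field names, matching the example further down):

// convex/schema.ts (sketch)
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  yourTableName: defineTable({
    id: v.string(), // external id from the CSV, not the Convex _id
    // add other fields here
  }).index("by_id", ["id"]),
});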
Nic
NicOP3w ago
wouldnt that just introduce 20k queries instead?
erquhart
erquhart3w ago
yeah we were initially talking count, aggregate is for just that part. but it doesn't sound like you actually need a count
jamwt
jamwt3w ago
wouldnt that just introduce 20k queries instead?
yep. many small queries are happier in OLTP than big table scans
Nic
NicOP3w ago
that just hurts my brain lol. maybe im just not visualizing how things are being processed at runtime. im using convex for another project, and it's working amazingly. might just not be the right fit for this one im building right now
jamwt
jamwt3w ago
import { internalAction, internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Action: splits the imported CSV records into batches and hands each
// batch to a mutation so every transaction stays small.
export const insertMissingRecords = internalAction({
  args: { records: v.array(v.object({ id: v.string() /* add other fields here */ })) },
  handler: async (ctx, args) => {
    const { records } = args;

    const batchSize = 100;
    for (let i = 0; i < records.length; i += batchSize) {
      const batch = records.slice(i, i + batchSize);
      await ctx.runMutation(internal.yourModule.processBatch, { batch });
    }
  },
});

// Mutation: indexed point lookups, inserting only records that aren't already there.
export const processBatch = internalMutation({
  args: {
    batch: v.array(v.object({ id: v.string() /* add other fields here */ })),
  },
  handler: async (ctx, { batch }) => {
    // Process each record in the batch
    await Promise.all(batch.map(async (record) => {
      // Check if record already exists (requires a "by_id" index on the "id" field)
      const existing = await ctx.db
        .query("yourTableName")
        .withIndex("by_id", (q) => q.eq("id", record.id))
        .unique();

      // Insert only if it doesn't exist
      if (!existing) {
        await ctx.db.insert("yourTableName", record);
      }
    }));
  },
});
I just asked kapa about this, so this is just from kapa. looks about right! ^ actually, that's too complicated, it can be simpler. let me simplify it. silly ai. just loop over records and insert missing ones; it keeps the transactions small so you're not taking massive database locks and slowing your app down
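The simplified version being described would be roughly the following, a sketch using the same hypothetical table and "by_id" index as above. For very large CSVs you'd still chunk the records into batches from an action, as in the earlier example.

import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

// Loop over the imported records and insert only the ones we don't have yet.
export const insertMissingRecordsSimple = internalMutation({
  args: {
    records: v.array(v.object({ id: v.string() /* add other fields here */ })),
  },
  handler: async (ctx, { records }) => {
    for (const record of records) {
      // Indexed point lookup instead of collecting the whole table.
      const existing = await ctx.db
        .query("yourTableName")
        .withIndex("by_id", (q) => q.eq("id", record.id))
        .unique();
      if (!existing) {
        await ctx.db.insert("yourTableName", record);
      }
    }
  },
});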
Nic
NicOP3w ago
ill look over it and revisit later, fresh eyes might help me here. thanks for the discussion! learned quite a bit
jamwt
jamwt3w ago
no problem!
