Nic
Nic3w ago

Really wish the database layer was a bit more intuitive/functional to use, simple things like count() etc. I was working on a project that had 20k rows in a table, just wanted to show the count of rows. You have to fetch the entire dataset then .length it? Well that causes it to yell at you saying that the query's too big, and somehow i used 16GB worth of database bandwidth (had to upgrade to clear the account lockdown because of exceeding limits).
20 Replies
erquhart
erquhart3w ago
Yeah there are some things that are usually batteries-included that don't "just work" in Convex. For count the best approach is to use the Aggregate component, so you can do:
aggregate.count(ctx)
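For context, a rough sketch of what that can look like with the @convex-dev/aggregate component. The "items" table, file paths, and the exact generic/constructor shape here are assumptions; check the component docs for the real setup, including registering it with app.use(aggregate) in convex.config.ts.

// convex/stats.ts (sketch) - assumes the aggregate component is registered
// in convex.config.ts via app.use(aggregate)
import { TableAggregate } from "@convex-dev/aggregate";
import { components } from "./_generated/api";
import { query } from "./_generated/server";
import { DataModel } from "./_generated/dataModel";

// Hypothetical "items" table; no sort key since we only want a count.
export const aggregate = new TableAggregate<{
  Key: null;
  DataModel: DataModel;
  TableName: "items";
}>(components.aggregate, {
  sortKey: () => null,
});

export const itemCount = query({
  args: {},
  handler: async (ctx) => {
    // Reads the aggregate's denormalized metadata instead of scanning rows.
    return await aggregate.count(ctx);
  },
});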
erquhart
erquhart3w ago
Convex Aggregate: Keep track of sums and counts in a denormalized and scalable way.
Nic
NicOP3w ago
select("column1", "column2") would have been great too, maybe im just misreading what convex databases are supposed to be. selecting certain columns would have cut down greatly on the database bandwidth used too
erquhart
erquhart3w ago
Not really - a SQL select query is still reading whole rows, mapping down to the fields you selected, and then returning them to you. Since your Convex functions are running "inside" the database, you're just doing manually what SQL select does when you map
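In other words, the "select" ends up as a map inside your query function. A minimal sketch, with made-up table and field names; note every matching document is still read, you're only trimming what goes back to the client:

import { query } from "./_generated/server";

export const listColumns = query({
  args: {},
  handler: async (ctx) => {
    // Every matching document is still read by the Convex runtime...
    const docs = await ctx.db.query("yourTableName").collect();
    // ...the projection is just a map over the fields you care about.
    return docs.map((doc) => ({
      column1: doc.column1,
      column2: doc.column2,
    }));
  },
});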
Nic
NicOP3w ago
i guess that occurs on the database server (mysql for ex) rather than being sent to the app to process?
erquhart
erquhart3w ago
right and your Convex code is running in the db
Nic
NicOP3w ago
interesting, but the table was iirc 200mb or so. just curious, if you do a collect() on the whole table does that add 200mb to your bandwidth used? just no idea how i got to 15gb. i deleted the project already because i didnt want any ghost processes to continue doing anything
erquhart
erquhart3w ago
Number of rows scanned depends on index use. Oh, you're talking about a specific op you did, missed that. So getting a count on the spot does involve getting all records, yeah. The aggregate component basically keeps metadata to avoid needing to do that.
Nic
NicOP3w ago
my function was to grab a csv i imported, check the ids in the csv for duplicates in the database (so we dont add twice)
erquhart
erquhart3w ago
It initially has to go over the table to get a current count, but from there it keeps metadata up to date, so it's not a big operation to get the count
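Concretely, "keeps metadata up to date" means your mutations also write to the aggregate. A sketch continuing the hypothetical stats.ts above; verify the exact write API (aggregate.insert and friends) against the component docs:

import { mutation } from "./_generated/server";
import { v } from "convex/values";
import { aggregate } from "./stats"; // the TableAggregate sketched earlier

export const addItem = mutation({
  args: { name: v.string() },
  handler: async (ctx, args) => {
    const id = await ctx.db.insert("items", { name: args.name });
    const doc = await ctx.db.get(id);
    // Update the aggregate alongside the table write so counts stay O(log n).
    if (doc) {
      await aggregate.insert(ctx, doc);
    }
    return id;
  },
});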
jamwt
jamwt3w ago
maybe needed an index?
Nic
NicOP3w ago
so normally i'd just select distinct on the id. with 20k items an index wouldnt do anything, since what im selecting would just return the whole database anyway (whole table, rather). so: do a collect() on the table, pull the unique ids that are also in the csv, process the csv to remove ones i already have saved, save the new results
jamwt
jamwt3w ago
so, why not just stream over the CSV and use an index to look up whether each record exists as you encounter it? especially once you have bigger datasets, you're not going to want to load the whole table. convex is a "fail fast" database, just b/c it's designed to run you up to big big scale. so it sets you up for using O(log n) operations and memory-efficient access out of the gate. aggregates = use the aggregate component. batch operations = use indexes and individual records
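That index lookup assumes the table declares an index on the CSV's id field in your schema, roughly like this (hypothetical table/field names, matching the example further down):

// convex/schema.ts (sketch)
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  yourTableName: defineTable({
    id: v.string(), // external id from the CSV, not the Convex _id
    // add other fields here
  }).index("by_id", ["id"]),
});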
Nic
NicOP3w ago
wouldnt that just introduce 20k queries instead?
erquhart
erquhart3w ago
yeah we were initially talking count, aggregate is for just that part. but it doesn't sound like you actually need a count
jamwt
jamwt3w ago
wouldnt that just introduce 20k queries instead?
yep. many small queries are happier in OLTP than big table scans
Nic
NicOP3w ago
that just hurts my brain lol. maybe im just not visualizing how things are being processed at runtime. im using convex for another project, and it's working amazingly. might just not be the right fit for this one im building right now
jamwt
jamwt3w ago
import { internalAction, internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Action: splits the imported CSV records into batches and hands each
// batch to a mutation so every transaction stays small.
export const insertMissingRecords = internalAction({
  args: { records: v.array(v.object({ id: v.string() /* add other fields here */ })) },
  handler: async (ctx, args) => {
    const { records } = args;

    const batchSize = 100;
    for (let i = 0; i < records.length; i += batchSize) {
      const batch = records.slice(i, i + batchSize);
      await ctx.runMutation(internal.yourModule.processBatch, { batch });
    }
  },
});

// Mutation: indexed point lookups, inserting only records that aren't already there.
export const processBatch = internalMutation({
  args: {
    batch: v.array(v.object({ id: v.string() /* add other fields here */ })),
  },
  handler: async (ctx, { batch }) => {
    // Process each record in the batch
    await Promise.all(batch.map(async (record) => {
      // Check if record already exists (requires a "by_id" index on the "id" field)
      const existing = await ctx.db
        .query("yourTableName")
        .withIndex("by_id", (q) => q.eq("id", record.id))
        .unique();

      // Insert only if it doesn't exist
      if (!existing) {
        await ctx.db.insert("yourTableName", record);
      }
    }));
  },
});
I just asked kapa about this, so this is just from kapa. looks about right! ^ actually, that's too complicated, it can be simpler. let me simplify it. silly ai. just loop over records and insert missing ones; it keeps the transactions small so you're not taking massive database locks and slowing your app down
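The simplified version being described would be roughly the following, a sketch using the same hypothetical table and "by_id" index as above. For very large CSVs you'd still chunk the records into batches from an action, as in the earlier example.

import { internalMutation } from "./_generated/server";
import { v } from "convex/values";

// Loop over the imported records and insert only the ones we don't have yet.
export const insertMissingRecordsSimple = internalMutation({
  args: {
    records: v.array(v.object({ id: v.string() /* add other fields here */ })),
  },
  handler: async (ctx, { records }) => {
    for (const record of records) {
      // Indexed point lookup instead of collecting the whole table.
      const existing = await ctx.db
        .query("yourTableName")
        .withIndex("by_id", (q) => q.eq("id", record.id))
        .unique();
      if (!existing) {
        await ctx.db.insert("yourTableName", record);
      }
    }
  },
});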
Nic
NicOP3w ago
ill look over it and revisit later, fresh eyes might help me here. thanks for the discussion! learned quite a bit
jamwt
jamwt3w ago
no problem!
