Really wish the database layer was a bit more intuitive/functional to use for simple things like count() etc. I was working on a project that had 20k rows in a table and just wanted to show the count of rows. You have to fetch the entire dataset and then .length it? Well, that causes it to yell at you saying the query's too big, and somehow I used 16GB worth of database bandwidth (had to upgrade to clear the account lockdown from exceeding limits).
Yeah, there are some things that are usually batteries-included that don't "just work" in Convex. For count, the best approach is to use the Aggregate component.
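The idea behind the Aggregate component can be sketched in plain TypeScript. This is not the real `@convex-dev/aggregate` API (see the Convex component docs for that); it just shows the trick: keep a small count record that is updated alongside every write, so reading the count never scans the table.

```typescript
// Hedged sketch of the idea behind the Aggregate component, not its real API.
// A count is kept as metadata and updated with every insert/delete, so
// getCount() is O(1) instead of a full table scan.

type Row = { id: string };

class CountedTable {
  private rows = new Map<string, Row>();
  private count = 0; // metadata kept in sync with every write

  insert(row: Row): void {
    if (!this.rows.has(row.id)) {
      this.rows.set(row.id, row);
      this.count += 1; // updated in the same "transaction" as the row
    }
  }

  remove(id: string): void {
    if (this.rows.delete(id)) this.count -= 1;
  }

  // No scan, no bandwidth proportional to table size.
  getCount(): number {
    return this.count;
  }
}
```

In Convex itself, the Aggregate component maintains this kind of metadata transactionally for you.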
select("column1", "column2") would have been great too; maybe I'm just misreading what Convex databases are supposed to be
selecting certain columns would have cut down greatly on the database bandwidth used, too
Not really - a SQL select query still reads whole rows, maps them down to the fields you selected, and then returns them to you.
Since your Convex functions are running "inside" the database, you're just doing manually what SQL select does when you map
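As a tiny illustration of that point (field names here are made up): the projection a SQL `SELECT column1, column2` performs is just a map over the full rows, done server-side before anything is sent back.

```typescript
// Sketch: projecting fields manually, the way SQL SELECT does server-side.
// The full documents are read either way; only the projected fields are
// returned to the caller. Document shape is illustrative.

type Doc = { _id: string; column1: string; column2: number; bigBlob: string };

function selectColumns(docs: Doc[]): { column1: string; column2: number }[] {
  // Equivalent of SELECT column1, column2 FROM ...
  return docs.map(({ column1, column2 }) => ({ column1, column2 }));
}
```

Inside a Convex query function, that map runs next to the data, so the projected result is what crosses the wire to the client.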
I guess that occurs on the database server (MySQL, for example), not sent to the app to process?
right
and your Convex code is running in the db
interesting, but the table was IIRC 200MB or so. Just curious: if you do a collect() on the whole table, does that add 200MB to your bandwidth used?
just no idea how I got to 15GB. I deleted the project already because I didn't want any ghost processes to continue doing anything
Number of rows scanned depends on index use
Oh you're talking about a specific op you did, missed that
So getting a count on the spot does involve getting all records, yeah. The aggregate component basically keeps metadata to avoid needing to do that.
my function was to grab a CSV I imported and check the IDs in the CSV for duplicates in the database (so we don't add twice)
It initially has to go over to get a current count, but from there it keeps metadata up to date so it's not a big operation to get the count
maybe needed an index?
so normally I'd just select distinct on the id, of 20k items
an index wouldn't do anything, since what I'm selecting would just return the whole database anyway
whole table, rather
so: do a collect() on the table, pull the unique ids that are also in the CSV, process the CSV to remove the ones I already have saved, save the new results
so, why not just stream over the CSV and use an index to look up whether each record exists as you encounter it? Especially once you have bigger datasets, you're not going to want to load the whole table
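That suggestion can be sketched like this. The `Map` stands in for a database index (in Convex, each existence check would be an indexed point lookup via the documented query API rather than a scan); the record shape and the `externalId` field are made up for illustration.

```typescript
// Sketch of the suggested approach: stream the CSV and do one indexed
// existence check per record instead of collect()ing the whole table.
// The Map stands in for a database index on the CSV's id field.

type CsvRow = { externalId: string; name: string };

function importMissing(
  index: Map<string, CsvRow>, // existing records keyed by externalId
  csvRows: CsvRow[],
): CsvRow[] {
  const inserted: CsvRow[] = [];
  for (const row of csvRows) {
    if (!index.has(row.externalId)) { // point lookup, not a table scan
      index.set(row.externalId, row); // insert only the missing record
      inserted.push(row);
    }
  }
  return inserted;
}
```

Each lookup touches one record, so memory and bandwidth stay flat as the table grows, which is the trade-off being described here.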
Convex is a "fail fast" database, just because it's designed to run you up to big, big scale. So it sets you up for O(log n) operations and memory-efficient access out of the gate:
aggregates = use the aggregate component
batch operations = use indexes and individual records
wouldn't that just introduce 20k queries instead?
yeah we were initially talking count, aggregate for just that part
but doesn't sound like you actually need a count
> wouldn't that just introduce 20k queries instead?
yep. many small queries are happier in OLTP than big table scans
that just hurts my brain lol
maybe I'm just not visualizing how things are being processed at runtime
I'm using Convex for another project, and it's working amazingly
it might just not be the right fit for this one I'm building right now
I just asked kapa about this, so this is just from kapa. looks about right! ^
actually, that's too complicated. it can be simpler
let me simplify it. silly ai
just loop over the records and insert the missing ones
that keeps the transactions small, so you're not taking massive database locks and slowing your app down
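One common way to keep each transaction small is to split the CSV into fixed-size batches and run one mutation per batch. This is a generic sketch, not Convex-specific code, and the batch size of 100 is an arbitrary illustrative choice.

```typescript
// Sketch: split work into small batches so each transaction stays small.
// Each batch would become one mutation call that inserts only the
// records not already present.

function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// e.g. chunk(csvRows, 100) -> an array of <=100-row batches to process
// one mutation at a time, keeping locks short-lived.
```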
I'll look it over and revisit later; fresh eyes might help me here
thanks for the discussion! learned quite a bit
no problem!