Table size
In Convex is there a way to get the number of rows without having to query it? I am trying to select a random row in one of my tables the database
13 Replies
hi! can you describe more about how getting the number of rows would help select a random row? If you don't mind using an interface that might change (isn't stable) and ignoring the typescript error, you could use
await db.query('table').count()
.I'm using a Math.random() to generate a random integer between 0 and the length of the table
we haven't invested in this
count
interface yet, so having more context on its use-cases will help prioritizeI'm trying to select a random row in a table of texts to get a single column. My algorithm is:
1) Query texts and collect
2) Get length
3) Generate random index
4) Return that row
So I'd need to query anyway. I'm just wondering if count() would be more optimized than collect().length
if you've already queried and collected, can you get the length from that array?
count()
is constant time, while collect().length
is linear. but count()
is restricted to only work on entire tablesSo it would be linear either way
yeah, for now @Night, I'd recommend just
.collect()
ing into an array, since convex's queries don't support offset, so even if you efficiently generated the random index from (3), jumping to the k
th record would still be O(k)
time.
there's a trick we support for selecting a random record in logarithmic time, but I'll have to write it up, and I think it relies on internal APIs 😉the trick is almost possible with existing APIs. I'm imagining a
db.query('table').withIndex('by_id', q => q.gt('_id', new Id('table', <randomly generated bytes>)).first()
but generating the random bytes that make the query happy might be tricky. interesting thought exercise 😄Another idea is to have a field in your documents that is an integer id. Every time you add one, you find the row with the highest id and add your document with that + 1. Because of Convex's serializable isolation, this is safe. You could have an index on this new field, to make finding the largest one fast. Then that row also has the count by default, and you could generate a random int at or below it to fetch a random row. Note: this assumes you don't delete rows
Thanks for all these possible methods! I'll probably stick with collect for now and if there's a performance problem I'll try out that indexing strategy because it's very unlikely I'd ever need to delete the rows in this table.
What exactly is offsetting?
Essentially querying a table by an numerical counter, like an array index - generically it's an offset from some location within an index.
that's my understanding, anyways. i'm no db expert though
When you create an index on a column it doesn't scan the entire table though, right?
Using withIndex()
That's correct,
withIndex
doesn't scan the entire table, just the part of the table specified in the index filter