Night•3y ago

Table size

In Convex is there a way to get the number of rows without having to query it? I am trying to select a random row in one of my tables the database

13 Replies

lee•3y ago

hi! can you describe more about how getting the number of rows would help select a random row? If you don't mind using an interface that might change (isn't stable) and ignoring the typescript error, you could use await db.query('table').count().

NightOP•3y ago

I'm using a Math.random() to generate a random integer between 0 and the length of the table

lee•3y ago

we haven't invested in this count interface yet, so having more context on its use-cases will help prioritize

NightOP•3y ago

I'm trying to select a random row in a table of texts to get a single column. My algorithm is: 1) Query texts and collect 2) Get length 3) Generate random index 4) Return that row So I'd need to query anyway. I'm just wondering if count() would be more optimized than collect().length

lee•3y ago

if you've already queried and collected, can you get the length from that array? count() is constant time, while collect().length is linear. but count() is restricted to only work on entire tables

NightOP•3y ago

So it would be linear either way

sujayakar•3y ago

yeah, for now @Night, I'd recommend just .collect()ing into an array, since convex's queries don't support offset, so even if you efficiently generated the random index from (3), jumping to the kth record would still be O(k) time. there's a trick we support for selecting a random record in logarithmic time, but I'll have to write it up, and I think it relies on internal APIs 😉

lee•3y ago

the trick is almost possible with existing APIs. I'm imagining a db.query('table').withIndex('by_id', q => q.gt('_id', new Id('table', <randomly generated bytes>)).first() but generating the random bytes that make the query happy might be tricky. interesting thought exercise 😄

ian•3y ago

Another idea is to have a field in your documents that is an integer id. Every time you add one, you find the row with the highest id and add your document with that + 1. Because of Convex's serializable isolation, this is safe. You could have an index on this new field, to make finding the largest one fast. Then that row also has the count by default, and you could generate a random int at or below it to fetch a random row. Note: this assumes you don't delete rows

NightOP•3y ago

Thanks for all these possible methods! I'll probably stick with collect for now and if there's a performance problem I'll try out that indexing strategy because it's very unlikely I'd ever need to delete the rows in this table. What exactly is offsetting?

ian•3y ago

Essentially querying a table by an numerical counter, like an array index - generically it's an offset from some location within an index. that's my understanding, anyways. i'm no db expert though

NightOP•3y ago

When you create an index on a column it doesn't scan the entire table though, right? Using withIndex()

lee•3y ago

That's correct, withIndex doesn't scan the entire table, just the part of the table specified in the index filter

Table size

Did you find this page helpful?