Getting random results back when querying a table, specifically using a paginated query?
Hello,
I'm facing challenges while attempting to retrieve data at random from a table.
The current process, in this example a paginated query - delivers results in the order the data was added to the table, which does not meet our needs.
A bit of context, our content managers tend to upload videos relevant to a single category (for example, food) at a time. Consequently, fetching data in order would result in a high concentration of videos from the same category – all 60 food videos might be returned in one query. This is mostly right now, as our content managers are bulk uploading a bunch of content.
We'd love to show a big mix of videos - totally random - every time we pull data, instead of just getting a bunch from one category. Sadly this limitation greatly affects a important part of our app.
Anyone able to guide me in the right direction here, as i fail to see how this can be done with Convex.
4 Replies
Something like a
.random()
would be super nice.
Unfortunately a
.random()
as you describe is not possible to implement efficiently. It would need to pick a new ordering of the documents, which takes O(n) time and space (defeating the purpose of the pagination), and remember the ordering for subsequent pages but pick a new ordering when you start a new query.
This pattern might get close to what you want: you can pick a random ordering of documents when they are inserted, and always return them in that order.
Ahh yeah, i can see what you mean about it ruining the purpose of a pagination.
Thanks a lot for sharing Lee. That's a nice pattern, hadn't thought about solving the issue like that.
I've seen this discussed before here, there isn't a great solution. If I had to randomize a query for recommendations, I'd try something like:
1. adding an ordered numeric index field on the table, so each doc has 1, 2, 3, etc.
2. start the query with getting the highest index in the table
3. generate an array of unique random integers between 1 and max index, with a length of however many recommendations you want per page
4. concurrently map the array to unique or first queries, which would be quite fast
To approximate pagination, you just need to keep track of the numbers array for a given session - probably worth tracking in the db to avoid weirdness like passing the array back and forth in requests and responses.
I could see this getting sticky if you have a relatively small number of documents to randomize, but if you're using this approach on a decently sized table (maybe many thousands or more?), might work well.