titin_sculpts_clay
titin_sculpts_clay•6mo ago

Sorting records randomly

So I'm prototyping a feed-type page for users to view posts from around the app, that loads content in a random order. It's not dated content, so serving sorted by timestamp doesn't make much sense. It also serves to add a degree of freshness, I think, to show random unviewed posts. I know traditional SQL has a SORT BY random implementation, but was wondering if Convex had something similar out of the box to implement this feature. Else I'm open to advice on how to pull off something like this.
6 Replies
sujayakar
sujayakar•6mo ago
hi! if the number of unviewed posts isn't that large (say, ~1000), you can just .collect() them into memory and shuffle them. shuffling with something like https://stackoverflow.com/a/46545530 would be similar to how a SQL database would execute a SORT BY random. you can also use an algorithm like https://stackoverflow.com/a/2450976 which would be more efficient than sorting by a random number.
erquhart
erquhart•6mo ago
I've seen posts about this from a few people over time and I haven't come up with a great solution for larger datasets. Adding a random method that works with pagination to the convex api would be ideal.
sujayakar
sujayakar•6mo ago
@lee has a btree library he's been working on that can support this at scale. in particular, it lets you (1) efficiently count the number of records and (2) access the ith element in logarithmic time. then, you can just sample a random index in [0, n) and then access that offset to get a uniform sample from the table. it's currently blocked on components but we'll get it out soon 🙂
erquhart
erquhart•6mo ago
oh that's great news, thanks for the update!
titin_sculpts_clay
titin_sculpts_clayOP•6mo ago
oki. Thanks for sharing What I'm doing now is a similar approach: saving the total in a metadata table and updating it for each upload + assigning each post an index At read time, I'll just - keep an array of already served posts this session - generate a rand number [0,n] not in that list, - check if corresponding post exists and other checks (I want to prioritize unviewed posts) - fetch it It works for now, but it's not the most extensible and it needs scaffolding — so I'll keep my eyes peeled for the paginatable rand query 😅
ian
ian•6mo ago
Here's a sketch I noodled on yesterday, if anyone wants a very over-engineered idea: - Buckets of limited capacity (100 say) which could be just a single document with a list of IDs. buckets have sequential numbers, and know how many items they have. - Top level document with the # of buckets, and a set of IDs to non-filled buckets to insert into - When you insert, you pick a random insert bucket. If it’s full, you make a new insert bucket & update metadata - When you fetch, you fetch from a random bucket by its number (knowing how many buckets there are), then from 0->100 randomly. If the bucket didn’t have enough items, you pick another random bucket, another 0->100. Maybe after n failed attempts you pick from a bucket known to be full. - Doesn't handle deletes gracefully - you'd need to maybe walk & compress sparse buckets occasionally. That way there isn’t a read dependency on multiple buckets in the normal case, you can shard inserts (high throughput w/o occ), and sampling is uniform

Did you find this page helpful?