Sorting records randomly
So I'm prototyping a feed-type page where users can view posts from around the app, loaded in a random order. The content isn't dated, so serving it sorted by timestamp doesn't make much sense. Randomness also adds a degree of freshness, I think, by surfacing random unviewed posts.
I know traditional SQL has an ORDER BY RANDOM() implementation, but I was wondering if Convex has something similar out of the box to implement this feature.
Otherwise, I'm open to advice on how to pull off something like this.
hi! if the number of unviewed posts isn't that large (say, ~1000), you can just .collect() them into memory and shuffle them. Shuffling with something like https://stackoverflow.com/a/46545530 would be similar to how a SQL database would execute an ORDER BY RANDOM(). You can also use an algorithm like https://stackoverflow.com/a/2450976, which is more efficient than sorting by a random number.

I've seen posts about this from a few people over time and I haven't come up with a great solution for larger datasets. Adding a random method that works with pagination to the Convex API would be ideal.
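As a concrete illustration of the in-memory approach (a sketch, not Convex-specific code): the standard O(n) way to shuffle is the Fisher–Yates algorithm, which avoids the O(n log n) cost of sorting by a random key.

```typescript
// Fisher–Yates shuffle (Durstenfeld variant): walk the array backwards,
// swapping each element with a random earlier-or-same slot. O(n) time,
// and every permutation is equally likely.
function shuffle<T>(items: T[]): T[] {
  const result = [...items]; // copy so the caller's array isn't mutated
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}
```

In a Convex query you'd call something like this on the array returned by .collect() before returning it to the client.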
@lee has a btree library he's been working on that can support this at scale. In particular, it lets you (1) efficiently count the number of records and (2) access the ith element in logarithmic time. Then you can just sample a random index in [0, n) and access that offset to get a uniform sample from the table.
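The sampling step itself is tiny; here's a sketch, with the btree calls left as comments since that library's API isn't public yet (the count()/at() names are hypothetical):

```typescript
// Uniform sampling given a total count n: every index in [0, n) is
// equally likely, so the record at that offset is a uniform sample.
function sampleIndex(n: number): number {
  if (n <= 0) throw new Error("nothing to sample from");
  return Math.floor(Math.random() * n); // uniform over [0, n)
}

// hypothetical usage once the btree library lands:
//   const n = await tree.count();            // O(log n)
//   const post = await tree.at(sampleIndex(n)); // O(log n)
```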
it's currently blocked on components but we'll get it out soon 🙂

oh that's great news, thanks for the update!
oki. Thanks for sharing
What I'm doing now is a similar approach: saving the total in a metadata table and updating it for each upload + assigning each post an index
At read time, I'll just
- keep an array of already served posts this session
- generate a random number in [0, n) not in that list,
- check if corresponding post exists and other checks (I want to prioritize unviewed posts)
- fetch it
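The "random index not already served" step above can be sketched in plain TypeScript (the function name is mine; the existence and unviewed-priority checks stay in the Convex query):

```typescript
// Pick a random index in [0, n) that hasn't been served this session.
// Returns null once everything has been served. Note: this is rejection
// sampling, so it slows down as `served` approaches n — a real
// implementation might cap the retries.
function pickUnserved(n: number, served: Set<number>): number | null {
  if (served.size >= n) return null;
  let idx: number;
  do {
    idx = Math.floor(Math.random() * n);
  } while (served.has(idx));
  return idx;
}
```

The caller would then fetch the post at that index and, if it fails the checks, add the index to `served` and try again.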
It works for now, but it's not the most extensible and it needs scaffolding — so I'll keep my eyes peeled for the paginatable rand query 😅
Here's a sketch I noodled on yesterday, if anyone wants a very over-engineered idea:
- Buckets of limited capacity (100 say) which could be just a single document with a list of IDs. buckets have sequential numbers, and know how many items they have.
- Top level document with the # of buckets, and a set of IDs to non-filled buckets to insert into
- When you insert, you pick a random insert bucket. If it’s full, you make a new insert bucket & update metadata
- When you fetch, you pick a random bucket by its number (knowing how many buckets there are), then pick a random slot in [0, 100). If the bucket didn't have an item at that slot, you pick another random bucket and another random slot. Maybe after n failed attempts you pick from a bucket known to be full.
- Doesn't handle deletes gracefully - you'd need to maybe walk & compress sparse buckets occasionally.
That way there isn't a read dependency on multiple buckets in the normal case, you can shard inserts (high throughput without OCC conflicts), and sampling is uniform
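An in-memory sketch of that bucket scheme, under my own assumptions (in Convex each bucket would be its own document; here they're just arrays, and the class/method names are mine):

```typescript
const CAPACITY = 100; // max items per bucket

// Fixed-capacity buckets with random insert placement and
// rejection-sampled reads.
class BucketStore<T> {
  private buckets: T[][] = [];

  insert(item: T): void {
    // pick a random non-full bucket, or open a new one
    const open = this.buckets.filter((b) => b.length < CAPACITY);
    if (open.length === 0) {
      this.buckets.push([item]);
    } else {
      open[Math.floor(Math.random() * open.length)].push(item);
    }
  }

  sample(maxAttempts = 10): T | null {
    if (this.buckets.length === 0) return null;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const bucket =
        this.buckets[Math.floor(Math.random() * this.buckets.length)];
      const slot = Math.floor(Math.random() * CAPACITY);
      // hit: (bucket, slot) pairs are uniform over all possible slots,
      // so accepted items are uniform over all stored items
      if (slot < bucket.length) return bucket[slot];
      // miss: slot was empty in this bucket; retry elsewhere
    }
    // fallback after too many misses: trades a little uniformity
    // for a bounded number of reads
    const nonEmpty = this.buckets.filter((b) => b.length > 0);
    if (nonEmpty.length === 0) return null;
    const bucket = nonEmpty[Math.floor(Math.random() * nonEmpty.length)];
    return bucket[Math.floor(Math.random() * bucket.length)];
  }
}
```

The rejection step is what keeps sampling uniform: a half-full bucket is picked as often as a full one, but yields a hit only half as often, so its items aren't over-represented.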