Implementing a virtual waiting room
I'm looking to implement a virtual waiting room to gate access to my product (I estimate I'll have roughly 2000+ users waiting concurrently). Users (each has a unique Clerk user ID) should see their position in the queue and then eventually be added in FIFO order to the main room when there is space (room has a fixed capacity). They'll get booted from the main room back to the queue if they are inactive for 5 minutes.
I was thinking I could use a mutation to add users when they open the website and use the by_creation index, but I'm not sure how to 1) efficiently query their position in line and 2) boot inactive users (e.g. if they leave the site).
Perhaps I could have each client "listen" to the front of the queue (i.e. order('asc').first()) and respond accordingly if their ID matches it. If I have a global counter, I can add a field to the schema that stores the counter value at time of insertion with each user in the queue. Then, the difference between the user's value and the front of the queue's value is the user's position in line.
Not sure how to drop users from the queue if they leave the site. Maybe I keep a heartbeat timestamp for each user in the queue? I noticed there was some discussion about sendBeacon and presence earlier.
Anyways, do you think Convex is the right tool for this use case? Do you have a simpler proposal?
I'm using Vercel/NextJS with API routes + Planetscale + Clerk right now. I will also need Expo/RN support.
Note: I don’t work for Convex, so stay tuned as they’ll likely follow up with an answer as well.
1. For position (e.g., "there are 400 users ahead of you"), I would make a query that fetches the waiting room table and sorts just like you're planning, but I'd index the query on user id (as you'll need _creationTime last for ordering) and use a range to limit to users with a creation time less than the user's. The query can then return the length of the array as the position in the queue. No extra work for "listening", as Convex queries are reactive by default.
2. For booting a user I'd agree it seems the info you found on presence is the way to go. When you detect a user is sufficiently "inactive", you can run a mutation to move them to the queue.
On your larger question of whether Convex is right for this, I'd say absolutely. Honestly, having worked with Convex for a few months now, I wouldn't be interested in implementing a project like that in any other system.
Thanks for your help. On (1), will that incur more "reads" since it's listening on a larger set of results? Also, I'm not sure about having a composite index on (user ID, creation timestamp), since that will only order timestamps within the same ID, at least that's what the docs suggest.
What I described in step 1 can be done with a single db read. Good point on the index, I was thinking you could use userId as a placeholder but you're right, you'd have to specify an equality comparison so that won't work. The idea was to limit the result set as much as possible using a range on the index - any field that you can make work for the equality comparison will do. Of course, this will still be slow for folks at the end of the line and fast for folks at the front, but it's better than slow for all.
@CharstarAdmin hey! tuning this to any degree is possible, so it depends on how much you want to optimize it. so I can talk through a quite optimal version, but feel free to do something simpler
you'd probably want a sequence number (incrementing) on your queue
then an index on this number, and the user's id. two separate indexes
you can always get the user in constant time
and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence number index and comparing it to their sequence number
this will give you two very fast point queries that will tell you your position in line
if you want to delete people from the queue table (or whatever), that works. if not, you'd probably have some sort of "served" boolean that indicates if the queued person has been dequeued or not
then your seq number index could be like on two fields: [served, position], where position is the sequence number
then you can always quickly subdivide the "not yet served" portion of the queue by starting your index lookup with served = false
or whatever
lmk if this makes sense. but this approach would stay very very fast and scale up to queues of nearly any length
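For concreteness, here's a minimal schema sketch of the approach described above (a sequence number, a point-lookup index on the user, and an index for walking the queue in order). The table and field names (queue, position, lastSeen) and index names are assumptions for illustration, not anything fixed by this thread:

```ts
// convex/schema.ts -- a minimal sketch, not the actual schema from this thread.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  queue: defineTable({
    userId: v.string(),   // Clerk user ID
    position: v.number(), // monotonically increasing sequence number
    lastSeen: v.number(), // heartbeat timestamp, used to prune inactive users
  })
    // Point lookup: find a user's queue entry in constant time.
    .index("by_user", ["userId"])
    // Ordered lookup: walk the queue in FIFO order, grab the head or tail cheaply.
    .index("by_position", ["position"])
    // Range lookup for pruning users whose heartbeat has gone stale.
    .index("by_lastSeen", ["lastSeen"]),
  // If admitted users stay in the table instead of being deleted, the
  // [served, position] index described above would replace by_position.
});
```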
Hello!
if your queue was 5-10 people, I'd say it doesn't really matter and you can just grab the list and get the .length; once you're talking about thousands, you'll probably find you're better served by using the index (and sequence number)
To answer your question, it's totally reasonable to delete users from the queue table once they're admitted, so you can avoid needing an extra served flag.
👍
When you say "and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence number index and comparing it to their sequence number" -- this sounds similar to the order('asc').first() on the by_creation_time index, right? Not sure why it needs a sequence number.
the sequence number lets you know how many are in between
without fetching them
but otherwise, yes. that's the right idea
const head = await db.query('queue').withIndex('by_position').order('asc').first()
position in line = me.position - head.position
or whatever
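Fleshing that snippet out: a rough sketch of what the full position query could look like with the two point lookups, using the assumed schema sketched earlier. The function name getPosition and the index names are illustrative assumptions, not the thread's actual code:

```ts
// convex/queue.ts -- sketch only; assumes the by_user and by_position indexes
// from the schema sketch above.
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getPosition = query({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Point lookup #1: my own queue entry.
    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return null; // not (or no longer) in the queue

    // Point lookup #2: the head of the queue (smallest sequence number).
    const head = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("asc")
      .first();

    // head can't be null here because my own entry exists.
    return me.position - head!.position; // 0 means I'm at the front
  },
});
```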
I see, and would this avoid the global counter?
yeah, when you're inserting someone into the queue, you grab the person last in the queue and then insert the new person with that position + 1
and convex's transactionality will make this work 100% okay
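A sketch of that enqueue mutation under the same assumed schema (addToQueue is a made-up name; the real gist may differ):

```ts
// convex/queue.ts -- sketch of the enqueue described above, not the gist's code.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const addToQueue = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Grab the current tail of the queue to derive the next sequence number.
    const tail = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("desc")
      .first();
    const position = tail === null ? 0 : tail.position + 1;

    // Mutations run as transactions, so two concurrent enqueues can't both
    // claim the same position; one of them will be retried by Convex.
    await ctx.db.insert("queue", { userId, position, lastSeen: Date.now() });
  },
});
```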
Awesome, honestly very excited to try this out. Cloudflare's waiting room is way too expensive.
cool! yeah, should be pretty easy to build. feel free to share out any code or anything if you have questions and we can take a look
I think this would also make a very good template for Convex developers.
good point
This is brilliant. Would it require that users not be dropped mid-queue? Wouldn't that throw off the math?
ah, if you want to evict people in the middle of the queue it does get more complicated. I'd say then you have a kind of product decision to make
I'd leave it and let the number be potentially inflated personally, not necessarily a showstopper
if it's going to be uncommon, I'd have it just be a pleasant surprise. if it's going to be common, then yeah, there's a need to recalculate the aggregate unless you do fancier things with data structures
yeah, I'd leave it inflated personally
Removing inactive users from the queue is ideal behavior so that other users aren't blocked.
otherwise you need to roll up aggregate data structures and if you want it to be sublinear you need to start using trees or something
@CharstarAdmin you can still remove them, just don't worry about the queue position jumping forward two slots every now and then
Good point!
The queue position shown to the user doesn't need to decrement one-by-one.
yep
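To make the "remove inactive users" part concrete, here's one possible shape for the heartbeat plus a scheduled job that prunes stale entries. All names (heartbeat, pruneStale, the 5-minute window, the by_lastSeen index) are illustrative assumptions following the earlier schema sketch, not the thread's actual implementation:

```ts
// convex/queue.ts -- heartbeat + pruning sketch, assuming the earlier schema.
import { mutation, internalMutation } from "./_generated/server";
import { v } from "convex/values";

// Clients call this periodically (e.g. every 30s) while the tab is open.
export const heartbeat = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me !== null) await ctx.db.patch(me._id, { lastSeen: Date.now() });
  },
});

// Runs on a schedule; deletes queue entries that haven't pinged in 5 minutes.
export const pruneStale = internalMutation({
  args: {},
  handler: async (ctx) => {
    const cutoff = Date.now() - 5 * 60 * 1000;
    const stale = await ctx.db
      .query("queue")
      .withIndex("by_lastSeen", (q) => q.lt("lastSeen", cutoff))
      .collect();
    for (const entry of stale) await ctx.db.delete(entry._id);
  },
});
```

```ts
// convex/crons.ts -- schedule the pruning job.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();
crons.interval("prune stale queue entries", { minutes: 1 }, internal.queue.pruneStale);
export default crons;
```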
Very helpful. And lastly, do you think the best way to admit a client into the main product is to react when they are at front of the queue + room has space?
Will a .length on the room table trigger a full scan?
yeah, I'd just use a parent component or something that gates until the useQuery says they're allowed in. and swaps out from the waiting UI to the real UI
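As a rough illustration of that gating component idea: the sketch below assumes some status query exists that reports whether the user has been admitted and their queue position otherwise (api.queue.getStatus, the component name, and the returned fields are all hypothetical):

```tsx
// WaitingRoomGate.tsx -- illustrative sketch only.
import { useQuery } from "convex/react";
import type { ReactNode } from "react";
import { api } from "../convex/_generated/api";

export function WaitingRoomGate(props: { userId: string; children: ReactNode }) {
  // Reactive: re-renders automatically whenever the status changes server-side.
  const status = useQuery(api.queue.getStatus, { userId: props.userId });
  if (status === undefined) return <p>Loading…</p>; // query still loading
  if (status.admitted) return <>{props.children}</>; // show the real UI
  return <p>You are number {status.position + 1} in line.</p>; // waiting UI
}
```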
the .length technically happens in javascript, so whatever records are pulled are already pulled. the number of records convex tracks for detecting changes is determined by which indexes you use and the ranges you specify in your index predicates.
which is why a query that just uses two indexed queries where it fetches one record (me and the list head) is really, really efficient
Got it. To track the room size (only 1000 admits can be in the product), I should have a separate counter?
if you don't use an index (let's say, you only use .filter and don't use .withIndex), any changes to any record in the table will cause that query to re-run
Yeah, I'd recommend it
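For what that could look like in practice, here is a sketch of a promotion mutation that checks a separate room-size counter document. The room and roomSize table shapes, MAX_ROOM_SIZE, and promoteToRoom are assumptions for illustration, not the gist's actual code:

```ts
// convex/queue.ts -- promotion sketch, assuming the earlier schema plus a
// `room` table and a single-document `roomSize` counter table.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

const MAX_ROOM_SIZE = 1000;

export const promoteToRoom = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Single counter document tracking how many users are in the room.
    const counter = await ctx.db.query("roomSize").unique();
    if (counter === null || counter.count >= MAX_ROOM_SIZE) return; // room is full

    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return;

    // FIFO: only the head of the queue can be promoted.
    const head = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("asc")
      .first();
    if (head === null || head._id !== me._id) return;

    // All of this runs in one transaction, so concurrent promotions
    // can't push the counter past the cap.
    await ctx.db.insert("room", { userId, lastSeen: Date.now() });
    await ctx.db.patch(counter._id, { count: counter.count + 1 });
    await ctx.db.delete(me._id); // admitted users leave the queue table
  },
});
```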
Sounds good! I'll whip up a minimal prototype and share soon. Thanks so much.
One more thing that wasn't super clear from the docs -- if I deploy my existing Vercel/NextJS website with Convex added, will my API routes stay on Vercel?
do you mean the lambda handlers under the api/ folder?
Yep
you can if you'd like, but most convex users eventually replace them with convex actions instead so they have easy access to the full power of their convex data
https://docs.convex.dev/tutorial/actions
I see. I'd like to limit my Convex use to just the waiting queue and keep everything else as is.
convex actions are our equivalent of Vercel's functions
okay, then your vercel functions will still work fine
Great, thanks!
As promised, here's the working Gist! https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3
There's a queue table, room table, and roomSize table (which just has the room size document). Users who are promoted from the queue to the room are put on a "short fuse", which means they have to send their first heartbeat within 5 minutes of being promoted to prove their presence. I added this to quickly prune out users who were idling in the queue and ended up in the room (5 minutes is too generous in hindsight). After their first heartbeat, the short fuse is disabled and the heartbeat window is extended to 10 minutes. There are two cron jobs pruning short fuse and long fuse users in the room.
@jamwt, I had a question on this line: https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3#file-queue-ts-L28
Will replacing roomSize with a boolean hasSpace make things less reactive and therefore more efficient? Also, is it more efficient to move the roomSize into its own query function and check if there is space in the room separately from getPosition (e.g. useEffect(() => {/* call promoteToRoom(userId) if position is 0 and roomSize < MAX_ROOM_SIZE */}, [position, roomSize]))? Is there a way to have getPosition react to changes in roomSize only if the user is the head of the queue, or is that an over-optimization due to caching?
@jamwt sorry for the ping -- just wanted to get your thoughts on this ^
gotcha. hey!
looking now
re: roomSize
if you can avoid a mutation on a value, then you will not invalidate things that depend on the value. so yeah, in this case, if you "prematerialize" hasSpace at mutation time rather than just changing the count and checking the predicate at query time, you'll avoid waking up a query just to compute that. however, due to caching, as long as everyone is waiting on the same query, this other query will only run once and everyone else will consume the cached result. so it shouldn't matter much
in this case...
you're passing the user id in as an arg
so the query is not fully shared, since it's parameterized by the user id
so I'm following you now. could you break out that one specific lookup into its own query? then you could use that cached value so this outer query doesn't have to do the database fetch
okay, considering this whole query
Got it, so I shouldn't have any mention of roomSize in getPosition?
I think the "good news" is you can do this any way you want b/c it doesn't have a performance difference in practice. this query will get invalidated by line 22
basically, when there is a new entry added
so optimizing roomSize vs. hasSpace probably doesn't matter since this query will rerun anyway to calculate the new position
breaking that one call out into a subquery won't really matter too much either. a single point query like that is like 1ms and probably already cached, so it likely won't matter. I think this is written just fine as is based on my cursory reading here
Will this invalidate the query after any addition even if head doesn't change?
this is a question that @sujayakar should probably answer. it's subtle enough: withIndex not using a range specification might mean the dependent range is effectively the whole table (even though you're only consuming the edge-most record)
I see. Either way, it would be good to know. Thanks for sanity-checking the rest.
only the range used will be in the read set. so, if you get the single row at position p, the interval used is negative infinity to p (inclusive). so patterns like this are quite fast in convex!
we'll definitely be improving observability for this stuff in the future. it should be easy to debug why a particular query is reexecuting all the time and what read dependencies it's taking without having to ask us 🙂
Woot!
Just to clarify, what you're referring to is a lookup by key (position) while here we have a lookup by relative order. @sujayakar, does your assertion still apply? Just want to make sure.
suj just clarified for me, yep
Are you saying that if the result is head = p*, then the read set is (-∞, p*]?
it uses how much of the cursor is consumed
so if you stop at the first record, the range is from negative infinity (in this case, because you didn't specify a bound to withIndex) to the indexed value at that record's position
and this will do the right thing
Very cool, thanks!
oh sorry, yeah, I meant that p was the first document we encountered when walking the index in position order.
@sujayakar @jamwt I went live in production and am seeing this error pop up occasionally for some users. From reading docs, it sounds like Convex will automatically retry this?

yeah, @presley maybe you can take a look at this one?
@CharstarAdmin we'll dig in and see what's going on. how often does this happen?
I'm actually getting a variety of different errors.


What is your instance name?
is it judicious-bear-548?
ah
So it seems like there are two issues:
1. Optimistic concurrency control failure: queue:AddToQueue
2. Function timeouts: queue:GetStatusById
I also see this on the frontend side occasionally.

I haven't been able to diagnose that one though.
do you still have this codebase public @CharstarAdmin? so we can take a look at it while we check things out on the backend?
Unfortunately, it's private but I'm happy to share temporary access.
sg. we'll see if it's necessary
thanks
I can share all the /convex code.
yup, that's the only part we care about. and we can set this up in a private channel if you'd prefer
Sure, that would be great.
follow-up here for posterity: @CharstarAdmin needed to be provisioned onto a Convex Pro level of resources (our provisioning system was a bit behind on migrating him), we needed to tune various properties of his deployment to keep up with his traffic level, and further optimize the waiting room implementation. congrats on having such a successful project!
Thanks Convex team!