CharstarAdmin
CharstarAdmin2y ago

Implementing a virtual waiting room

I'm looking to implement a virtual waiting room to gate access to my product (I estimate roughly 2000+ users waiting concurrently). Users (each with a unique Clerk user ID) should see their position in the queue and eventually be admitted to the main room in FIFO order when there is space (the room has a fixed capacity). They get booted from the main room back to the queue if they're inactive for 5 minutes.

I was thinking I could use a mutation to add users when they open the website and rely on the 'by_creation_time' index, but I'm not sure how to 1) efficiently query their position in line and 2) boot inactive users (e.g. if they leave the site). Perhaps I could have each client "listen" to the front of the queue (i.e. order('asc').first()) and respond accordingly if their ID matches it. If I have a global counter, I can add a field to the schema that stores the counter value at the time of insertion for each user in the queue. Then the difference between the user's value and the front-of-queue value is the user's position in line.

I'm not sure how to drop users from the queue if they leave the site. Maybe I keep a heartbeat timestamp for each user in the queue? I noticed there was some discussion about sendBeacon and presence earlier.

Anyway, do you think Convex is the right tool for this use case? Do you have a simpler proposal? I'm currently using Vercel/Next.js with API routes + PlanetScale + Clerk, and I'll also need Expo/React Native support.
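For reference, a minimal sketch of the queue table this design implies, using the current Convex schema API (the field names are illustrative, not from any real implementation):

import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  queue: defineTable({
    userId: v.string(),   // Clerk user ID
    position: v.number(), // counter value captured at insertion time
    lastSeen: v.number(), // heartbeat timestamp for pruning inactive users
  })
    .index("by_position", ["position"])
    .index("by_user", ["userId"]),
});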
erquhart
erquhart2y ago
Note: I don't work for Convex, so stay tuned, as they'll likely follow up with an answer as well.
1. For position (e.g., "there are 400 users ahead of you"), I would make a query that fetches the waiting room table and sorts just like you're planning, but I'd index the query on user ID (as you'll need _creationTime last for ordering) and use a range to limit to users with a creation time less than the user's. The query can then return the length of the array as the position in the queue. No extra work for "listening", since Convex queries are reactive by default.
2. For booting a user, I agree that the presence info you found is the way to go. When you detect a user is sufficiently "inactive", you can run a mutation to move them back to the queue.
On your larger question of whether Convex is right for this, I'd say absolutely. Honestly, having worked with Convex for a few months now, I wouldn't want to implement a project like this in any other system.
CharstarAdmin
CharstarAdminOP2y ago
Thanks for your help. On (1), will that incur more “reads” since it’s listening on a larger set of results? Also, I’m not sure about having a composite index on (user ID, creation timestamp) since that will only order timestamps within the same ID, at least that’s what the docs suggest.
erquhart
erquhart2y ago
What I described in step 1 can be done with a single db read. Good point on the index: I was thinking you could use userId as a placeholder, but you're right, you'd have to specify an equality comparison, so that won't work. The idea was to limit the result set as much as possible using a range on the index; any field that works for the equality comparison will do. Of course, this will still be slow for folks at the end of the line and fast for folks at the front, but that's better than slow for everyone.
jamwt
jamwt2y ago
@CharstarAdmin hey! tuning this to any degree is possible, so it depends on how much you want to optimize it. I can talk through a quite optimal version, but feel free to do something simpler.

you'd probably want an incrementing sequence number on your queue, then an index on that number and another on the user's id (two separate indexes). you can always get the user in constant time, and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence-number index and comparing it to their sequence number. this gives you two very fast point queries that will tell you your position in line.

if you want to delete people from the queue table once they're served, that works. if not, you'd probably have some sort of "served" boolean that indicates whether the queued person has been dequeued or not. then your sequence-number index could be on two fields, [served, position], where position is the sequence number, and you can always quickly subdivide the "not yet served" portion of the queue by starting your index lookup with served = false, or whatever.

lmk if this makes sense, but this approach stays very, very fast and scales up to queues of nearly any length.
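A minimal sketch of those two point queries, assuming a queue table with a numeric position field and by_user / by_position indexes (names are illustrative):

import { query } from "./_generated/server";
import { v } from "convex/values";

export const getPosition = query({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Point query 1: this user's queue entry, looked up by ID.
    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return null;

    // Point query 2: the head of the queue (smallest sequence number).
    const head = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("asc")
      .first();

    // Number of people ahead, assuming sequence numbers without gaps.
    return me.position - (head?.position ?? me.position);
  },
});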
CharstarAdmin
CharstarAdminOP2y ago
Hello!
jamwt
jamwt2y ago
if your queue were 5-10 people, I'd say it doesn't really matter and you can just grab the list and take its .length; once you're talking about thousands, you'll probably find you're better served by using the index (and sequence number)
CharstarAdmin
CharstarAdminOP2y ago
To answer your question, it's totally reasonable to delete users from the queue table once they're admitted, so I can avoid needing an extra served flag.
jamwt
jamwt2y ago
👍
CharstarAdmin
CharstarAdminOP2y ago
When you say "and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence number index and comparing it to their sequence number" -- this sounds similar to the order('asc').first() on the by_creation_time index, right? Not sure why it needs a sequence number.
jamwt
jamwt2y ago
the sequence number lets you know how many are in between without fetching them, but otherwise, yes, that's the right idea:

const head = await db.query('queue').withIndex('by_position').order('asc').first()

position in line = me.position - head.position, or whatever
CharstarAdmin
CharstarAdminOP2y ago
I see, and would this avoid the global counter?
jamwt
jamwt2y ago
yeah, when you're inserting someone into the queue, you grab the person last in the queue and then insert the new person with that position + 1. convex's transactionality will make this work 100% okay
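A sketch of that insertion, again assuming a numeric position field with a by_position index; since Convex mutations run as serializable transactions, two concurrent inserts can't end up with the same sequence number (one of them gets retried):

import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const addToQueue = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Grab the current tail of the queue (largest sequence number).
    const tail = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("desc")
      .first();
    const position = tail === null ? 0 : tail.position + 1;
    await ctx.db.insert("queue", { userId, position, lastSeen: Date.now() });
  },
});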
CharstarAdmin
CharstarAdminOP2y ago
Awesome, honestly very excited to try this out. Cloudflare's waiting room is way too expensive.
jamwt
jamwt2y ago
cool! yeah, should be pretty easy to build. feel free to share out any code or anything if you have questions and we can take a look
CharstarAdmin
CharstarAdminOP2y ago
I think this would also make a very good template for Convex developers.
jamwt
jamwt2y ago
good point
erquhart
erquhart2y ago
This is brilliant. Would it require that users not be dropped mid-queue? Wouldn't that throw off the math?
jamwt
jamwt2y ago
ah, if you want to evict people in the middle of the queue it does get more complicated. I'd say then you have a kind of product decision to make
erquhart
erquhart2y ago
Personally I'd leave it and let the number be potentially inflated; not necessarily a showstopper
jamwt
jamwt2y ago
if it's going to be uncommon, I'd have it just be a pleasant surprise. if it's going to be common, then yeah, there's a need to recalculate the aggregate unless you do fancier things with data structures. yeah, I'd leave it inflated personally
CharstarAdmin
CharstarAdminOP2y ago
Removing inactive users from the queue is ideal behavior so that other users aren't blocked.
jamwt
jamwt2y ago
otherwise you need to roll up aggregate data structures, and if you want it to be sublinear you need to start using trees or something. @CharstarAdmin you can still remove them, just don't worry about the queue position jumping forward two slots every now and then
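For the heartbeat-based removal, a sketch of an internal mutation a cron could run, assuming each queue entry carries a lastSeen timestamp (the 5-minute window and the take(100) batch size are arbitrary; an index on lastSeen would be more efficient for very large queues):

import { internalMutation } from "./_generated/server";

const QUEUE_TIMEOUT_MS = 5 * 60 * 1000;

export const pruneQueue = internalMutation({
  args: {},
  handler: async (ctx) => {
    const cutoff = Date.now() - QUEUE_TIMEOUT_MS;
    // Grab a batch of queue entries whose heartbeat is stale and drop them.
    const stale = await ctx.db
      .query("queue")
      .filter((q) => q.lt(q.field("lastSeen"), cutoff))
      .take(100);
    for (const entry of stale) {
      await ctx.db.delete(entry._id);
    }
  },
});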
CharstarAdmin
CharstarAdminOP2y ago
Good point! The queue position shown to the user doesn't need to decrement one-by-one.
jamwt
jamwt2y ago
yep
CharstarAdmin
CharstarAdminOP2y ago
Very helpful. And lastly, do you think the best way to admit a client into the main product is to react when they're at the front of the queue and the room has space? Will a .length on the room table trigger a full scan?
jamwt
jamwt2y ago
yeah, I'd just use a parent component or something that gates until the useQuery says they're allowed in, and swaps out from the waiting UI to the real UI.

the .length technically happens in javascript, so whatever records are pulled are already pulled. the number of records convex tracks for detecting changes is determined by which indexes you use; it's the ranges you specify in your index predicates. which is why a query that just does two indexed lookups fetching one record each (me and the list head) is really, really efficient
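A rough sketch of that gating component on the client, with a hypothetical api.queue.getStatus query that returns the user's position and whether they've been admitted:

import { ReactNode } from "react";
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

export function Gate({ userId, children }: { userId: string; children: ReactNode }) {
  // Reactive: this re-renders whenever the user's queue status changes.
  const status = useQuery(api.queue.getStatus, { userId });
  if (status === undefined) return <p>Loading...</p>;
  if (status.admitted) return <>{children}</>;
  return <p>You're #{status.position + 1} in line. Hang tight!</p>;
}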
CharstarAdmin
CharstarAdminOP2y ago
Got it. To track the room size (only 1,000 admitted users can be in the product at once), I should have a separate counter?
jamwt
jamwt2y ago
if you don't use an index (let's say, you only use .filter and don't use .withIndex), any change to any record in the table will cause that query to re-run.

yeah, I'd recommend it
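A sketch of how admission plus that counter could look, with hypothetical room and roomSize tables (roomSize holds a single document with a count field) and an illustrative MAX_ROOM_SIZE:

import { mutation } from "./_generated/server";
import { v } from "convex/values";

const MAX_ROOM_SIZE = 1000; // illustrative capacity

export const promoteToRoom = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    const counter = await ctx.db.query("roomSize").unique();
    if (counter === null || counter.count >= MAX_ROOM_SIZE) return;

    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return;

    // One transaction: remove from the queue, add to the room,
    // and keep the counter in sync.
    await ctx.db.delete(me._id);
    await ctx.db.insert("room", { userId, admittedAt: Date.now() });
    await ctx.db.patch(counter._id, { count: counter.count + 1 });
  },
});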
CharstarAdmin
CharstarAdminOP2y ago
Sounds good! I'll whip up a minimal prototype and share soon. Thanks so much. One more thing that wasn't super clear from the docs -- if I deploy my existing Vercel/NextJS website with Convex added, will my API routes stay on Vercel?
jamwt
jamwt2y ago
do you mean the lambda handlers under the api/ folder?
CharstarAdmin
CharstarAdminOP2y ago
Yep
jamwt
jamwt2y ago
you can if you'd like, but most convex users eventually replace them with convex actions so they have easy access to the full power of their convex data: https://docs.convex.dev/tutorial/actions
CharstarAdmin
CharstarAdminOP2y ago
I see. I'd like to limit my Convex use to just the waiting queue and keep everything else as is.
jamwt
jamwt2y ago
convex actions are our equivalent of vercel's functions. okay, then your vercel functions will still work fine
CharstarAdmin
CharstarAdminOP2y ago
Great, thanks!
CharstarAdmin
CharstarAdminOP2y ago
Gist: Convex Waiting Queue & Room (https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3)
CharstarAdmin
CharstarAdminOP2y ago
There's a queue table, room table, and roomSize table (which just has the room size document). Users who are promoted from the queue to the room are put on a "short fuse", which means they have to send their first heartbeat within 5 minutes of being promoted to prove their presence. I added this to quickly prune out users who were idling in the queue and ended up in the room (5 minutes is too generous in hindsight). After their first heartbeat, the short fuse is disabled and the heartbeat window is extended to 10 minutes. There are two cron jobs pruning short fuse and long fuse users in the room.
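For reference, the two cron jobs could be wired up along these lines (the internal.room.* function references are hypothetical names, and the every-minute schedule is just an example):

import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Boot "short fuse" room members who never sent their first heartbeat in time.
crons.interval("prune short-fuse users", { minutes: 1 }, internal.room.pruneShortFuse);

// Boot room members whose last heartbeat is older than the 10-minute window.
crons.interval("prune long-fuse users", { minutes: 1 }, internal.room.pruneLongFuse);

export default crons;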
CharstarAdmin
CharstarAdminOP2y ago
@jamwt, I had a question on this line: https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3#file-queue-ts-L28 Will replacing roomSize with a boolean hasSpace make things less reactive and therefore more efficient? Also, is it more efficient to move the roomSize check into its own query function and check whether there is space in the room separately from getPosition (e.g. useEffect(() => {/* call promoteToRoom(userId) if position is 0 and roomSize < MAX_ROOM_SIZE */}, [position, roomSize]))? Is there a way to have getPosition react to changes in roomSize only if the user is at the head of the queue, or is that an over-optimization given caching?
CharstarAdmin
CharstarAdminOP2y ago
@jamwt sorry for the ping -- just wanted to get your thoughts on this ^
jamwt
jamwt2y ago
gotcha. hey! looking now.

re: roomSize: if you can avoid a mutation on a value, then you will not invalidate things that depend on the value. so yeah, in this case, if you "prematerialize" hasSpace at mutation time rather than just changing the count and checking the predicate at query time, you'll avoid waking up a query just to compute that. however, due to caching, as long as everyone is waiting on the same query, this other query will only run once and everyone else will consume the cached result. so it shouldn't matter much in this case... though you're passing the user id in as an arg, so the query is not fully shared, since it's parameterized by the user id.

so I'm following you now: could you break out that one specific lookup into its own query? then you could use that cached value so this outer query doesn't have to do the database fetch. okay, now considering this whole query
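One way to read that suggestion, sketched here under the assumption that the flag lives in its own tiny document (a hypothetical roomFlags table) and is only written when the answer actually flips, which is what avoids invalidating queries that depend only on it:

import { internalMutation } from "./_generated/server";

const MAX_ROOM_SIZE = 1000; // illustrative capacity

export const incrementRoomSize = internalMutation({
  args: {},
  handler: async (ctx) => {
    const counter = await ctx.db.query("roomSize").unique();
    if (counter === null) return;
    const count = counter.count + 1;
    await ctx.db.patch(counter._id, { count });

    // Prematerialize hasSpace: only write the flag when it changes, so
    // queries reading just the flag aren't woken up by every count change.
    const flags = await ctx.db.query("roomFlags").unique();
    const hasSpace = count < MAX_ROOM_SIZE;
    if (flags !== null && flags.hasSpace !== hasSpace) {
      await ctx.db.patch(flags._id, { hasSpace });
    }
  },
});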
CharstarAdmin
CharstarAdminOP2y ago
Got it, so I shouldn't have any mention of roomSize in getPosition?
jamwt
jamwt2y ago
I think the "good news" is you can do this any way you want, b/c it doesn't make a performance difference in practice. this query will get invalidated by line 22, basically, whenever a new entry is added, so optimizing roomSize vs. hasSpace probably doesn't matter since this query will rerun anyway to calculate the new position.

breaking that one call out into a subquery won't really matter too much either. a single point query like that is like 1ms and probably already cached, so it likely won't matter. I think this is written just fine as is, based on my cursory reading here
CharstarAdmin
CharstarAdminOP2y ago
const head = await ctx.db
.query('queue')
.withIndex('by_position')
.order('asc')
.first()
Will this invalidate the query after any addition even if head doesn't change?
jamwt
jamwt2y ago
this is a question that @sujayakar should probably answer; it's subtle enough. with withIndex not using a range specification, it might mean the dependent range is effectively the whole table (even though you're only consuming the edge-most record)
CharstarAdmin
CharstarAdminOP2y ago
I see. Either way, it would be good to know. Thanks for sanity-checking the rest.
sujayakar
sujayakar2y ago
only the range used will be in the read set. so, if you get the single row at position p, the interval used is negative infinity to p (inclusive). so patterns like this are quite fast in convex! we'll definitely be improving observability for this stuff in the future. it should be easy to debug why a particular query is reexecuting all the time and what read dependencies it's taking without having to ask us 🙂
CharstarAdmin
CharstarAdminOP2y ago
Woot! Just to clarify, what you're referring to is a lookup by key (position) while here we have a lookup by relative order. @sujayakar, does your assertion still apply? Just want to make sure.
jamwt
jamwt2y ago
suj just clarified for me, yep
CharstarAdmin
CharstarAdminOP2y ago
Are you saying that if the result is head = p*, then the read set is [-∞, p*]?
jamwt
jamwt2y ago
it uses how much of the cursor is consumed. so if you stop at the first record, the range is from negative infinity (in this case, because you didn't specify bounds to withIndex) to the indexed value at that record's position. and this will do the right thing
CharstarAdmin
CharstarAdminOP2y ago
Very cool, thanks!
sujayakar
sujayakar2y ago
oh sorry, yeah, I meant that p was the first document we encountered when walking the index in position order.
CharstarAdmin
CharstarAdminOP2y ago
@sujayakar @jamwt I went live in production and am seeing this error pop up occasionally for some users. From reading the docs, it sounds like Convex will automatically retry this?
[screenshot of the error attached]
jamwt
jamwt2y ago
yeah, @presley maybe you can take a look at this one? @CharstarAdmin we'll dig in and see what's going on. how often does this happen?
CharstarAdmin
CharstarAdminOP2y ago
I'm actually getting a variety of different errors.
[screenshot of another error attached]
CharstarAdmin
CharstarAdminOP2y ago
[screenshot of another error attached]
presley
presley2y ago
What is your instance name?
jamwt
jamwt2y ago
is it judicious-bear-548 ?
jamwt
jamwt2y ago
ah
presley
presley2y ago
So it seems like there are two issues:
1. Optimistic concurrency control failures on queue:AddToQueue
2. Function timeouts on queue:GetStatusById
CharstarAdmin
CharstarAdminOP2y ago
I also see this on the frontend side occasionally.
[screenshot of the frontend error attached]
CharstarAdmin
CharstarAdminOP2y ago
I haven't been able to diagnose that one though.
jamwt
jamwt2y ago
do you still have this codebase public, @CharstarAdmin? so we can take a look at it while we check things out on the backend?
CharstarAdmin
CharstarAdminOP2y ago
Unfortunately, it's private but I'm happy to share temporary access.
jamwt
jamwt2y ago
sg, we'll see if it's necessary. thanks
CharstarAdmin
CharstarAdminOP2y ago
I can share all the /convex code.
jamwt
jamwt2y ago
yup, that's the only part we care about. and we can set this up in a private channel if you'd prefer
CharstarAdmin
CharstarAdminOP2y ago
Sure, that would be great.
jamwt
jamwt2y ago
follow-up here for posterity: @CharstarAdmin needed to be provisioned onto a Convex Pro level of resources (our provisioning system was a bit behind on migrating him), we needed to tune various properties of his deployment to keep up with his traffic level, and we further optimized the waiting room implementation. congrats on having such a successful project!
CharstarAdmin
CharstarAdminOP2y ago
Thanks Convex team!
