CharstarAdmin
CharstarAdmin2y ago

Implementing a virtual waiting room

I'm looking to implement a virtual waiting room to gate access to my product (I estimate roughly 2000+ users waiting concurrently). Users (each with a unique Clerk user ID) should see their position in the queue and eventually be admitted to the main room in FIFO order when there is space (the room has a fixed capacity). They get booted from the main room back to the queue if they're inactive for 5 minutes.

I was thinking I could use a mutation to add users when they open the website and rely on the 'by_creation_time' index, but I'm not sure how to 1) efficiently query their position in line and 2) boot inactive users (e.g. if they leave the site). Perhaps I could have each client "listen" to the front of the queue (i.e. order('asc').first()) and respond accordingly if their ID matches it. If I have a global counter, I can add a field to the schema that stores the counter value at the time of insertion for each user in the queue. Then the difference between the user's value and the front-of-queue value is the user's position in line.

I'm not sure how to drop users from the queue if they leave the site. Maybe I keep a heartbeat timestamp for each user in the queue? I noticed there was some discussion about sendBeacon and presence earlier.

Anyway, do you think Convex is the right tool for this use case? Do you have a simpler proposal? I'm currently using Vercel/Next.js with API routes + PlanetScale + Clerk, and I'll also need Expo/React Native support.
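For reference, a minimal sketch of the queue table this design implies, using the current Convex schema API (the field names are illustrative, not from any real implementation):

import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  queue: defineTable({
    userId: v.string(),   // Clerk user ID
    position: v.number(), // counter value captured at insertion time
    lastSeen: v.number(), // heartbeat timestamp for pruning inactive users
  })
    .index("by_position", ["position"])
    .index("by_user", ["userId"]),
});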
erquhart
erquhart2y ago
Note: I don't work for Convex, so stay tuned, as they'll likely follow up with an answer as well.
1. For position (e.g., "there are 400 users ahead of you"), I would make a query that fetches the waiting room table and sorts just like you're planning, but I'd index the query on user ID (as you'll need _creationTime last for ordering) and use a range to limit to users with a creation time less than the user's. The query can then return the length of the array as the position in the queue. No extra work for "listening", since Convex queries are reactive by default.
2. For booting a user, I agree that the presence info you found is the way to go. When you detect a user is sufficiently "inactive", you can run a mutation to move them back to the queue.
On your larger question of whether Convex is right for this, I'd say absolutely. Honestly, having worked with Convex for a few months now, I wouldn't want to implement a project like this in any other system.
CharstarAdmin
CharstarAdminOP2y ago
Thanks for your help. On (1), will that incur more “reads” since it’s listening on a larger set of results? Also, I’m not sure about having a composite index on (user ID, creation timestamp) since that will only order timestamps within the same ID, at least that’s what the docs suggest.
erquhart
erquhart2y ago
What I described in step 1 can be done with a single db read. Good point on the index: I was thinking you could use userId as a placeholder, but you're right, you'd have to specify an equality comparison, so that won't work. The idea was to limit the result set as much as possible using a range on the index; any field that works for the equality comparison will do. Of course, this will still be slow for folks at the end of the line and fast for folks at the front, but that's better than slow for everyone.
jamwt
jamwt2y ago
@CharstarAdmin hey! tuning this to any degree is possible, so it depends on how much you want to optimize it. I can talk through a quite optimal version, but feel free to do something simpler.

you'd probably want an incrementing sequence number on your queue, then an index on that number and another on the user's id (two separate indexes). you can always get the user in constant time, and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence-number index and comparing it to their sequence number. this gives you two very fast point queries that will tell you your position in line.

if you want to delete people from the queue table once they're served, that works. if not, you'd probably have some sort of "served" boolean that indicates whether the queued person has been dequeued or not. then your sequence-number index could be on two fields, [served, position], where position is the sequence number, and you can always quickly subdivide the "not yet served" portion of the queue by starting your index lookup with served = false, or whatever.

lmk if this makes sense, but this approach stays very, very fast and scales up to queues of nearly any length.
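A minimal sketch of those two point queries, assuming a queue table with a numeric position field and by_user / by_position indexes (names are illustrative):

import { query } from "./_generated/server";
import { v } from "convex/values";

export const getPosition = query({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Point query 1: this user's queue entry, looked up by ID.
    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return null;

    // Point query 2: the head of the queue (smallest sequence number).
    const head = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("asc")
      .first();

    // Number of people ahead, assuming sequence numbers without gaps.
    return me.position - (head?.position ?? me.position);
  },
});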
CharstarAdmin
CharstarAdminOP2y ago
Hello!
jamwt
jamwt2y ago
if your queue were 5-10 people, I'd say it doesn't really matter and you can just grab the list and take its .length; once you're talking about thousands, you'll probably find you're better served by using the index (and sequence number)
CharstarAdmin
CharstarAdminOP2y ago
To answer your question, it's totally reasonable to delete users from the queue table once they're admitted, so I can avoid needing an extra served flag.
jamwt
jamwt2y ago
👍
CharstarAdmin
CharstarAdminOP2y ago
When you say "and their queue position can be calculated by grabbing the maximum (or minimum) record from the sequence number index and comparing it to their sequence number" -- this sounds similar to the order('asc').first() on the by_creation_time index, right? Not sure why it needs a sequence number.
jamwt
jamwt2y ago
the sequence number lets you know how many are in between without fetching them, but otherwise, yes, that's the right idea:

const head = await db.query('queue').withIndex('by_position').order('asc').first()

position in line = me.position - head.position, or whatever
CharstarAdmin
CharstarAdminOP2y ago
I see, and would this avoid the global counter?
jamwt
jamwt2y ago
yeah, when you're inserting someone into the queue, you grab the person last in the queue and then insert the new person with that position + 1. convex's transactionality will make this work 100% okay
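A sketch of that insertion, again assuming a numeric position field with a by_position index; since Convex mutations run as serializable transactions, two concurrent inserts can't end up with the same sequence number (one of them gets retried):

import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const addToQueue = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // Grab the current tail of the queue (largest sequence number).
    const tail = await ctx.db
      .query("queue")
      .withIndex("by_position")
      .order("desc")
      .first();
    const position = tail === null ? 0 : tail.position + 1;
    await ctx.db.insert("queue", { userId, position, lastSeen: Date.now() });
  },
});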
CharstarAdmin
CharstarAdminOP2y ago
Awesome, honestly very excited to try this out. Cloudflare's waiting room is way too expensive.
jamwt
jamwt2y ago
cool! yeah, should be pretty easy to build. feel free to share out any code or anything if you have questions and we can take a look
CharstarAdmin
CharstarAdminOP2y ago
I think this would also make a very good template for Convex developers.
jamwt
jamwt2y ago
good point
erquhart
erquhart2y ago
This is brilliant. Would it require that users not be dropped mid-queue? Wouldn't that throw off the math?
jamwt
jamwt2y ago
ah, if you want to evict people in the middle of the queue it does get more complicated. I'd say then you have a kind of product decision to make
erquhart
erquhart2y ago
Personally I'd leave it and let the number be potentially inflated; not necessarily a showstopper
jamwt
jamwt2y ago
if it's going to be uncommon, I'd have it just be a pleasant surprise. if it's going to be common, then yeah, there's a need to recalculate the aggregate unless you do fancier things with data structures. yeah, I'd leave it inflated personally
CharstarAdmin
CharstarAdminOP2y ago
Removing inactive users from the queue is ideal behavior so that other users aren't blocked.
jamwt
jamwt2y ago
otherwise you need to roll up aggregate data structures, and if you want it to be sublinear you need to start using trees or something. @CharstarAdmin you can still remove them, just don't worry about the queue position jumping forward two slots every now and then
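For the heartbeat-based removal, a sketch of an internal mutation a cron could run, assuming each queue entry carries a lastSeen timestamp (the 5-minute window and the take(100) batch size are arbitrary; an index on lastSeen would be more efficient for very large queues):

import { internalMutation } from "./_generated/server";

const QUEUE_TIMEOUT_MS = 5 * 60 * 1000;

export const pruneQueue = internalMutation({
  args: {},
  handler: async (ctx) => {
    const cutoff = Date.now() - QUEUE_TIMEOUT_MS;
    // Grab a batch of queue entries whose heartbeat is stale and drop them.
    const stale = await ctx.db
      .query("queue")
      .filter((q) => q.lt(q.field("lastSeen"), cutoff))
      .take(100);
    for (const entry of stale) {
      await ctx.db.delete(entry._id);
    }
  },
});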
CharstarAdmin
CharstarAdminOP2y ago
Good point! The queue position shown to the user doesn't need to decrement one-by-one.
jamwt
jamwt2y ago
yep
CharstarAdmin
CharstarAdminOP2y ago
Very helpful. And lastly, do you think the best way to admit a client into the main product is to react when they're at the front of the queue and the room has space? Will a .length on the room table trigger a full scan?
jamwt
jamwt2y ago
yeah, I'd just use a parent component or something that gates until the useQuery says they're allowed in, and swaps out from the waiting UI to the real UI.

the .length technically happens in javascript, so whatever records are pulled are already pulled. the number of records convex tracks for detecting changes is determined by which indexes you use; it's the ranges you specify in your index predicates. which is why a query that just does two indexed lookups fetching one record each (me and the list head) is really, really efficient
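A rough sketch of that gating component on the client, with a hypothetical api.queue.getStatus query that returns the user's position and whether they've been admitted:

import { ReactNode } from "react";
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

export function Gate({ userId, children }: { userId: string; children: ReactNode }) {
  // Reactive: this re-renders whenever the user's queue status changes.
  const status = useQuery(api.queue.getStatus, { userId });
  if (status === undefined) return <p>Loading...</p>;
  if (status.admitted) return <>{children}</>;
  return <p>You're #{status.position + 1} in line. Hang tight!</p>;
}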
CharstarAdmin
CharstarAdminOP2y ago
Got it. To track the room size (only 1,000 admitted users can be in the product at once), I should have a separate counter?
jamwt
jamwt2y ago
if you don't use an index (let's say, you only use .filter and don't use .withIndex), any change to any record in the table will cause that query to re-run.

yeah, I'd recommend it
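A sketch of how admission plus that counter could look, with hypothetical room and roomSize tables (roomSize holds a single document with a count field) and an illustrative MAX_ROOM_SIZE:

import { mutation } from "./_generated/server";
import { v } from "convex/values";

const MAX_ROOM_SIZE = 1000; // illustrative capacity

export const promoteToRoom = mutation({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    const counter = await ctx.db.query("roomSize").unique();
    if (counter === null || counter.count >= MAX_ROOM_SIZE) return;

    const me = await ctx.db
      .query("queue")
      .withIndex("by_user", (q) => q.eq("userId", userId))
      .unique();
    if (me === null) return;

    // One transaction: remove from the queue, add to the room,
    // and keep the counter in sync.
    await ctx.db.delete(me._id);
    await ctx.db.insert("room", { userId, admittedAt: Date.now() });
    await ctx.db.patch(counter._id, { count: counter.count + 1 });
  },
});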
CharstarAdmin
CharstarAdminOP2y ago
Sounds good! I'll whip up a minimal prototype and share soon. Thanks so much. One more thing that wasn't super clear from the docs -- if I deploy my existing Vercel/NextJS website with Convex added, will my API routes stay on Vercel?
jamwt
jamwt2y ago
do you mean the lambda handlers under the api/ folder?
CharstarAdmin
CharstarAdminOP2y ago
Yep
jamwt
jamwt2y ago
you can if you'd like, but most convex users eventually replace them with convex actions so they have easy access to the full power of their convex data: https://docs.convex.dev/tutorial/actions
CharstarAdmin
CharstarAdminOP2y ago
I see. I'd like to limit my Convex use to just the waiting queue and keep everything else as is.
jamwt
jamwt2y ago
convex actions are our equivalent of vercel's functions. okay, then your vercel functions will still work fine
CharstarAdmin
CharstarAdminOP2y ago
Great, thanks!
CharstarAdmin
CharstarAdminOP2y ago
Gist: Convex Waiting Queue & Room (https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3)
CharstarAdmin
CharstarAdminOP2y ago
There's a queue table, room table, and roomSize table (which just has the room size document). Users who are promoted from the queue to the room are put on a "short fuse", which means they have to send their first heartbeat within 5 minutes of being promoted to prove their presence. I added this to quickly prune out users who were idling in the queue and ended up in the room (5 minutes is too generous in hindsight). After their first heartbeat, the short fuse is disabled and the heartbeat window is extended to 10 minutes. There are two cron jobs pruning short fuse and long fuse users in the room.
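For reference, the two cron jobs could be wired up along these lines (the internal.room.* function references are hypothetical names, and the every-minute schedule is just an example):

import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Boot "short fuse" room members who never sent their first heartbeat in time.
crons.interval("prune short-fuse users", { minutes: 1 }, internal.room.pruneShortFuse);

// Boot room members whose last heartbeat is older than the 10-minute window.
crons.interval("prune long-fuse users", { minutes: 1 }, internal.room.pruneLongFuse);

export default crons;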
CharstarAdmin
CharstarAdminOP2y ago
@jamwt, I had a question on this line: https://gist.github.com/charstar-ai/dd12c74a2c6d1b48239f9bd9cb4639c3#file-queue-ts-L28 Will replacing roomSize with a boolean hasSpace make things less reactive and therefore more efficient? Also, is it more efficient to move the roomSize check into its own query function and check whether there is space in the room separately from getPosition (e.g. useEffect(() => {/* call promoteToRoom(userId) if position is 0 and roomSize < MAX_ROOM_SIZE */}, [position, roomSize]))? Is there a way to have getPosition react to changes in roomSize only if the user is at the head of the queue, or is that an over-optimization given caching?
CharstarAdmin
CharstarAdminOP2y ago
@jamwt sorry for the ping -- just wanted to get your thoughts on this ^
jamwt
jamwt2y ago
gotcha. hey! looking now.

re: roomSize: if you can avoid a mutation on a value, then you will not invalidate things that depend on the value. so yeah, in this case, if you "prematerialize" hasSpace at mutation time rather than just changing the count and checking the predicate at query time, you'll avoid waking up a query just to compute that. however, due to caching, as long as everyone is waiting on the same query, this other query will only run once and everyone else will consume the cached result. so it shouldn't matter much in this case... though you're passing the user id in as an arg, so the query is not fully shared, since it's parameterized by the user id.

so I'm following you now: could you break out that one specific lookup into its own query? then you could use that cached value so this outer query doesn't have to do the database fetch. okay, now considering this whole query
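One way to read that suggestion, sketched here under the assumption that the flag lives in its own tiny document (a hypothetical roomFlags table) and is only written when the answer actually flips, which is what avoids invalidating queries that depend only on it:

import { internalMutation } from "./_generated/server";

const MAX_ROOM_SIZE = 1000; // illustrative capacity

export const incrementRoomSize = internalMutation({
  args: {},
  handler: async (ctx) => {
    const counter = await ctx.db.query("roomSize").unique();
    if (counter === null) return;
    const count = counter.count + 1;
    await ctx.db.patch(counter._id, { count });

    // Prematerialize hasSpace: only write the flag when it changes, so
    // queries reading just the flag aren't woken up by every count change.
    const flags = await ctx.db.query("roomFlags").unique();
    const hasSpace = count < MAX_ROOM_SIZE;
    if (flags !== null && flags.hasSpace !== hasSpace) {
      await ctx.db.patch(flags._id, { hasSpace });
    }
  },
});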
CharstarAdmin
CharstarAdminOP2y ago
Got it, so I shouldn't have any mention of roomSize in getPosition?
jamwt
jamwt2y ago
I think the "good news" is you can do this any way you want, b/c it doesn't make a performance difference in practice. this query will get invalidated by line 22, basically, whenever a new entry is added, so optimizing roomSize vs. hasSpace probably doesn't matter since this query will rerun anyway to calculate the new position.

breaking that one call out into a subquery won't really matter too much either. a single point query like that is like 1ms and probably already cached, so it likely won't matter. I think this is written just fine as is, based on my cursory reading here
CharstarAdmin
CharstarAdminOP2y ago
const head = await ctx.db
.query('queue')
.withIndex('by_position')
.order('asc')
.first()
Will this invalidate the query after any addition even if head doesn't change?
jamwt
jamwt2y ago
this is a question that @sujayakar should probably answer; it's subtle enough. with withIndex not using a range specification, it might mean the dependent range is effectively the whole table (even though you're only consuming the edge-most record)
CharstarAdmin
CharstarAdminOP2y ago
I see. Either way, it would be good to know. Thanks for sanity-checking the rest.
sujayakar
sujayakar2y ago
only the range used will be in the read set. so, if you get the single row at position p, the interval used is negative infinity to p (inclusive). so patterns like this are quite fast in convex! we'll definitely be improving observability for this stuff in the future. it should be easy to debug why a particular query is reexecuting all the time and what read dependencies it's taking without having to ask us 🙂
CharstarAdmin
CharstarAdminOP2y ago
Woot! Just to clarify, what you're referring to is a lookup by key (position) while here we have a lookup by relative order. @sujayakar, does your assertion still apply? Just want to make sure.
jamwt
jamwt2y ago
suj just clarified for me, yep
CharstarAdmin
CharstarAdminOP2y ago
Are you saying that if the result is head = p*, then the read set is [-∞, p*]?
jamwt
jamwt2y ago
it uses how much of the cursor is consumed. so if you stop at the first record, the range is from negative infinity (in this case, because you didn't specify bounds to withIndex) to the indexed value at that record's position. and this will do the right thing
CharstarAdmin
CharstarAdminOP2y ago
Very cool, thanks!
sujayakar
sujayakar2y ago
oh sorry, yeah, I meant that p was the first document we encountered when walking the index in position order.
CharstarAdmin
CharstarAdminOP2y ago
@sujayakar @jamwt I went live in production and am seeing this error pop up occasionally for some users. From reading the docs, it sounds like Convex will automatically retry this?
[screenshot of the error attached]
jamwt
jamwt2y ago
yeah, @presley maybe you can take a look at this one? @CharstarAdmin we'll dig in and see what's going on. how often does this happen?
CharstarAdmin
CharstarAdminOP2y ago
I'm actually getting a variety of different errors.
[screenshot of another error attached]
CharstarAdmin
CharstarAdminOP2y ago
[screenshot of another error attached]
presley
presley2y ago
What is your instance name?
jamwt
jamwt2y ago
is it judicious-bear-548 ?
jamwt
jamwt2y ago
ah
presley
presley2y ago
So it seems like there are two issues:
1. Optimistic concurrency control failures on queue:AddToQueue
2. Function timeouts on queue:GetStatusById
CharstarAdmin
CharstarAdminOP2y ago
I also see this on the frontend side occasionally.
[screenshot of the frontend error attached]
CharstarAdmin
CharstarAdminOP2y ago
I haven't been able to diagnose that one though.
jamwt
jamwt2y ago
do you still have this codebase public, @CharstarAdmin? so we can take a look at it while we check things out on the backend?
CharstarAdmin
CharstarAdminOP2y ago
Unfortunately, it's private but I'm happy to share temporary access.
jamwt
jamwt2y ago
sg, we'll see if it's necessary. thanks
CharstarAdmin
CharstarAdminOP2y ago
I can share all the /convex code.
jamwt
jamwt2y ago
yup, that's the only part we care about. and we can set this up in a private channel if you'd prefer
CharstarAdmin
CharstarAdminOP2y ago
Sure, that would be great.
jamwt
jamwt2y ago
follow-up here for posterity: @CharstarAdmin needed to be provisioned onto a Convex Pro level of resources (our provisioning system was a bit behind on migrating him), we needed to tune various properties of his deployment to keep up with his traffic level, and we further optimized the waiting room implementation. congrats on having such a successful project!
CharstarAdmin
CharstarAdminOP2y ago
Thanks Convex team!
