RJ
RJ12mo ago

`"ExpiredInQueue"`, `"Too many concurrent requests…"`

I'm receiving the following error in one of my actions, and am having trouble understanding why.
{"code":"ExpiredInQueue","message":"Too many concurrent requests, backoff and try again."}
This is occurring in an action (called extractProductListings) which is doing a few different things, including executing http requests, mutations and queries, but it is limited to performing only a max of 10 of these concurrently. However, extractProductListings is being executed by another action, which is executing extractProductListings many times concurrently. These all comprise a scraping script, which is intended to perform a lot of computation (web requests, queries, and mutations) simultaneously in a short period of time. I've tried to be mindful of conforming to the limits described here (https://docs.convex.dev/functions/actions#limits), and believe I am conforming to them, so I'm wondering if there could be some constraints not described in the docs that I'm running into. Is it the case, for example, that there are global concurrency limits (per Convex account) which I'm encountering? Or perhaps there's something else I might not be considering?
16 Replies
ian
ian12mo ago
Sorry you’re running into this. I’m curious what the parallelism is when this happens? We’re working on a scaling feature as we speak which will help with this. Right now, one way to make actions scale faster is to make them Node actions with “use node”. I believe there are still global per-backend concurrency limits for the Convex environment. This will get fanned out via a project we internally call “fun run”.
RJ
RJOP12mo ago
I was observing this with no more than 100 actions running in parallel at once. Is that too many? And I was already using Node for reasons of library support 🤔 By reducing that number to 50, I no longer observe this error. Actually, I'm not sure that everything is peachy if I reduce the number of simultaneous actions to 50 max (I'm now observing occasional Uncaught errors with no additional details).
Michal Srb
Michal Srb12mo ago
Do you need to have so many concurrent actions? Can you break them down via the scheduler?
RJ
RJOP12mo ago
I do, or I need at least some large number. I moved a script which I was running locally to Convex in hopes of better performance (I specifically needed more parallelism). I'm trying to execute lots of web requests simultaneously--locally, this script would take around 3 hours per execution, and I need to run it many times.
ian
ian12mo ago
If the contention isn't on the actions but just the queries/mutations, it makes me wonder if you could batch the database reads/writes, or if they're conflicting and incurring backoff. For example: one action that does a batch read, fans out, collects results, and does a batch write.
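A rough sketch of that "batch read, fan out, collect, batch write" shape, written without reference to RJ's actual code. `chunk` and `batchReadFanOutBatchWrite` are hypothetical helpers, and `readBatch`, `work`, and `writeBatch` stand in for whatever batched query, per-item work (e.g. a web request), and batched mutation the action would really call:

```typescript
// Split a list into fixed-size batches so that one query/mutation call
// can handle many items instead of one call per item.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical pipeline: one batched read, fan out the per-item work,
// collect the results, then write them back one batch at a time.
async function batchReadFanOutBatchWrite<K, In, Out>(
  keys: readonly K[],
  readBatch: (keys: readonly K[]) => Promise<In[]>,
  work: (input: In) => Promise<Out>,
  writeBatch: (outputs: readonly Out[]) => Promise<void>,
  writeBatchSize: number,
): Promise<number> {
  // A single read call instead of keys.length separate reads.
  const inputs = await readBatch(keys);
  // Fan out the per-item work and collect every result.
  const outputs = await Promise.all(inputs.map(work));
  // Write one batch at a time, so the writes never overlap each other.
  for (const batch of chunk(outputs, writeBatchSize)) {
    await writeBatch(batch);
  }
  return outputs.length;
}
```

With 1,000 items and a write batch size of 100, this makes 1 read call and 10 write calls rather than 2,000 individual calls. The right batch size depends on the per-function limits in the docs linked above.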
RJ
RJOP12mo ago
Yeah, could that be the case? I can definitely do some read/write batching if that would make a difference. But it would still be helpful for me to have some understanding of what exactly the concurrency limits I'm running into are
ian
ian12mo ago
Happy to jump on a call to talk through your usecase & limitations. There’s currently a fixed pool (will become dynamic with fun run) of isolates that service query/mutations. More than that gets queued. But if the queue isn’t emptied fast, we start aggressively dropping to keep up.
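Since requests that sit in the queue too long get dropped, and the error message itself says to back off and try again, here is a minimal sketch of doing that on the caller's side. It assumes the failure surfaces as a thrown `Error` whose message contains the `ExpiredInQueue` code (an assumption based on the error text in this thread, not confirmed Convex behavior). `withBackoff` and its parameters are hypothetical names:

```typescript
// Hypothetical helper: retry an async call with exponential backoff when it
// fails with an error whose message mentions the "ExpiredInQueue" code.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Assumption: the error code appears verbatim in the error message.
      const retryable = message.includes("ExpiredInQueue");
      if (!retryable || attempt >= maxAttempts) throw err;
      // Jittered delay: a random wait of up to baseDelayMs * 2^(attempt - 1).
      const cap = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) =>
        setTimeout(resolve, Math.random() * cap),
      );
    }
  }
}
```

Retrying only helps if the overall load drops in the meantime, so this is a complement to reducing concurrency, not a replacement for it.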
RJ
RJOP12mo ago
Do actions participate in this pool as well? Or is that just queries/mutations?
ian
ian12mo ago
separate (larger) pool for v8 actions
RJ
RJOP12mo ago
After talking to Ian, I made the following changes:
- Refactored to use a single Node action
- Batched the one mutation that comprised the vast majority of the write load
- Used a semaphore (specifically the Effect implementation: https://www.effect.website/docs/concurrency/semaphore) to limit the number of concurrent Convex queries/mutations (I used a limit of 28; actually, I reduced it to 20)

And it's worked great! No more errors, and performance is good enough for my needs right now. Thanks again @ian for all your help and ideas!

I've done some reworking of this script in order to try to accommodate a heavier load (I eventually began running out of time when executing it in a single Node action), and have begun encountering erroneous validation errors. By "erroneous" I mean I'm nearly certain that they do not reflect the actual issue I'm running into. Here's the one I keep seeing:
ArgumentValidationError: Object is missing the required field `titles`. Consider wrapping the field validator in `v.optional(...)` if this is expected.

Object: {}
Validator: v.object({titles: v.array(v.string())})
I only invoke this query once, however, and to make absolutely certain that it was never receiving an empty object as its argument, I called it like so:
ctx.runQuery(
internal.scripts.googleShoppingLeadGen.queries
.getProductListingsByTitle,
{ titles: titles ? titles : [] },
),
But it still appeared. So I'm suspicious that something else is going on which is causing this, but obviously I don't know what it might be! This correlates with an upstream error (in the action) of
Uncaught Error: {"code":"InternalServerError","message":"Your request couldn't be completed. Try again later."}
And query execution times of ~0.5s. My guess is that this still has something to do with too much DB load, or something? Although in these cases I should still be limiting the number of simultaneous queries/mutations (to 20, and then 12, but I saw these errors in both cases).
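For reference, the kind of cap described here (at most 20, then 12, queries/mutations in flight at once) can be sketched with a small counting semaphore. This is a plain TypeScript stand-in, not the Effect implementation mentioned earlier, and the `Semaphore` class and `withPermit` method are hypothetical names:

```typescript
// Minimal counting semaphore: at most `permits` tasks run at the same time.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits <= 0) {
      throw new Error("permits must be a positive integer");
    }
    this.available = permits;
  }

  private async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No permit free: wait until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the longest-waiting task
    } else {
      this.available++;
    }
  }

  // Run `task` once a permit is free, and always give the permit back.
  async withPermit<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

Usage would look something like creating one shared `new Semaphore(20)` for the whole action and wrapping each `ctx.runQuery(...)` / `ctx.runMutation(...)` call in `withPermit(() => ...)`, so that web requests can still fan out widely while database calls stay under the cap.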
Michal Srb
Michal Srb12mo ago
The validator error looks very surprising. Are you sure you’re actually running the latest version of your code? (There isn’t a schema validator error blocking your npx convex dev, for example? You can verify via logging.)
RJ
RJOP12mo ago
Yeah, it's running the latest version of the code (just triple-verified). And in any case, I haven't touched this part of the code in a while. I'm also seeing these errors show up in the logs many tens of minutes after the main script has stopped running (also not sure how unexpected that is)
ballingt
ballingt12mo ago
This is asking you to debug something that isn't your fault, but could you try removing the validator and replacing it with a manual check on the first line of the query? Then you could log it first to confirm that it's really an empty object. Based on your code of course it isn't, but if it's a Convex issue, it would be nice to confirm it here.
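A minimal sketch of what such a manual check could look like, standing in for the `v.object({titles: v.array(v.string())})` validator. `hasTitles` and `checkArgs` are hypothetical names, and the Convex query wrapper itself is left out so the snippet stays self-contained:

```typescript
// Hand-written equivalent of the validator: accepts only { titles: string[] }.
function hasTitles(args: unknown): args is { titles: string[] } {
  if (typeof args !== "object" || args === null) return false;
  const titles = (args as { titles?: unknown }).titles;
  return Array.isArray(titles) && titles.every((t) => typeof t === "string");
}

// Intended as the first line of the query handler: log the raw arguments
// exactly as received, then reject anything that is not { titles: string[] }.
function checkArgs(args: unknown): string[] {
  console.log("getProductListingsByTitle args:", JSON.stringify(args));
  if (!hasTitles(args)) {
    throw new Error(
      `Expected { titles: string[] }, got ${JSON.stringify(args)}`,
    );
  }
  return args.titles;
}
```

Logging before checking means the received value shows up in the logs even when the check fails, which is the point of the experiment.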
ian
ian12mo ago
I recently had a similar issue where the cause was a rogue script that I had accidentally backgrounded and not fully killed. Not sure if that's even possible given what you're up to
RJ
RJOP12mo ago
I've been seeing this occasionally for a little while, but each time in the past it appeared after the Node action invoking the queries/mutations timed out, so I figured it had something to do with how the action was getting shut down in that scenario and didn't worry too much about it. This time, though, there is no apparent cause. Unless maybe I was trying to do too many things concurrently? How should I interpret this error?

I'll try this shortly.

When I remove the validator and console.log title, I see no logged values and no validator error message! This is still following the upstream error:
Uncaught Error: {"code":"InternalServerError","message":"Your request couldn't be completed. Try again later."}
ballingt
ballingt12mo ago
Great, thanks! One more thing: is this on prod or dev? And another, actually: could you DM me this backend name? (moved to DMs)