Function execution timed out (120s)
One of our larger actions that imports a ton of data from an external source takes quite a while to execute and we're hitting the timeout.
Are there any suggested patterns we should employ?
the flow is as follows:
1. fetch all listingIds, then fetch and store all reservationIds
2. for each reservationId, fetch and store all conversationIds
3. for each conversationId, fetch and store all messages
Should I also be scheduling this action instead of calling it from the client outright?
If I schedule the action, the client is free to do anything else, correct?
I'd suggest chunking up this work and recording your progress in the DB. Each chunk should ideally run for less than a minute; when it gets to a stopping point, record your progress in the database and schedule that function to run again. This way, even if there's a failure, you can resume since you've been recording progress in the DB. You can write a query that reports progress too, for a client-side progress bar.
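To make that concrete, here is a minimal sketch of the chunk-and-reschedule pattern. Everything in it is hypothetical rather than from this thread: the importJobs table (with reservationIds and cursor fields), the file convex/importJobs.ts, and the chunk size of 25 are all placeholders to adapt to your own schema.

```ts
// convex/importJobs.ts — hypothetical names throughout.
import { v } from "convex/values";
import {
  internalAction,
  internalMutation,
  internalQuery,
  query,
} from "./_generated/server";
import { internal } from "./_generated/api";

export const getJob = internalQuery({
  args: { jobId: v.id("importJobs") },
  handler: (ctx, { jobId }) => ctx.db.get(jobId),
});

export const markProgress = internalMutation({
  args: { jobId: v.id("importJobs"), processedCount: v.number() },
  handler: async (ctx, { jobId, processedCount }) => {
    const job = await ctx.db.get(jobId);
    if (!job) return;
    // Advance the cursor so a retry resumes where the last chunk stopped.
    await ctx.db.patch(jobId, { cursor: job.cursor + processedCount });
  },
});

// Processes one chunk, records progress in the DB, then reschedules itself.
export const processChunk = internalAction({
  args: { jobId: v.id("importJobs") },
  handler: async (ctx, { jobId }) => {
    const job = await ctx.runQuery(internal.importJobs.getJob, { jobId });
    if (!job) return;

    // Keep each chunk small enough to finish well under the timeout.
    const batch = job.reservationIds.slice(job.cursor, job.cursor + 25);
    for (const reservationId of batch) {
      // ...fetch and store conversationIds and messages for this reservationId...
    }

    await ctx.runMutation(internal.importJobs.markProgress, {
      jobId,
      processedCount: batch.length,
    });

    // More work left? Schedule the next chunk to run immediately.
    if (job.cursor + batch.length < job.reservationIds.length) {
      await ctx.scheduler.runAfter(0, internal.importJobs.processChunk, { jobId });
    }
  },
});

// Public query the client can subscribe to for a progress bar.
export const progress = query({
  args: { jobId: v.id("importJobs") },
  handler: async (ctx, { jobId }) => {
    const job = await ctx.db.get(jobId);
    return job ? { done: job.cursor, total: job.reservationIds.length } : null;
  },
});
```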
"If I schedule the action, the client is free to do anything else, correct?"
I don't quite follow this, could you say more? Whether you schedule the action or call it directly from the client, the client is free to do whatever it wants. Mutations run one by one in Convex, but actions don't block anything; see https://blog.convex.dev/announcing-convex-0-14-0/#breaking-actions-are-now-parallelized for more.
You might consider something like calling a function (mutation or action) that schedules an action for every reservationId. That action does steps 2 and 3 for its reservationId, then marks that reservationId as done in a table. This Stack post has some tips on patterns for tracking job state: https://stack.convex.dev/background-job-management
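A rough sketch of that fan-out variant, again with made-up names (the reservationJobs table, convex/reservations.ts, and the function names are all hypothetical):

```ts
// convex/reservations.ts — hypothetical file, table, and function names.
import { v } from "convex/values";
import { internalAction, internalMutation, mutation } from "./_generated/server";
import { internal } from "./_generated/api";

// Schedules one action per reservationId and tracks each in a "reservationJobs" row.
export const fanOut = mutation({
  args: { reservationIds: v.array(v.string()) },
  handler: async (ctx, { reservationIds }) => {
    for (const reservationId of reservationIds) {
      const jobId = await ctx.db.insert("reservationJobs", {
        reservationId,
        status: "pending",
      });
      await ctx.scheduler.runAfter(0, internal.reservations.syncOne, {
        jobId,
        reservationId,
      });
    }
  },
});

// Does steps 2 and 3 for a single reservation, then records completion.
export const syncOne = internalAction({
  args: { jobId: v.id("reservationJobs"), reservationId: v.string() },
  handler: async (ctx, { jobId, reservationId }) => {
    // ...fetch and store conversationIds and messages for this reservationId...
    await ctx.runMutation(internal.reservations.markDone, { jobId });
  },
});

export const markDone = internalMutation({
  args: { jobId: v.id("reservationJobs") },
  handler: (ctx, { jobId }) => ctx.db.patch(jobId, { status: "done" }),
});
```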
Ahh, that makes sense. Thanks a lot for the explanations and examples. I'll try a few different approaches. I'm a bit new to writing production code with lots of data, so I haven't come across these issues very often.
I'm getting a lot of TokenExpired errors after calling the action from the client, although the same actions work fine on the dashboard. It seems like for actions that take a while to execute (20+ seconds), the token for the websocket connection is lost?
Ah gotcha, we'll look into that. As a workaround for now and as a generally more resilient approach (since it doesn't require connectivity through the duration of the action), scheduling these actions to run immediately is a nice way to do things.
We will look into fixing auth so it's only checked at the beginning of the action. Another workaround in the meantime is to make the Clerk token expiry longer than 1 minute; we recommend 1h as our default setting.
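A sketch of the schedule-immediately workaround, with hypothetical names (convex/pmsData.ts, startImport, and the function reference are illustrative, not how the real action is necessarily exported):

```ts
// convex/pmsData.ts — hypothetical wrapper mutation.
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";

// The client calls this short mutation instead of the long action: auth is
// checked here, the import action is scheduled to run in the background, and
// a token expiring mid-import can no longer interrupt it.
export const startImport = mutation({
  args: {},
  handler: async (ctx) => {
    await ctx.scheduler.runAfter(
      0,
      internal.actions.pmsData.fetchAndStoreReservations,
      {} // placeholder args
    );
  },
});
```

On the client this would just be a normal mutation call (e.g. via useMutation), which returns as soon as the scheduling is recorded rather than waiting for the import to finish.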
{"code":"Overloaded","message":"InternalServerError: Your request couldn't be completed. Try again later."}
"Would this be due to scheduling with too little delay between each action?"
Yes, this is likely. I think this has resulted in many concurrent actions that all queued up queries/mutations at the same time.
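One mitigation for that kind of pressure (again just a sketch, reusing the hypothetical names from the fan-out example above) is to stagger the delays so the actions don't all start, and enqueue their mutations, at the same moment:

```ts
// Same hypothetical file as the fan-out sketch above (convex/reservations.ts).
import { v } from "convex/values";
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";

export const fanOutStaggered = mutation({
  args: { reservationIds: v.array(v.string()) },
  handler: async (ctx, { reservationIds }) => {
    let delayMs = 0;
    for (const reservationId of reservationIds) {
      const jobId = await ctx.db.insert("reservationJobs", {
        reservationId,
        status: "pending",
      });
      // Roughly one new action every 500 ms instead of hundreds at t = 0.
      await ctx.scheduler.runAfter(delayMs, internal.reservations.syncOne, {
        jobId,
        reservationId,
      });
      delayMs += 500;
    }
  },
});
```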
What is your instance name?
https://knowing-emu-505.convex.cloud
we might've overloaded it since a lot of our scheduled actions are hanging
this is p0 for us rn since it's blocking prod activity. Is there a way to clear all scheduled actions?
You can do that from the dashboard
Go to the functions tab and find the action you have scheduled => Scheduled Runs => "cancel all"
Hmm, there don't seem to be any scheduled fns
Hmm... I also don't see any issues with knowing-emu. What is hanging exactly?
The scheduler doesn't seem to run the action after the delay. Line 124 in
actions/pmsData:fetchAndStoreReservations
thought it was auto-blocked due to congestion
it runs fine on the dev instance https://mellow-elephant-424.convex.cloud
Hmm... I don't see any backlog or any errors for knowing-emu-505. Are you positive the scheduling is happening? If it is, it must show in the dashboard as pending (if scheduled in the future) or as executed in the logs (if already executed).
the same logic is deployed on dev and prod but the prod scheduler doesn't fire the action after 1000ms
Yes, I don't see any scheduled executions, but also don't see any errors. How are you scheduling the functions? Is it from a cron or mutation trigger?
You have to await runAfter
It is an async operation.
Ah okay, let me try again. It had been working okay in development, but that might be it.
Yeah, it is a race. Since it runs in Node.js, we can't guarantee we wait for all futures to complete.
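For reference, the difference is just the missing await on the scheduling call. This is a sketch, not the actual pmsData code; the wrapper action name and the empty args object are placeholders:

```ts
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";

export const kickOff = internalAction({
  handler: async (ctx) => {
    // Dangling promise: in a Node.js action the handler can return before this
    // completes, so the scheduled run may never be recorded.
    //   ctx.scheduler.runAfter(1000, internal.actions.pmsData.fetchAndStoreReservations, {});

    // Awaited: the scheduled run is registered before the handler returns.
    await ctx.scheduler.runAfter(
      1000,
      internal.actions.pmsData.fetchAndStoreReservations,
      {} // placeholder args
    );
  },
});
```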
sweet thank you totally missed that. appreciate the help!
working now
No worries. I see why having to await the scheduling might be confusing. This is a simple case of dangling promises that we could likely throw an error for, so it fails more loudly. We wouldn't be able to do that if you had other nested promises, but we can detect the most basic/common case.
Gotcha that makes sense. Only other issue we're having is the overloaded error
Where do you see those errors? Is it in the action logs or when you call it from the browser? Is that the dev or prod instance?
Yeah, I saw the error on our side. The transactions are failing since they execute concurrently and conflict with each other. Is it possible the transactions conflict (they read/modify the same rows)? Do you see OptimisticConcurrencyControlFailure in the dashboard logs? The error message should have more details.
A stopgap solution might be to add some delay between the mutations. The proper fix is to make sure the mutations don't read the entire table; a common solution is to use an index.
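A sketch of that index fix, assuming a hypothetical conversations table keyed by reservationId (the table, field, and index names are made up):

```ts
// convex/schema.ts (excerpt) — hypothetical table and index.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  conversations: defineTable({
    reservationId: v.string(),
    externalId: v.string(),
  }).index("by_reservation", ["reservationId"]),
});
```

```ts
// convex/conversations.ts — hypothetical query showing both access patterns.
import { v } from "convex/values";
import { query } from "./_generated/server";

export const byReservation = query({
  args: { reservationId: v.string() },
  handler: async (ctx, { reservationId }) => {
    // Full table scan: the read set is the whole table, so concurrent
    // mutations are far more likely to conflict under OCC.
    // const all = await ctx.db.query("conversations").collect();
    // return all.filter((c) => c.reservationId === reservationId);

    // Indexed read: only the rows for this reservation are read, shrinking
    // the read set and the chance of OptimisticConcurrencyControlFailure.
    return await ctx.db
      .query("conversations")
      .withIndex("by_reservation", (q) => q.eq("reservationId", reservationId))
      .collect();
  },
});
```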
action logs, and in the ErrorMessage portion of the scheduled job
I think they read the same rows but don't modify them
And the error message is just this
Got it. I'm using indexes for most queries now, so I'll keep you updated.