r2/actionRetrier cleanupExpiredRuns cron job stuck in a loop
I'm noticing this message repeatedly appearing in the logs, and it seems to be originating from r2/actionRetrier. @erquhart

Hmm that's kind of what it's supposed to do, the errors are like the system was down, so it's retrying. How often are you seeing this?
I first noticed it looping today, even though I'm not currently trying to upload anything.
Every 24 hours it runs a cron to clean up expired runs, but it should only loop like this if the action fails.
So this is normal to see once like this.
So, there’s no retry limit, and does it stop after 24 hours?
Cause it's still trying
Hmm that's a system error that should have just been intermittent. I also don't see that the action retrier is actually written to retry this action at all, so I'm honestly not sure what's happening here.
cc/ @Ian in case you have any insights
How often is the cron running? and is there data in there?
It looks like there's some retry-able failure. The scheduler will continue retrying mutations if they hit OCC (optimistic concurrency control) conflicts, e.g.
Maybe it's incorrectly classifying it, or they're all competing for writes..
Totally I don't see either, thanks for the help
normally it should run every 24 hours, but right now it retries every second
with some backoff
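For anyone reading along, "retrying with some backoff" usually looks something like the sketch below. It assumes exponential backoff with a cap; the helper name `backoffDelayMs` and the numbers are made up for illustration, and this is not the Convex scheduler's actual code.

```typescript
// Hypothetical helper, for illustration only (not Convex's implementation).
// Returns how long to wait before retry number `attempt` (0-based).
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000, // assumed first delay: one second
  maxMs: number = 60000, // assumed cap: one minute
): number {
  // Double the delay on every attempt, but never exceed the cap.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Delays for the first five retries: 1s, 2s, 4s, 8s, 16s.
const firstDelays: number[] = [0, 1, 2, 3, 4].map((n) => backoffDelayMs(n));
console.log(firstDelays);
```

With a cap like this, a function that keeps failing settles into retrying at a fixed interval instead of stopping, which would match a log that never goes quiet.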
The r2 component sets up one instance of the action retrier so I assume this is just one function rerunning and not multiple
I can't find this function or usage of the action retrier in r2, so I'm looking in the wrong spot. @erquhart i'll leave it to you for now
I found this—one of the runs was skipped. Maybe it will help.

Can you check how many documents are in the runs table of the retrier?
There are 49 documents
but none of them has numFailures
Yeah that should be a very fast mutation. This isn't actually retrier stuff, this is just a simple mutation that deletes all completed records. It's a mutation, and it's not actually retried by the retrier: https://github.com/get-convex/action-retrier/blob/c4c74363b80ff8503e74fd30819e724956007964/src/component/run.ts#L312-L330
Will see if someone can take a look.
The only reason it should be stuck is if the number of documents exceeds 1024, but I only have 49. I tried deleting the records manually, but that didn’t help.
and it's still running
If you leave it be, it might help whoever looks into it if it remains in a failure state - assuming this isn't having any known impact on your project
This might be the reason: no arguments are being passed to the function.
https://github.com/get-convex/action-retrier/commit/22c5d01feaa2681f76c0745b11fdb4d74d0d9032
It's fine, I'll leave it as is.
I'd expect that to cause failure 100% of the time, but you have successful runs in the logs
You’re right—if no arguments are being passed to the function, it should cause a failure every time. It’s curious that there are still successful runs in the logs. Still investigating
actually now that I see this...I am also getting it and my fail rate is at 100% any ideas? 🤔


I'm still trying to figure it out—facing the same problem.
https://discord.com/channels/1019350475847499849/1357528982425436271
Yeah i'm also getting this now, i just switched from local dev to cloud as i blew up something locally. It took me searching here to find that it was action retrier as i was checking my logs and not seeing anything (only viewing app).
normally i run npx convex dev --tail-logs disable so i wouldn't have seen it
can you all dm me your deployment names so we can dig into logs on our side
sorry about that, no idea what's up here
and obviously you're not getting a very clear error message
if you have a pro account, if you make a ticket from your dashboard, that will do it automatically. if not, dming me is fine
reached out thanks for the help
adding @Zeroday too.
we've identified the issue. working on a fix here
can I ask what the issue was? I love the technical side of things.
we'd made a change this week to convex system tables that affected cron jobs inside components
so components which use cron jobs (like action retrier) were affected
@nipunn is working on rolling a fix out now
https://github.com/get-convex/convex-backend/commit/486f13405114c85525ff1935fac24262d2d9410e if you are curious. It's very behind-the-scenes. Rolling it out now.
I'll look into this, thanks for the quick fix and explanations
just rolled it out - are you seeing improvement on your side?
Yeah it's no longer showing the error as of 10 minutes ago
excellent :phew:
while on the topic of seeing log entries 🙂 i noticed that cleanupExpiredStreams runs every minute. It's part of the new persistent-text-streaming component, which i installed (and haven't used yet). I did a quick search and didn't see anything in the docs about whether it needs to loop that aggressively, or why it runs so often.
https://github.com/get-convex/persistent-text-streaming/blob/main/src/component/crons.ts does look like it! @Jamie wrote the component and might have some wisdom. Probably doesn't need to run every minute, but also 🤷 doesn't hurt. It's not going to be unreasonably expensive. Seems fine.
What about the aggressive running is bothersome? The log entries?
There is a dropdown to filter log entries which could be useful.
yeah, there's no issue on my end either
we could make it configurable if you want to ramp it down to every N minutes or whatever
I did the math and it was like 4 cents of function calls a month, so it didn't seem like a big deal
yeah, $0.043 per month for 1440 * 30 function calls
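The arithmetic behind that figure, as a quick sketch. The call count follows from one run per minute; the per-call rate is not stated in the thread, so the last line only backs out what rate the quoted $0.043 would imply (roughly $1 per million calls). Treat that as an inference, not a quoted price.

```typescript
// One cron run per minute.
const callsPerDay: number = 24 * 60; // 1440
const callsPerMonth: number = callsPerDay * 30; // 43,200 for a 30-day month

// Monthly cost as quoted in the thread.
const quotedMonthlyCostUsd: number = 0.043;

// Implied rate (an inference from the two numbers above, not an official price).
const impliedUsdPerMillionCalls: number =
  (quotedMonthlyCostUsd / callsPerMonth) * 1000000;

console.log(callsPerMonth, impliedUsdPerMillionCalls.toFixed(2));
```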
I wonder if a good pattern would be having the cron run a query that only schedules a mutation when necessary, as the query would probably be cached most of the time when there are no rows to clean up. That would help when caching is treated differently in pricing/usage limits.
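A rough sketch of that "check first, write only if needed" idea, in plain TypeScript with an in-memory array standing in for the table. Every name here (`StreamRow`, `hasExpiredRows`, `deleteExpiredRows`, `cronTick`) is hypothetical, and how this would be wired into Convex crons, queries and mutations is left open.

```typescript
type StreamRow = { id: number; expiresAt: number };

// In-memory stand-in for the table (hypothetical, not a Convex API).
let rows: StreamRow[] = [];
let mutationRuns = 0; // counts how often the write path actually runs

// The read-only check (the "query"): cheap, and cacheable while nothing changes.
function hasExpiredRows(now: number): boolean {
  return rows.some((r) => r.expiresAt <= now);
}

// The write (the "mutation"): removes expired rows, returns how many were deleted.
function deleteExpiredRows(now: number): number {
  mutationRuns += 1;
  const before = rows.length;
  rows = rows.filter((r) => r.expiresAt > now);
  return before - rows.length;
}

// What each cron tick would do: only run the write when the check finds work.
function cronTick(now: number): number {
  return hasExpiredRows(now) ? deleteExpiredRows(now) : 0;
}
```

On an empty table, `cronTick` never touches the write path, which is the case the message above is hoping would be served from cache.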
No issues on my end anymore either. I had a question though. I was getting that error repeatedly for about a day or two. Will this add to my bill?
@rebecca maybe you can help -- would that count as a billed function call?
if a function failed due to an internal system error i don't believe that would count, but please don't hesitate to reach out if your bill doesn't look right!