Web Crawling Background Actions
for example i launch 100 actions, one returns in 1 minute, another in 10 minutes, and so on
27 Replies
You have a situation here where you have a "driver" (schedule more work to be done) and "workers".
basically, call the driver first. the driver schedules the workers
(the long-running actions you're referring to)
in the final mutation, when one of the workers succeeds and marks its one job as done, it should also, in that same transaction, schedule the driver function to re-run
then the driver function can decide if the whole batch is done or not, or to schedule more work if new work is discovered, etc
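To make the driver/worker loop above concrete, here is a minimal in-memory sketch. It does not use the Convex API: the `jobs` map stands in for a table, `queue` stands in for scheduled functions, and `links`, `driver`, `worker`, and `completeJob` are all hypothetical names chosen for illustration. In a real deployment `completeJob` would be a mutation, so marking the job done and re-scheduling the driver would happen in one transaction.

```typescript
type Job = { url: string; status: "pending" | "running" | "done" };

const jobs = new Map<string, Job>(); // stand-in for a jobs table
const queue: Array<() => void> = []; // stand-in for scheduled functions
const batch = { done: false };

function schedule(fn: () => void): void {
  queue.push(fn);
}

// Hypothetical link graph: crawling a page may discover more pages.
const links: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/c"],
  "/b": [],
  "/c": [],
};

// Driver: start every pending job and decide whether the batch is finished.
function driver(): void {
  let unfinished = 0;
  jobs.forEach((job) => {
    if (job.status === "done") return;
    unfinished++;
    if (job.status === "pending") {
      job.status = "running";
      schedule(() => worker(job.url));
    }
  });
  batch.done = unfinished === 0;
}

// Worker: the long-running part (here just a lookup in the link graph).
function worker(url: string): void {
  const discovered = links[url] ?? [];
  completeJob(url, discovered);
}

// Plays the role of the final mutation: mark the job done, record any newly
// discovered work, and schedule the driver again, all in one step.
function completeJob(url: string, discovered: string[]): void {
  const job = jobs.get(url);
  if (job === undefined) return;
  job.status = "done";
  for (const next of discovered) {
    if (!jobs.has(next)) jobs.set(next, { url: next, status: "pending" });
  }
  schedule(driver);
}

// Seed one job, call the driver, then drain the queue until nothing is left.
function runBatch(startUrl: string): void {
  jobs.set(startUrl, { url: startUrl, status: "pending" });
  schedule(driver);
  let next = queue.shift();
  while (next !== undefined) {
    next();
    next = queue.shift();
  }
}

runBatch("/");
```

The driver is the only place that decides "done or not", so workers never need to know about each other. They just report back and wake the driver up.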
and if you don't want this whole state machine to grind to a halt the first time you get an http error, use convex action retrier https://github.com/JamesCowling/convex-action-retrier to ensure all your actions actually complete so the batch keeps moving until it's really done
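For a rough idea of what "retry until it succeeds" means, here is a generic retry-with-backoff sketch. This is not the convex-action-retrier API (see the repo for that); `runWithRetries` and `flakyFetch` are made-up names, and the waits are only recorded rather than actually slept, on the assumption that a real system would schedule the next attempt that far in the future.

```typescript
// Hypothetical flaky action: fails the first two times it is called.
let calls = 0;
function flakyFetch(): string {
  calls++;
  if (calls < 3) throw new Error("HTTP 503");
  return "page body";
}

// Run `action` until it succeeds, doubling the wait after each failure.
// Gives up and rethrows once `maxFailures` failures have been seen.
function runWithRetries<T>(
  action: () => T,
  maxFailures: number,
  baseMs: number,
): { result: T; waits: number[] } {
  const waits: number[] = [];
  for (let failures = 0; ; failures++) {
    try {
      return { result: action(), waits };
    } catch (err) {
      if (failures + 1 >= maxFailures) throw err;
      waits.push(baseMs * 2 ** failures); // 100, 200, 400, ...
    }
  }
}

const outcome = runWithRetries(flakyFetch, 5, 100);
```

Because the action may run more than once, it has to be safe to run more than once, which is why idempotency matters for this pattern.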
if it simplifies things, the driver knows a priori the number of pages a website has, so it knows, for example, that it will launch exactly n workers
right, this is a nice simplification, but the architecture is nearly the same either way
which is good b/c if you end up with something more dynamic, it's not a big leap
just a small change to the driver
to schedule more discovered work. but if there will be no more discovered work, you save yourself like 3-4 lines of code
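A small sketch of that simpler case, assuming the page count is known up front. `makeBatch` and its methods are hypothetical names and nothing here is Convex-specific. The driver's "is the batch done?" check becomes a plain count comparison.

```typescript
// Track a batch whose size is known before any worker starts.
function makeBatch(totalPages: number) {
  const done = new Set<number>();
  return {
    // All pages are known up front, so the driver can schedule them in one pass.
    pagesToSchedule(): number[] {
      return Array.from({ length: totalPages }, (_, i) => i + 1);
    },
    // Called when the worker for `page` reports success.
    complete(page: number): void {
      done.add(page);
    },
    // No discovery step: the batch is finished once every page is done.
    isFinished(): boolean {
      return done.size === totalPages;
    },
  };
}

const crawl = makeBatch(3);
const scheduled = crawl.pagesToSchedule();
crawl.complete(1);
crawl.complete(2);
const finishedEarly = crawl.isFinished();
crawl.complete(3);
const finishedAtEnd = crawl.isFinished();
```

Going dynamic later would mostly mean letting `complete` add newly discovered pages and raising the total accordingly.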
ok I will look at what you sent to me and come back here, do you have any other theory I should learn to do this? 🙂
I'm not experienced in building this stuff
no, but happy to review code. it's kind of fun to learn about these workflow-y things, and they actually can all be modeled in convex pretty well using the primitives
I think this is enough theory to give it a shot!
the key things to know are 1. convex-action-retrier 2. ensure idempotency of your actions and 3. the fact that in mutations updating state is atomically related to scheduling future work
those are the three key pieces to build this
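Points 2 and 3 can be illustrated with a tiny sketch. It assumes an in-memory `pageJobs` map in place of a real table, and `markDone` and `driverRuns` are hypothetical names. In Convex, `markDone` would be a mutation, so the state update and the scheduling of the driver would commit together or not at all.

```typescript
type PageJob = { done: boolean };

const pageJobs = new Map<number, PageJob>([
  [1, { done: false }],
  [2, { done: false }],
]);
let driverRuns = 0; // stand-in for "the driver was scheduled"

// Idempotent completion: a retried worker reporting the same page twice
// changes state (and wakes the driver) only the first time.
function markDone(pageNumber: number): boolean {
  const job = pageJobs.get(pageNumber);
  if (job === undefined || job.done) return false; // unknown or duplicate: no-op
  job.done = true;
  driverRuns += 1; // in the same step as the state update
  return true;
}

const first = markDone(1);
const retried = markDone(1); // the same report again, e.g. after a retry
```

Here `first` is `true` and `retried` is `false`, and the driver is woken only once, so a retried action cannot double-count a page.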
wow, give me some time to wrap my head around this 😄
for now it's all Greek to me
🙂
based on what you've said to me, it feels like you're on the right track so far
well, let me tell you this funny story
i bought the professional plan a few days ago
i still have 0 customers
i managed to exceed the bandwidth usage limit just by testing
so clearly I still need to learn hehe
i had a couple of crons running every 10 seconds, probably doing full table scans without an index
they burned all my credits in 2 days
sorry to hear that. yeah, we're working on better telemetry to prevent these kinds of things. one thing to know about is you can always go to your project settings and "pause deployment" to prevent any more computation from happening
oh that's useful, at one point i had 1000+ scheduled functions, and i didn't know how to stop them
we'll keep working on it, but it's actually tricky to create something as powerful as convex without also having it be a little too easy to "run away consume" by expressing cycles or expensive data passes or something else. that's not the way we want to win long term customers! so we'll keep figuring out ways to prevent that
[image attachment, no description]
See "Pause Deployment" at the bottom there. feel free to mash that anytime you want everything to stop so you can get your bearings
to be honest i love convex DX, and i decided to bet on it to build an enterprise SaaS
I hope I can manage the workload with convex in production
that's great to hear! we're working a ton on production telemetry right now so that developers have a lot of confidence about how everything is working in production, how to tune their site, and how to lower their costs, etc
it's a focus area for us currently
amazing, if I need some help with going to production
can i talk to you?
yeah, we'll often do a conversation with folks and give them some architecture advice and maybe optimization/cost reduction advice before they go into production. just dm me and we'll arrange it!
That's amazing, I read a lot of documentation on convex but I didn't find anything related to production/enterprise grade stuff
yeah, tbh we're a little behind on documentation and tooling. in the last 4-6 months we've suddenly had a new batch of high load, real production sites. and we're catching up on all the 1:1 advice we've given them, things we learned, etc. making those practices/learnings/insight scalable via good docs and tools. we're not there yet
but people are using convex for really crucial stuff now, which is very exciting for us
that's encouraging
because my SaaS, if it's successful, will be kind of high load
and i was not sure if convex can handle that
you are the CEO?
the convex team has a lot of experience scaling high load sites. we designed, built, and operated systems with millions of requests per second (at Dropbox) and exabytes of data. so scaling is always a challenge, but it's in our wheelhouse and we tend to react pretty quickly to a new customer with a novel, challenging kind of load
yep
that's nice, i feel important now 🙂
LOL
assistance from CEO
well, I just had a free minute while transferring between one private jet and another, so...
seriously though, we're a pretty small team and everyone helps with everything. anyway, thanks so much for trying convex, and reach back out if we can help with this workflow problem any more
ahaha, I'm also the CEO of my company, but I still travel by train
I will definitely come back to you, happy to have had this brief introduction with an important future tech partner 🙂
have a nice weekend!
Hey @Jamie do you have any example where I can take inspiration? Thanks a lot