I have a large JSON file with objects, what is the best way to get that into an internal action?
Specifically I have a JSON file where I maintain a list of Google's supported canonical locations. The file has more objects in the array than can be committed in one mutation.
I want some way to get this large JSON array into an internalAction I can call to have it insert or patch documents in my locations table.
I've attempted multiple things so far:
- Tried passing the data as function args via the CLI; the argument length was too large, at least when running through npx
- Tried reading the file from a "use node" action; the source file isn't present in the bundled output
- Tried statically importing the JSON into a "use node" action; the bundle size became too large to upload
- Tried uploading the JSON to Convex file storage and passing the storageId to the action; the upload went fine, but calling await file.text() on the blob from const file = await ctx.storage.get(storageId); fails with an "Array buffer allocation failed" error, and the documentation says nothing about reading the file from an action (roughly as sketched below).
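Roughly what that fourth attempt looks like (the function name and table are illustrative); the allocation error presumably comes from materializing the whole 75 MB blob in memory at once:

```ts
// Illustrative sketch of the failing storage-based attempt.
import { v } from "convex/values";
import { internalAction } from "./_generated/server";

export const importFromStorage = internalAction({
  args: { storageId: v.id("_storage") },
  handler: async (ctx, { storageId }) => {
    const file = await ctx.storage.get(storageId);
    if (!file) throw new Error("File not found in storage");
    // Fails here with "Array buffer allocation failed" on a 75 MB file.
    const text = await file.text();
    const locations = JSON.parse(text);
    console.log(`parsed ${locations.length} locations`);
    // ...insert or patch documents in the locations table...
  },
});
```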
Thanks for posting in <#1088161997662724167>.
Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets.
- Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.)
- Use search.convex.dev to search Docs, Stack, and Discord all at once.
- Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI.
- Avoid tagging staff unless specifically instructed.
Thank you!
I'm curious: why use a JSON file for this data? My first impression would be to store it in a table as a collection of documents, with one or more indices as appropriate to make searching efficient.
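For example, a minimal schema sketch for such a table, assuming each record carries Google's geotarget criteria ID as its stable external key (field names here are illustrative):

```ts
// convex/schema.ts -- illustrative fields for the locations table
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  locations: defineTable({
    criteriaId: v.number(),
    name: v.string(),
    canonicalName: v.string(),
    countryCode: v.string(),
    targetType: v.string(),
  })
    // Index on the stable external ID so upserts can find existing rows.
    .index("by_criteria_id", ["criteriaId"]),
});
```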
I ultimately do want this in a table, but this is an external dataset so it has to come from somewhere. I'm piecing it together from a couple APIs/downloads and the JSON file just makes it easy to know when something has actually changed.
Gotcha. How are you currently updating the JSON file with the data that you collect from those various sources?
I have a simple Node.js script that fetches from 2 APIs, combines and reorganizes the data into the format I want in the database then saves that to a file I commit.
The result is a 75 MB JSON file with 229,510 objects in it. Too much to include in the bundle.
I have found a way around the 4th limitation: using await ctx.storage.getUrl(storageId) and fetching that URL from a "use node" action. But I have to limit what I return from that action and call it multiple times, with each call taking 15s+ to return a small fraction of the array.
This script could likely be converted into a Convex action that saves the data to a table. You could run it manually, or set it up with a Convex cron job to run on a schedule.
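A sketch of what that could look like, assuming the schema above; the file, function, and field names are all illustrative. The mutation does the insert-or-patch work, and a "use node" action fetches the stored JSON via its URL and feeds it to the mutation in batches:

```ts
// convex/locationsMutations.ts -- insert-or-patch one batch of locations
import { v } from "convex/values";
import { internalMutation } from "./_generated/server";

export const upsertBatch = internalMutation({
  args: { batch: v.array(v.any()) },
  handler: async (ctx, { batch }) => {
    for (const location of batch) {
      const existing = await ctx.db
        .query("locations")
        .withIndex("by_criteria_id", (q) => q.eq("criteriaId", location.criteriaId))
        .unique();
      if (existing) {
        await ctx.db.patch(existing._id, location);
      } else {
        await ctx.db.insert("locations", location);
      }
    }
  },
});
```

```ts
"use node";
// convex/locationsImport.ts -- the getUrl workaround in a "use node" action
import { v } from "convex/values";
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";

export const importFromStorageUrl = internalAction({
  args: { storageId: v.id("_storage") },
  handler: async (ctx, { storageId }) => {
    const url = await ctx.storage.getUrl(storageId);
    if (!url) throw new Error("File not found in storage");
    const locations: any[] = await (await fetch(url)).json();

    // Upsert in batches so no single mutation touches too many documents.
    for (let offset = 0; offset < locations.length; offset += 1000) {
      await ctx.runMutation(internal.locationsMutations.upsertBatch, {
        batch: locations.slice(offset, offset + 1000),
      });
    }
  },
});
```

One caveat: a single action invocation still has to finish within the action time limit, which is where the next message picks up.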
Right now I'm trying something similar, getting the data from storage instead of re-calling the same external APIs, but I'm running into mutation limits: even upserting in batches of 1000, the run takes longer than 600s.
To work around that, use scheduled functions. Schedule a series of mutations to run sequentially, each one operating on a portion of the data.
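A sketch of that pattern, reusing the illustrative names from above: the action handles one slice, then only reschedules itself if anything remains, so the chain terminates on its own.

```ts
"use node";
// convex/locationsImport.ts -- recursive, scheduled version of the import
import { v } from "convex/values";
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";

const BATCH_SIZE = 1000;

export const importNextBatch = internalAction({
  args: { storageId: v.id("_storage"), offset: v.number() },
  handler: async (ctx, { storageId, offset }) => {
    const url = await ctx.storage.getUrl(storageId);
    if (!url) throw new Error("File not found in storage");
    const locations: any[] = await (await fetch(url)).json();

    const batch = locations.slice(offset, offset + BATCH_SIZE);
    if (batch.length === 0) return; // nothing left to slice: stop scheduling

    await ctx.runMutation(internal.locationsMutations.upsertBatch, { batch });

    // Advance the offset before scheduling, otherwise this loops from 0 forever.
    await ctx.scheduler.runAfter(0, internal.locationsImport.importNextBatch, {
      storageId,
      offset: offset + BATCH_SIZE,
    });
  },
});
```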
Alright, I forgot about recursive scheduled functions and was considering a workflow component instead.
Before I go and do that, is there some way to halt a function or at least stop it from scheduling another task? Just so I don't go and accidentally create an indefinite loop of scheduled tasks.
Workflow would probably work just as well
Ideally that Node.js script would've been able to do the inserts, but I don't see anything that would let me write a Node.js script that invokes an internalMutation the same way that convex run does.
> is there some way to halt a function or at least stop it from scheduling another task?
I usually end up just building that into the logic somehow. If you know the size of the data (an array, for example), each scheduled function would slice off a portion of it, and only schedule the next function if there's more left to slice.
Your Node script can call an HTTP endpoint. That can kick off a function.
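For the HTTP route idea, a sketch of what the endpoint could look like (path and names are illustrative; you'd want to add some kind of shared-secret check before exposing it):

```ts
// convex/http.ts -- an endpoint the Node script can call to kick off the import
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

const http = httpRouter();

http.route({
  path: "/import-locations",
  method: "POST",
  handler: httpAction(async (ctx, request) => {
    const { storageId } = await request.json();
    // Kick off the recursive import sketched earlier and return immediately.
    await ctx.scheduler.runAfter(0, internal.locationsImport.importNextBatch, {
      storageId,
      offset: 0,
    });
    return new Response(null, { status: 202 });
  }),
});

export default http;
```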
Sure. I'm asking because on an earlier version I accidentally missed one line of code and ended up with a variant that forgot to increment the offset, so it kept looping from 0 each time until the function finally timed out.
Ok, I really wanted to avoid having the Node.js script authenticate somehow, since we already have internal functions that can be called from the CLI/dashboard. But I'll keep HTTP endpoints in mind too.
Thanks for the help. I'm already using the workflow component elsewhere, so I'll try repackaging the current Node.js action I've added pagination to into a workflow that continues until there's nothing left.
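If I'm reading the workflow component's API correctly (worth double-checking against its README), the repackaged loop might look roughly like this; importSlice is an assumed variant of the earlier action that processes one slice and returns how many objects remain:

```ts
// convex/importWorkflow.ts -- rough sketch, assuming the workflow component
// is already registered in convex.config.ts and that importSlice exists.
import { WorkflowManager } from "@convex-dev/workflow";
import { v } from "convex/values";
import { components, internal } from "./_generated/api";

export const workflow = new WorkflowManager(components.workflow);

export const importLocations = workflow.define({
  args: { storageId: v.id("_storage") },
  handler: async (step, { storageId }): Promise<void> => {
    let offset = 0;
    // Each iteration runs one durable step; the loop ends when nothing remains.
    while (true) {
      const { remaining } = await step.runAction(
        internal.locationsImport.importSlice,
        { storageId, offset },
      );
      if (remaining <= 0) break;
      offset += 1000;
    }
  },
});
```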