Large Tables
Hi, I'm trying to import a CSV that has 3M rows (the Crunchbase basic CSV) for a lookup use case I was looking forward to. Basically an Algolia/Redis-type experience with the convenience of Convex.
I chunked it into files of 100k rows each, but it still fails with a 408:
npx convex import --table cb_basic_org 01.csv --replace
✖ Importing data from "../DATA/crunchbase_basic/01.csv" to table "cb_organizations" failed
408 Request Timeout
1. Is it a bad idea to have a 2-3 GB CSV dataset as a single table for an Algolia-style experience?
2. What is the optimal batch size to avoid these failures?
3. I'm doing this in DEV, but it seems like overkill to keep loading this data for every preview. I'd rather have a PROD table with this data that I refer to from DEV, but I realize that a Convex app can only point to a single endpoint. Should I load the data to PROD and then "replicate" the environment to DEV to avoid having to load the data twice? How should I think about this?
4. Do I understand correctly that each row is counted as at least 1 KB for the purpose of quotas (so for maximum cost efficiency I should try to keep rows in long tables around, but under, 1 KB each)?
Thanks
hi! i'm actively working on making npx convex import support larger data sets. 2-3gb and even 100mb are larger than we support right now; sorry about that. also note npx convex import doesn't yet work with preview deployments (i'm also working on that).
you're correct that each row counts as at least 1kb for the purpose of quotas. this counts both as "Database storage" and "Database bandwidth" during the import and when you read the data in queries.
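(rough worked example: at the 1kb minimum, a 3M-row table counts as at least 3,000,000 × 1kb ≈ 3gb of database storage, even if the individual rows are smaller, and reading all of it in queries incurs a similar amount of database bandwidth.)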
if you want your dev instance to read from prod, i suppose you could use an action to forward the request to prod's url. but if the purpose of dev is to be "prod with in-development code", i would guess that populating a smaller amount of data in dev would be more useful than reading it from prod.
(anyone else feel free to jump in about devx/billing; i mostly wanted to give context on the current state of import)
Thanks, seems like 50k lines a pop is working now. I'll keep loading to test my thesis; would be great if someone chimes in on whether this is a good idea.
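For reference, a minimal sketch of one way to produce those 50k-line chunks in Node, repeating the header row in each file so every chunk imports on its own (file names are hypothetical, and the naive line split assumes no newlines inside quoted CSV fields):

```ts
import { createReadStream, createWriteStream } from "node:fs";
import { createInterface } from "node:readline";

const CHUNK_ROWS = 50_000;

async function splitCsv(input: string) {
  const lines = createInterface({ input: createReadStream(input) });
  let header: string | undefined;
  let out: ReturnType<typeof createWriteStream> | undefined;
  let rows = 0;
  let part = 0;

  for await (const line of lines) {
    if (header === undefined) {
      header = line; // first line is the column header
      continue;
    }
    // start a new chunk file, prefixed with the header row
    if (!out || rows === CHUNK_ROWS) {
      out?.end();
      out = createWriteStream(`chunk_${String(++part).padStart(2, "0")}.csv`);
      out.write(header + "\n");
      rows = 0;
    }
    out.write(line + "\n");
    rows++;
  }
  out?.end();
}

await splitCsv("crunchbase_basic.csv"); // hypothetical input file name
```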
Is it possible to connect from one convex environment to another endpoint? I looked for it but couldnt find anything about it... I'm working on the assumption that one node app can only talk with one convex instance due to ENV / generated code, did I get this incorrectly/ can you hint on where to find info about that?
To connect to another deployment you can make an HTTP request from a Convex action or use a separate Convex client.
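A minimal sketch of the action approach, assuming a PROD_CONVEX_URL environment variable set on the dev deployment and a hypothetical lookup.byName query that exists in both deployments' codebases:

```ts
// convex/prodLookup.ts — a dev-side action that forwards a read to prod
import { ConvexHttpClient } from "convex/browser";
import { v } from "convex/values";
import { action } from "./_generated/server";
import { api } from "./_generated/api";

export const lookupInProd = action({
  args: { name: v.string() },
  handler: async (_ctx, { name }) => {
    // point a client at the prod deployment instead of this (dev) one
    const prod = new ConvexHttpClient(process.env.PROD_CONVEX_URL!);
    // assumes the same query is defined in both deployments
    return await prod.query(api.lookup.byName, { name });
  },
});
```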
gotcha! forgot about http actions....
Or use the ConvexHttpClient, which can run queries, mutations, or actions.
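For example, a minimal sketch of a standalone Node script using ConvexHttpClient; the deployment URL and the api.lookup.byName query are hypothetical placeholders:

```ts
import { ConvexHttpClient } from "convex/browser";
import { api } from "./convex/_generated/api";

const client = new ConvexHttpClient("https://happy-animal-123.convex.cloud");

// one-shot query over HTTP; client.mutation(...) and client.action(...)
// work the same way for mutations and actions
const org = await client.query(api.lookup.byName, { name: "Convex" });
console.log(org);
```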
BTW loaded about 300k lines now, but then added 2 indexes and npx convex dev is taking forever (5+ minutes)
still running...
@Pietro hopefully that didn't take 2.5 hours! 😛
But yea, Convex currently indexes at code push time so that can take a bit if you have a large table.
Adding an index first and then adding rows will go faster since the indexing cost is amortized over each insert.
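For example, a minimal schema sketch that declares the index up front, so a subsequent import pays the indexing cost incrementally per insert instead of in one big backfill at push time (field names beyond the table name are hypothetical):

```ts
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  cb_basic_org: defineTable({
    name: v.string(),
    domain: v.string(),
    // ...the rest of the CSV columns
  }).index("by_name", ["name"]),
});
```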
npx convex import should now be able to support 2-3gb files. Best of luck https://news.convex.dev/announcing-convex-1-9/