luke
luke•2mo ago

i got a question: how would I sync an external list to the Convex DB? I have a Substack newsletter, and I'd like to keep track of my subscribers' subscription status (mainly free vs paid) in my Convex app. I have set up an automation that downloads an updated CSV of all my subscribers and their status every hour. Now I'm thinking about how to keep that CSV in a Convex table, or otherwise be able to reference its data. It's ok that the data is not realtime, since the CSV is only updated every hour, though not ideal.

Approaches I thought of:
- Create a cron job action that downloads the subscriptions CSV and iterates over it every time to update a table called subscriptions (a cron sketch follows below). What I don't like: it seems I'll be updating 1k+ records every hour, most of the time without any changes.
- Create a cron job action that downloads the subscriptions CSV and uploads it to Convex using file storage. Then I need to download the file whenever I need to check subscription status.
- Move off Substack and use something more dev-friendly, with some sort of webhook to keep track of subscriptions.

i'd love any thoughts on this!
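For reference, a minimal sketch of the cron half of the first approach, assuming a hypothetical `internal.subscriptions.sync` action (a fuller sketch of that action appears later in the thread):

```ts
// convex/crons.ts (sketch) — internal.subscriptions.sync is hypothetical here
// and is fleshed out further down the thread.
import { cronJobs } from "convex/server";
import { internal } from "./_generated/api";

const crons = cronJobs();

// Run the Substack CSV sync at the top of every hour.
crons.hourly(
  "sync substack subscribers",
  { minuteUTC: 0 },
  internal.subscriptions.sync,
);

export default crons;
```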
11 Replies
erquhart
erquhart•2mo ago
Are there any timestamp or unique id fields in the subscriber data? Or can you share the list of fields in the csv?
luke
lukeOP•2mo ago
the email is unique, otherwise no
I can share the csv, gimme a sec
erquhart
erquhart•2mo ago
hmm I wonder if you can diff the last and current dataset in js
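A quick sketch of what diffing the two snapshots could look like in plain TypeScript, assuming both CSVs are already parsed into rows keyed by email (the `Row` shape here is hypothetical):

```ts
// Hypothetical row shape for a parsed CSV snapshot.
type Row = { email: string; activeSubscription: boolean };

// Return only the rows that are new or whose status changed since the last run,
// so the hourly sync only has to write a handful of records.
function diffSnapshots(previous: Row[], current: Row[]): Row[] {
  const prevByEmail = new Map(previous.map((row) => [row.email, row]));
  return current.filter((row) => {
    const prev = prevByEmail.get(row.email);
    return !prev || prev.activeSubscription !== row.activeSubscription;
  });
}
```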
luke
lukeOP•2mo ago
email,active_subscription,expiry,email_disabled,created_at
lucasheriques@gmail.com,true,,false,2024-11-18T20:54:35.296Z

i could also potentially store the whole csv as a text string and parse it in the logic. though that means every time I query for subscriber info I'm querying >80kb of data (~1.3k subs), which might add up quickly, because every time a user checks their profile on my separate platform it would have to do that query, unless the cache would prevent that from increasing db bandwidth
erquhart
erquhart•2mo ago
updating the list and checking for an email should be different operations
a user checking their profile would just query a single record by email, right?
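For that profile check, a minimal sketch of a single-record lookup, assuming a `subscriptions` table with a `by_email` index (both names are assumptions, not from the thread):

```ts
// convex/subscriptions.ts (sketch) — table and index names are assumptions.
import { query } from "./_generated/server";
import { v } from "convex/values";

// Look up one subscriber by email via an index, so the profile check reads a
// single record instead of the whole list.
export const getByEmail = query({
  args: { email: v.string() },
  handler: async (ctx, { email }) => {
    return await ctx.db
      .query("subscriptions")
      .withIndex("by_email", (q) => q.eq("email", email))
      .unique();
  },
});
```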
luke
lukeOP•2mo ago
oh yes true, that's correct
erquhart
erquhart•2mo ago
So this should only happen once per hour
luke
lukeOP•2mo ago
it still means that, every hour, I'm querying 1k+ users and potentially mutating 1k+ users too
maybe it's not worth worrying about that though
erquhart
erquhart•2mo ago
yeah I would do the simple, inefficient approach first and see if it's good enough, then tweak from there. I would:
- parse the csv
- async map (for parallelization) over emails and query for each
- write the ones that don't exist

At 1.3k subs this will work okay in a single query; at scale you'll want to break the data up and do multiple ctx.runMutation calls. Your first run will write all 1.3k records, subsequent runs should only be writing new records and be pretty quick.

So the function you're calling in the cron job would be an internalAction so it can make api calls and run mutations inline.
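Putting that together, a rough sketch of the hourly sync, assuming a `subscriptions` table with a `by_email` index, a `SUBSTACK_CSV_URL` environment variable, and the CSV header shown earlier; this illustrates the approach described above rather than being a drop-in implementation:

```ts
// convex/subscriptions.ts (sketch) — table name, index name, and env var
// are assumptions; the CSV columns match the sample shared above.
import { internalAction, internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

const subscriberRow = v.object({
  email: v.string(),
  activeSubscription: v.boolean(),
  expiry: v.string(), // empty string when there is no expiry
  emailDisabled: v.boolean(),
  createdAt: v.string(),
});

export const sync = internalAction({
  args: {},
  handler: async (ctx) => {
    // Download the hourly CSV export (URL assumed to live in an env var).
    const response = await fetch(process.env.SUBSTACK_CSV_URL!);
    const csv = await response.text();

    // Naive CSV parsing: fine here because none of these fields contain commas.
    // Header: email,active_subscription,expiry,email_disabled,created_at
    const [, ...lines] = csv.trim().split("\n");
    const rows = lines.map((line) => {
      const [email, active, expiry, disabled, createdAt] = line.split(",");
      return {
        email,
        activeSubscription: active === "true",
        expiry,
        emailDisabled: disabled === "true",
        createdAt,
      };
    });

    // Break the ~1.3k rows into batches so each mutation stays small.
    const batchSize = 100;
    for (let i = 0; i < rows.length; i += batchSize) {
      await ctx.runMutation(internal.subscriptions.upsertBatch, {
        rows: rows.slice(i, i + batchSize),
      });
    }
  },
});

export const upsertBatch = internalMutation({
  args: { rows: v.array(subscriberRow) },
  handler: async (ctx, { rows }) => {
    for (const row of rows) {
      const existing = await ctx.db
        .query("subscriptions")
        .withIndex("by_email", (q) => q.eq("email", row.email))
        .unique();
      if (!existing) {
        // New subscriber: the first run writes everyone, later runs only newcomers.
        await ctx.db.insert("subscriptions", row);
      } else if (existing.activeSubscription !== row.activeSubscription) {
        // Status change: only patch when something actually differs.
        await ctx.db.patch(existing._id, {
          activeSubscription: row.activeSubscription,
          expiry: row.expiry,
        });
      }
    }
  },
});
```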
luke
lukeOP•2mo ago
yeah great, that's what I was thinking, but helpful to see my approach was not CRAZY 😄 thank you so much!
so the only reasons I'll be writing to the DB are:
- new subscriber
- subscription status change
erquhart
erquhart•2mo ago
yep exactly
