i got a question: how would I sync an
i got a question: how would I sync an external list to the convex DB?
I have a Substack newsletter that I'd like to keep track of their subscription status (mainly free vs paid) in my Convex app.
I have setup an automation where I'm able to download, every hour, an updated CSV with all my subscribers and their status. Now I'm thinking how I will keep that CSV in a Convex table or be able to reference its data.
It's ok that the data is not realtime, as the CSV is only updated every hour, thought not ideal.
Approaches I thought of:
- Create an cronjob action that will download the subscriptions CSV, and iterate over it every time to update a table called
subscriptions
. What I don't like: seems to be I'll be updating 1k+ records every hour, most of the time without any changes
- Create a cronjob action that will download the subscriptions CSV and upload it to Convex using file storage Then, I need to always download the file whenever I need to check subscription status
- Move off substack and use something more dev-friendly, with some sort of webhook to keep track of subscriptions
i'd love any thoughts on this!11 Replies
Are there any timestamp or unique id fields in the subscriber data? Or can you share the list of fields in the csv
the email is unique
otherwise no
I can share the csv gimme a sec
hmm I wonder if you can diff the last and current dataset in js
email,active_subscription,expiry,email_disabled,created_at
lucasheriques@gmail.com,true,,false,2024-11-18T20:54:35.296Z
i could also potentially store the whole csv as a text string and parse it in the logic.
though that means every time I query for subscriber info I'm querying for >80kb of data (~1.3k subs).
which might add up quickly because every time a user checks its profile on my separate platform, it would have to do that query
unless the cache would prevent that from increasing db bandwitdh
updating the list and checking for an email should be different operations
a user checking their profile would just query a single record by email right
oh yes true, that's correct
So this should only happen once per hour
it still means that, every hour, I'm querying for 1k+ users and potentially mutating 1k+ users too
maybe it's not worth worrying about that though
yeah I would do the simple inefficient approach first and see if it's good enough
tweak from there
I would:
- parse the csv
- async map (for parallelization) over emails and query for each
- write the ones that don't exist
At 1.3k subs this will work okay in a single query, at scale you'll want to break the data up and do multiple
ctx.runMutation
calls
Your first run will write all 1.3k records, subsequent should only be writing for new records and be pretty quick
So the function you're caling in the cron job would be an internalAction
so it can make api calls and run mutations inline.yeah great, that's what I was thinking, but helpful to see my approach was not CRAZY 😄
thank you so much!
so the only reason I'll be writing to the DB is:
- new subscriber
- subscription status change
yep exactly