Recommendations for storing analytics data in Convex?
Hey Convex team, I am building a chat interface that will store thousands of records used for analytics, and this data will be queried and aggregated by month ranges (April, May, June, etc.).
What is the recommended pseudo-schema for storing this kind of data in Convex? I'm hesitant to create an individual document for every record, the way I would create a row per record in Postgres.
Hi @Khalil, it's fine to store thousands of records in Convex, and ideally you'd just structure your data in whatever way is most ergonomic for you.
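To make "one document per entry" concrete, here is a minimal sketch of what such a document could look like, assuming each entry stores a precomputed `"YYYY-MM"` month key so that month ranges become simple key ranges. The names `ChatEntry`, `monthKey`, `makeEntry`, and `entriesInRange` and the `tokens` field are hypothetical. In a real Convex schema this would be a `defineTable` with an index on `["userId", "month"]`, queried with `withIndex`; here a plain array and `filter` stand in for the table and index so the sketch runs on its own.

```typescript
// Hypothetical shape of one analytics document per chat entry.
interface ChatEntry {
  userId: string;
  month: string; // "YYYY-MM" (UTC), derived from createdAt
  createdAt: number; // milliseconds since the Unix epoch
  tokens: number; // hypothetical metric to aggregate
}

// Zero-padded "YYYY-MM" keys sort lexicographically in chronological order,
// so "April through June" becomes the string range "2024-04".."2024-06".
function monthKey(ms: number): string {
  const d = new Date(ms);
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}-${month}`;
}

function makeEntry(userId: string, createdAt: number, tokens: number): ChatEntry {
  return { userId, month: monthKey(createdAt), createdAt, tokens };
}

// In-memory stand-in for an index range scan on ["userId", "month"].
function entriesInRange(
  entries: ChatEntry[],
  userId: string,
  from: string,
  to: string,
): ChatEntry[] {
  return entries.filter((e) => e.userId === userId && e.month >= from && e.month <= to);
}

// Usage: Date.UTC months are 0-based, so 3 = April, 5 = June, 6 = July.
const demoEntries = [
  makeEntry("u1", Date.UTC(2024, 3, 15), 10),
  makeEntry("u1", Date.UTC(2024, 5, 1), 20),
  makeEntry("u1", Date.UTC(2024, 6, 2), 30),
];
console.log(entriesInRange(demoEntries, "u1", "2024-04", "2024-06").length); // 2
```

Storing the month key alongside the timestamp costs a few bytes per document but keeps month-range lookups a single indexed range instead of date arithmetic at query time.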
That said, if you want to run really large batch analytics queries, then neither Postgres nor Convex would be the best fit for these.
In general, it's not great to mix low-latency, customer-facing queries (OLTP) with larger, slower, batch-style analytics queries (OLAP).
The best approach for now would be to use streaming export from Convex into something like Databricks, Snowflake, or BigQuery: https://docs.convex.dev/production/integrations/streaming-import-export
Or, depending on your application, if you want to query aggregated data in the live path, you could keep those aggregates up to date dynamically as data is written.
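A minimal sketch of keeping aggregates up to date on the write path, assuming one aggregate per user per month. The names `recordMessage`, `getMonthlyStats`, `MonthlyStats`, and the two in-memory stores are hypothetical stand-ins for Convex tables. In Convex, both writes would happen inside a single mutation, which runs as a transaction, so the raw entry and its aggregate cannot drift apart.

```typescript
// Hypothetical per-month aggregate kept alongside the raw entries.
interface MonthlyStats {
  messageCount: number;
  totalTokens: number;
}

// In-memory stand-ins for two tables: raw entries and per-month aggregates.
const entries: { userId: string; createdAt: number; tokens: number }[] = [];
const monthlyStats = new Map<string, MonthlyStats>(); // key: "<userId>:<YYYY-MM>"

function monthKey(ms: number): string {
  const d = new Date(ms);
  return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
}

// Write path: insert the raw entry and bump its month's aggregate together.
function recordMessage(userId: string, createdAt: number, tokens: number): void {
  entries.push({ userId, createdAt, tokens });
  const key = `${userId}:${monthKey(createdAt)}`;
  const stats = monthlyStats.get(key) ?? { messageCount: 0, totalTokens: 0 };
  stats.messageCount += 1;
  stats.totalTokens += tokens;
  monthlyStats.set(key, stats);
}

// Read path: a single lookup instead of scanning every entry in the month.
function getMonthlyStats(userId: string, month: string): MonthlyStats {
  return monthlyStats.get(`${userId}:${month}`) ?? { messageCount: 0, totalTokens: 0 };
}
```

The trade-off is a little extra work on every write in exchange for reads that no longer grow with the number of entries, which suits a live dashboard that shows monthly totals.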
Longer-term, we'll also add built-in support within Convex for running slower analytics queries on a snapshot of your data.
Let us know if this answers your question or if you have a more specific use case.
Hey James, thanks for your reply. I'm thinking of storing each entry and having a cron job that aggregates all the data by month, so there would be one entry for each previous month, while the current month's data stays more granular (some users will probably peak at 10,000 records). Is this within Convex's limits for querying?
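The monthly rollup described above could look roughly like the following sketch, assuming entries carry a `createdAt` timestamp and a hypothetical `tokens` metric. `rollUpPreviousMonths`, `Entry`, and `MonthSummary` are hypothetical names. In Convex the scheduling side would be a cron job defined in `convex/crons.ts`, and the job would then insert the summaries and delete the rolled-up entries; this sketch only shows the pure computation.

```typescript
interface Entry {
  userId: string;
  createdAt: number; // milliseconds since the Unix epoch
  tokens: number; // hypothetical metric
}

interface MonthSummary {
  userId: string;
  month: string; // "YYYY-MM" (UTC)
  messageCount: number;
  totalTokens: number;
}

function monthKey(ms: number): string {
  const d = new Date(ms);
  return `${d.getUTCFullYear()}-${String(d.getUTCMonth() + 1).padStart(2, "0")}`;
}

// Collapse every entry from a month before `currentMonth` into one summary per
// user per month; entries from the current month are returned untouched so
// they stay granular.
function rollUpPreviousMonths(
  entries: Entry[],
  currentMonth: string,
): { summaries: MonthSummary[]; remaining: Entry[] } {
  const byKey = new Map<string, MonthSummary>();
  const remaining: Entry[] = [];
  for (const e of entries) {
    const month = monthKey(e.createdAt);
    if (month >= currentMonth) {
      remaining.push(e);
      continue;
    }
    const key = `${e.userId}:${month}`;
    const summary = byKey.get(key) ?? { userId: e.userId, month, messageCount: 0, totalTokens: 0 };
    summary.messageCount += 1;
    summary.totalTokens += e.tokens;
    byKey.set(key, summary);
  }
  return { summaries: Array.from(byKey.values()), remaining };
}
```

If a backlog of past-month entries is large, the same computation could be run over one batch of entries at a time so that no single run reads too many records.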
What I struggle with most in Convex is understanding its limits and how it compares with traditional SQL databases, which I have been using for years.
That sounds like a great overall approach. There is a limit of scanning 16k records in a single query (see the Limits page in the Convex Developer Hub). Once you get into the ~10k range, you should probably use pagination to walk the data.
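The pagination pattern can be sketched as a cursor loop that reads one bounded page at a time and accumulates a running result. The field names `page`, `isDone`, and `continueCursor` mirror what Convex's `.paginate()` returns; `fetchPage` and `sumInPages` are hypothetical, and the cursor here is just an array offset standing in for Convex's opaque cursor. In an actual app the loop would typically live on the client or in an action that calls the paginated query repeatedly.

```typescript
// Mirrors the shape of a Convex pagination result.
interface PageResult<T> {
  page: T[];
  isDone: boolean;
  continueCursor: string;
}

// Hypothetical stand-in for a paginated query; the cursor is an array offset.
function fetchPage<T>(all: T[], cursor: string | null, numItems: number): PageResult<T> {
  const start = cursor === null ? 0 : Number(cursor);
  const end = Math.min(start + numItems, all.length);
  return { page: all.slice(start, end), isDone: end >= all.length, continueCursor: String(end) };
}

// Walk the whole dataset one page at a time so no single request reads more
// than `numItems` records, accumulating the total across pages.
function sumInPages(all: number[], numItems: number): { total: number; pages: number } {
  let cursor: string | null = null;
  let total = 0;
  let pages = 0;
  while (true) {
    const result = fetchPage(all, cursor, numItems);
    for (const x of result.page) total += x;
    pages += 1;
    if (result.isDone) break;
    cursor = result.continueCursor;
  }
  return { total, pages };
}
```

For a user with 10,000 records and a page size of 1,000, this makes 10 bounded reads instead of one large one.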
We’d love for you to have unlimited joy building on Convex but engineering