Processing large datasets with Convex, and advanced use cases
The more I work with it, the more I feel like Convex is not meant to process large datasets. My opinion is based on:
- I migrated my most demanding real scenario to convex
- There are no examples for advanced use cases
- There are no examples for data processing
When I ask in #general I get told that there are large code bases that use Convex, but when I ask for examples or open-source solutions, nobody responds.
When I ask clear questions in the support community I get no answers (probably because not many people have advanced use cases, or they don't know how to execute them in Convex).
---
I just ported over a simple piece of the process that loops through all the data in batches, and at this point it has been running for over 25 minutes, while the whole process as a MySQL query takes me 3 minutes.
I am also open to feedback and examples, but so far nobody has responded.
That was my vent, amen!
Thanks so much for this honest feedback — it helps.
You're right that we need more public examples showcasing Convex in advanced and large-scale use cases; they exist, and we'll explore ways to highlight Convex better for large data workloads and complex applications.
That said, we also have customers using Convex at scale, serving millions of users and handling large, complex data. Those implementations aren't public (yet), but we hear you: the need for more examples, answers, and transparency is clear.
We're a growing community, and we appreciate community members answering support questions when they have time.
We appreciate the nudge on docs and advanced use cases. Thank you for completing the feedback form. Keep the feedback coming—we're listening.
I see a lot of different answers. You are from the Convex team; what is your reaction to these answers?
Hi, 10 million rows is a lot. What are you cooking? I see you're using Claude, which may not have all the latest recommendations for Convex. I highly recommend using "Ask AI" in the Convex docs for questions about Convex: https://docs.convex.dev/home
To answer your question on migrating a 10+ million row dataset from MySQL to Convex:
Note that Convex isn't built like MySQL; it's optimized for real-time OLTP workloads, not big analytical queries that scan millions of rows.
That's why large full-table scans can be slow unless you're using proper indexes. For big datasets, we recommend:
- Using withIndex() for efficient queries
- Denormalizing data for things like counts
- Paginating through large results
- Offloading heavy analytics to an OLAP database (Convex integrates with Fivetran for this)

The team's also working on embedded analytics with DuckDB and a SQL-style inline query system. Convex can handle complex apps and large data, but it needs a different approach than SQL. See also these Stack posts: https://stack.convex.dev/translate-sql-into-convex-queries and https://stack.convex.dev/merging-streams-of-convex-data
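The batch loop behind the "paginate through large results" advice can be sketched in plain TypeScript. This is an illustrative stand-in, not Convex's actual API: `paginate` here fakes cursor-based paging (in the spirit of `ctx.db.query(...).paginate()`) over an in-memory array, and all names are made up for the example.

```typescript
// Sketch of cursor-based batch processing. Hypothetical in-memory "table"
// stands in for the database; the shape loosely mirrors a paginated query.

type Page<T> = { page: T[]; isDone: boolean; continueCursor: number };

// Fake pagination: return one slice of rows plus a cursor for the next call.
function paginate<T>(rows: T[], cursor: number, numItems: number): Page<T> {
  const page = rows.slice(cursor, cursor + numItems);
  const continueCursor = cursor + page.length;
  return { page, isDone: continueCursor >= rows.length, continueCursor };
}

// Process a large dataset in fixed-size batches instead of one full scan.
function processInBatches(rows: number[], batchSize: number): number {
  let total = 0;
  let cursor = 0;
  while (true) {
    const { page, isDone, continueCursor } = paginate(rows, cursor, batchSize);
    for (const value of page) total += value; // per-batch work goes here
    if (isDone) break;
    cursor = continueCursor;
  }
  return total;
}

console.log(processInBatches([...Array(1000).keys()], 100)); // → 499500
```

The point is that each iteration only holds a small, bounded amount of data, so no single call has to scan the whole table at once.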
These were reactions in #general, not my text
https://discord.com/channels/1019350475847499849/1019350478817079338/1360607247801253898
Sorry, I was looking at the Claude questions.
yeah, sorry, we've been overwhelmed here
but Convex is opinionated. that means it does OLTP stuff in ways really, really scaled for OLTP, and OLAP stuff in specialized ways for OLAP, as others have said
meaning: it does not like batch work that much in general; OLTP databases are happier with lots of little interspersed mutations that do not hold locks on very many rows
for doing batch passes on stuff, it's possible today, but the ergonomics aren't great
what teams do is just use the Fivetran connector to something like BigQuery or ClickHouse
running large queries over your OLTP database is actually an antipattern, so it's better to do these in OLAP engines and then just merge the aggregation back in with a mutation
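To make the "lots of little interspersed mutations" idea concrete, here's a minimal sketch: each step touches a small slice of rows and schedules the next step, rather than running one long transaction over the whole table. The `runAfter` queue is a hypothetical stand-in for a real scheduler (in the spirit of Convex's `ctx.scheduler.runAfter`); everything here is illustrative, not an actual API.

```typescript
// Sketch of chunked batch work: many short steps instead of one long scan.
// The scheduler below is a fake, in-memory stand-in for illustration only.

type Step = () => void;
const queue: Step[] = [];
const runAfter = (_ms: number, step: Step) => { queue.push(step); };

// Hypothetical table to migrate: fill in `doubled` for every row.
const table = Array.from({ length: 25 }, (_, i) => ({ value: i, doubled: 0 }));
const BATCH = 10;

// Each invocation touches at most BATCH rows, then hands off to the next.
function migrateBatch(start: number): void {
  const slice = table.slice(start, start + BATCH);
  for (const row of slice) row.doubled = row.value * 2; // small, short-lived "mutation"
  if (start + BATCH < table.length) {
    runAfter(0, () => migrateBatch(start + BATCH)); // schedule the next chunk
  }
}

migrateBatch(0);
while (queue.length > 0) queue.shift()!(); // drain the fake scheduler

console.log(table[24]); // { value: 24, doubled: 48 }
```

Because each step is small, other reads and writes can interleave between chunks instead of waiting behind one giant transaction.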
the more ergonomic way we want to solve this, after we get through some more chef stuff, perhaps later this summer
is by just embedding an OLAP engine
probably DuckDB
so in an action, you'd be able to execute a really high performance aggregate (even faster than MySQL!) across all your table data
(a slightly delayed replica, maybe a few seconds behind)
and then write back out any expensive calculations with a mutation into whatever aggregate collection you want
this is how bigger companies actually manage OLTP vs. OLAP workloads -- convex would just give you nice ways to do it out of the box
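A toy sketch of that OLTP/OLAP split: the heavy scan runs against a (possibly slightly stale) analytics copy of the data, and only the small precomputed result gets merged back into an aggregate collection. Both "databases" here are in-memory stand-ins with made-up names; no Convex or DuckDB API is being shown.

```typescript
// Sketch: aggregate in the OLAP side, merge the small result back via a
// tiny write. All data structures are hypothetical in-memory stand-ins.

// Pretend this lives in the analytics replica (a few seconds stale is fine).
const warehouseOrders = [
  { customer: "a", amount: 40 },
  { customer: "b", amount: 25 },
  { customer: "a", amount: 35 },
];

// The heavy scan happens here, in the OLAP engine, not in the app database.
function computeRevenueByCustomer(): Map<string, number> {
  const totals = new Map<string, number>();
  for (const o of warehouseOrders) {
    totals.set(o.customer, (totals.get(o.customer) ?? 0) + o.amount);
  }
  return totals;
}

// The app database only receives the tiny, precomputed result.
const aggregates = new Map<string, number>();
function mergeAggregates(totals: Map<string, number>): void {
  for (const [customer, total] of totals) aggregates.set(customer, total);
}

mergeAggregates(computeRevenueByCustomer());
console.log(aggregates.get("a")); // → 75
```

The OLTP side never scans the millions of rows; it only stores and serves the finished aggregates.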
I understand the continued suspicion that maybe no one is actually doing anything big on convex 🙂
unfortunately, those code bases are not open source
and very seldom are
I totally acknowledge that Convex is confusing in this regard right now, I apologize. the teams that know how to do this well right now have learned through prior experience with production systems and/or chatting with the Convex team
there are two big things we owe everyone as soon as we can
1. a book called "Real World Convex" that lays out project organization, solving common problems in practice, etc., and basically documents what bigger teams, together with the Convex team, have discovered are the patterns for success when scale gets real
2. an in-product OLAP capability, because it's a pain wiring up a data warehouse just to run simple aggregates every now and then
all these things are possible right now, but not very "discoverable"
@puchesjr actually did a great job answering some of these questions for us, thanks!
wow a “Real World Convex” would be fantastic !