How to count a large set of results from a query?
Hi, suppose I have a query where the full result set would exceed the 16k document limit in Convex. What is the best way to count this result set, or are there any suggested techniques?
@Adam Harris hi! if you need it to be high performance, generally you should denormalize and keep the counts updated on insertion
any kind of orm-type table wrapper could help you do this without so much code duplication. we might provide some packages soon that wrap these patterns up to make it easier
Hi @jamwt, thanks for your suggestions! It doesn’t need to be high performance, and ideally we need to be able to do this kind of thing ad hoc for reporting, so keeping counts on insertion won’t really work for us here, unfortunately. We’re really missing SQL-style ad-hoc aggregations at the moment. Can you please expand a bit on the orm-type table wrapper that you mentioned?
so, a couple of options then. no database will truly do high performance counts of anything without denormalization. however, you can often use something like an OLAP engine to pretty quickly do ad-hoc queries of counts and things like that.
option 1: manually denormalize counts into running aggregates upon mutation. very very fast, but if you want to do it for arbitrary queries, agreed, this option doesn't work.
option 2: rely on some sort of OLAP-style columnar query system that will do a pretty good job running arbitrary aggregates at high speed
for option 1, I'll try to find some threads where people have solved this before.
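in the meantime, here's a rough sketch of the option 1 pattern in a Convex mutation. the `counters` table and its `by_name` index are just for illustration, you'd define them in your schema:

```ts
// convex/messages.ts — a sketch, assuming a `counters` table with a `by_name` index
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const send = mutation({
  args: { body: v.string() },
  handler: async (ctx, { body }) => {
    await ctx.db.insert("messages", { body });
    // mutations run transactionally, so the counter can't drift from the table
    const counter = await ctx.db
      .query("counters")
      .withIndex("by_name", (q) => q.eq("name", "messages"))
      .unique();
    if (counter === null) {
      await ctx.db.insert("counters", { name: "messages", count: 1 });
    } else {
      await ctx.db.patch(counter._id, { count: counter.count + 1 });
    }
  },
});
```

reading the count is then a single document fetch. the "orm-type wrapper" idea is basically a generic helper around insert/delete that bumps the right counter for you, so you don't repeat this in every mutation.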
for option 2, we plan on shipping some kind of SQL-based inline OLAP system later this year for arbitrary read-only queries at high speed (slightly delayed). we don't have a great option yet, unless you want to use the fivetran connector and sync some of your tables into e.g. BigQuery. then you can hammer away at aggregates to your heart's content. but it's more work to set up! and the replication delay can be several minutes etc. depending on how much money you want to spend
sounds like you all want option 2, and you're not alone. it's a really common request from our customers right now, and I'm bummed we don't have it solved yet. it's on the roadmap to ship something soon to address this without all the integration headaches, but for now, you really do have to stream to some other system that runs arbitrary aggregates in a better way than convex does. that's what many other customers are doing
more info on the streaming support people are using currently: https://docs.convex.dev/production/integrations/streaming-import-export#streaming-export
fivetran to clickhouse cloud or motherduck are probably pretty good lightweight options
our future OLAP offering is likely to be built on an embedded version of duckdb
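to give a flavor of what option 2 looks like day-to-day: once fivetran is syncing a table into clickhouse, an ad-hoc count is just a SQL query away. a quick sketch using the official @clickhouse/client for node, with the table and column names made up for illustration:

```ts
// a sketch, assuming fivetran has landed a synced `messages` table in clickhouse
import { createClient } from "@clickhouse/client";

const client = createClient({
  url: process.env.CLICKHOUSE_URL,
  password: process.env.CLICKHOUSE_PASSWORD,
});

async function countMessages(channel: string): Promise<void> {
  const result = await client.query({
    // parameterized query, so the channel value is safely escaped
    query: "SELECT count(*) AS total FROM messages WHERE channel = {channel:String}",
    query_params: { channel },
    format: "JSONEachRow",
  });
  // clickhouse serializes UInt64 as a string in JSON, e.g. [{ total: "1234567" }]
  console.log(await result.json());
  await client.close();
}

countMessages("general").catch(console.error);
```

the count here is only as fresh as the last sync, which is the replication-delay tradeoff mentioned above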
Thanks very much for the detailed explanation. Yes, it’s option 2 that we’re trying to figure out, and from reading other posts on the Discord we were thinking we might need to push the data somewhere like ClickHouse, but it’s great to have the confirmation here. Thanks again!
no problem. we'll take a note to reach back out when we ship the embedded analytics engine. I think it's going to be a cleaner solution to a lot of use cases like this