magicseth
magicseth2y ago

Support count and other aggregates for a dashboard?

Is there a count function I can use, so I don't have to return ALL the bytes?
13 Replies
Indy
Indy2y ago
I am assuming this is a db.query result? I believe we’d explored this, but it looks like we never shipped this publicly. I’ll add it to the priority list to finish up our thinking here.
alexcole
alexcole2y ago
One (kinda crappy) workaround is to denormalize the number of documents in the table. You could have a document in a different table with a numMessages property and increment/decrement every time a new document is added or removed from the "messages" table. Obviously more work and error prone than being able to query this directly though.
magicseth
magicsethOP2y ago
@alexcole I have no problem with denormalizing, but it's hard to predict which of these kinds of stats you'll want to compute. Thanks @Indy
Indy
Indy2y ago
@magicseth even if we give you a count on any arbitrary query, it won't be that efficient, since the query with the filter is still being computed by the DB. So it likely won't buy you much; it'll mostly be a convenience.
magicseth
magicsethOP2y ago
I can't even filter, because it errors with too many bytes. Will it still hit that with a count?
Indy
Indy2y ago
Ah, I see. I'll have to check with some of the folks on where the byte limit is hit and whether count would help at all. Are you just using raw filters or indexed filters? My understanding of your product is that you have some very large tables, so you'll need to add an index for anything you want to query to stay within the limits, and then do the count. If you just want a count of the whole table, that is something we could precalculate for you.
sujayakar
sujayakar2y ago
hey @magicseth, what type of query would you like to compute the count of? (even a rough idea of what it looks like in SQL would be helpful.) and it sounds like it's hard to predict upfront exactly which count queries you'd like to compute, yeah?

and, yeah, indy's recommendation is right: since we don't support aggregates (like count) in our query language yet, we'll have to implement them in JS, and then it's important to limit the amount of data scanned with an index. the mental model here is that .filter expressions scan all of their input, so if the input table is larger than the bytes read limit, it'll throw an error. a .withIndex expression, on the other hand, only scans the matching records. understood that this is confusing -- @ian has been working on a proposal to simplify here.
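That scan-cost mental model can be sketched in plain TypeScript — a sorted array plus binary search stands in for a Convex index here; these function names are illustrative, not the Convex API:

```typescript
// Mental model: .filter scans every input record, so the bytes read grow
// with the table; .withIndex seeks to the matching range and reads only it.
type UserEvent = { userId: string; creationTime: number };

// Simulates .filter: touches every record regardless of how many match.
function countByScan(events: UserEvent[], since: number): number {
  let count = 0;
  for (const e of events) {
    if (e.creationTime >= since) count++; // every record was still read
  }
  return count;
}

// Simulates .withIndex on creationTime: binary-search to the start of the
// matching range, then only the matching suffix is "read".
function countByIndex(sortedByTime: UserEvent[], since: number): number {
  let lo = 0;
  let hi = sortedByTime.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedByTime[mid].creationTime < since) lo = mid + 1;
    else hi = mid;
  }
  return sortedByTime.length - lo; // records touched = records matched
}
```

Both return the same count; the difference is how much data is touched to get there, which is exactly what the bytes-read limit measures.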
magicseth
magicsethOP2y ago
I was trying to get a rolling average of user events over a week. I'm using withIndex, but was trying to go over a week's worth of events * 7 in one call
sujayakar
sujayakar2y ago
cool. is this for something like an internal dashboard for your app? or is it a user facing part of your app? and do you need reactivity / full consistency for this rolling average?
magicseth
magicsethOP2y ago
dashboard, doesn't need full reactivity
Indy
Indy2y ago
Got it. So this is a large analytics query, which is not the type of workload Convex was designed for. Our answer for these workloads is streaming export to a purpose-built DB like Snowflake, Databricks, BigQuery, etc. You can set up streaming export like so: https://docs.convex.dev/database/import-export/airbyte
Using Convex with Airbyte | Convex Developer Hub
Analyze your Convex data by exporting via Airbyte.
Indy
Indy2y ago
Though we get that people may want some sort of lightweight solution for this eventually and we may come back and make this workflow easier.
ian
ian2y ago
One workaround could also be to calculate stats in batches of queries overnight with a cron, to compute daily values: schedule an action that executes queries paging over all documents, ordered by creation time, returning aggregates as it goes. Then your dashboard queries this roll-up table. Especially if you can page from 7 days ago onward, it won't get slower over time except relative to the increase in usage.

@magicseth I'm curious what you think about a cron that runs daily and updates a daily_event_count table or something, so that your rolling sum is efficient to look up. You'd have to backfill the older data, but that could be done in a for-loop from a node script or from the dashboard. Happy to help show examples of that if you want.
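The roll-up idea can be sketched in plain TypeScript — the nightly cron would collapse raw events into one row per day, and the dashboard's rolling average then only reads seven tiny rows. Names like `daily_event_count` and these helpers are illustrative, not Convex APIs:

```typescript
// Nightly roll-up sketch: collapse raw events into one count per day so
// the dashboard never scans raw events. "daily_event_count" is a made-up
// table name; a Convex cron would run rollUpDay once per day.
type RawEvent = { creationTime: number }; // ms since epoch
type DailyCount = { day: string; count: number };

const DAY_MS = 24 * 60 * 60 * 1000;

// What the daily cron would compute for one day's window of events.
function rollUpDay(events: RawEvent[], dayStart: number): DailyCount {
  const count = events.filter(
    (e) => e.creationTime >= dayStart && e.creationTime < dayStart + DAY_MS
  ).length;
  return { day: new Date(dayStart).toISOString().slice(0, 10), count };
}

// Dashboard query: rolling average over the last 7 roll-up rows,
// reading 7 small documents instead of a week of raw events.
function rollingAverage(rollups: DailyCount[], days = 7): number {
  const recent = rollups.slice(-days);
  const total = recent.reduce((sum, r) => sum + r.count, 0);
  return total / recent.length;
}
```

Backfilling is then just calling the roll-up for each historical day in a loop, as ian suggests.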