Support count and other aggregates for a dashboard?
Is there a count function I can use, so i don't have return ALL the bytes?
13 Replies
I am assuming this is a db.query result? I believe we’d explored this, but it looks like we never shipped this publicly. I’ll add it to the priority list to finish up our thinking here.
One (kinda crappy) workaround is to denormalize the number of documents in the table. You could have a document in a different table with a
numMessages
property and increment/decrement every time a new document is added or removed from the "messages"
table. Obviously more work and error prone than being able to query this directly though.@alexcole I have no problem with denormalizing, but these kinds of stats are hard to predict what you'll want to compute. Thanks @Indy
@magicseth even if we give you a count on any arbitrary query it wont be that efficient since the query with filter is still being computed by the DB. So it likely won’t buy you much, it’ll mostly be a convenience.
I can't even filter, because it errors with too many bytes
will it still hit that with a count?
Ah I see. I’ll have to check with some of the folks on where the byte limit is hit and whether count would help at all.
Are you just using raw filters or indexed filters? My understanding of your product is that you have some very large tables, so you will have to add an index for anything you want to query to stay within limits and then do the count.
If you just want a count of the whole table, that is something we can pre calculate for you.
hey @magicseth, what type of query would you like to compute the count of? (even a rough idea of what it looks like in SQL would be helpful.) and it sounds like it's hard to predict upfront exactly which
count
queries you'd like to compute, yeah?
and, yeah, indy's recommendation is right: since we don't support aggregates (like count
) in our query language yet, we'll have to implement them in JS, and then it's important to limit the amount of data scanned with an index.
the mental model here is that .filter
expressions scan all of their input, so if the input table is larger than the bytes read limit, it'll throw an error. a .withIndex
expression, on the other hand, only scans the matching records.
understood that this is confusing -- @ian has been working on a proposal to simplify here.I was trying to get a rolling average of user events over a week. I'm using withIndex, but was trying to go over a weeks worth of events * 7 in one call
cool. is this for something like an internal dashboard for your app? or is it a user facing part of your app? and do you need reactivity / full consistency for this rolling average?
dashboard, doesn't need full reactivity
Got it. So this is more of a large analytics query, this is not the type of workload Convex was designed for. Our answer to these workloads is to do streaming export to a purpose built DB like Snowflake, Databricks, BigQuery etc. for these sorts of queries. You can use streaming export like so: https://docs.convex.dev/database/import-export/airbyte
Using Convex with Airbyte | Convex Developer Hub
Analyze your Convex data by exporting via Airbyte.
Though we get that people may want some sort of lightweight solution for this eventually and we may come back and make this workflow easier.
One workaround could also be to calculate stats in batches of queries overnight with a cron, to calculate daily values. Schedule an action that executes queries paging over all documents, ordered by creation time, returning aggregates as it goes. Then your dashboard queries this roll-up table
Especially if you can page since 7 days ago, it won’t get slower over time except relative to increase in usage.
@magicseth I'm curious what you think about a cron that runs daily and updates a daily_event_count table or something, so that your rolling sum is efficient to look up. You'd have to backfill the older data, but that could be done in a for-loop from a node script or from the dashboard. Happy to help show examples of that if you want.