Var&
Var&14mo ago

Get the count of the query without using collect()

Hi, I'm curious to learn if there is a way to get the total count of a particular query - not paginated, just a regular query. I can't find anything in the docs nor here in the conversations. Thank you!
16 Replies
CodingWithJamal
CodingWithJamal14mo ago
What do you mean by count? Do you mean the SQL COUNT, or something else?
erquhart
erquhart14mo ago
There isn’t currently, you have to collect and count. If you need this, I’d recommend storing any counts explicitly in a table, use a migration to get initial count(s) and increment/decrement from there. Not great, but possible.
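A rough sketch of the counter pattern described above, using a plain in-memory Map in place of Convex tables (all names here are made up for illustration; in a real app the increment/decrement would live inside the same mutation that inserts or deletes the counted document):

```typescript
// Denormalized counter sketch: keep counts explicitly instead of
// collecting and counting documents on every read.
type CounterStore = Map<string, number>;

function incrementCount(store: CounterStore, key: string, delta: number): void {
  store.set(key, (store.get(key) ?? 0) + delta);
}

function getCount(store: CounterStore, key: string): number {
  return store.get(key) ?? 0;
}

// Usage: bump the counter whenever a document is inserted or deleted.
const counts: CounterStore = new Map();
incrementCount(counts, "businesses", 1);  // on insert
incrementCount(counts, "businesses", 1);  // on insert
incrementCount(counts, "businesses", -1); // on delete
console.log(getCount(counts, "businesses")); // 1
```

Reads then become O(1) lookups, at the cost of having to route every write through code that maintains the counter.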
Var&
Var&OP14mo ago
@CodingWithJamal yes, the total count of the query. @erquhart thank you, I'm doing that, which is not great. As a matter of fact, Firebase added this capability not long ago, as the demand was very high.
erquhart
erquhart14mo ago
cc/ @ian (we need a way to at-mention convex team btw) @Var& curious, can you share a little more on what your use case is?
ian
ian14mo ago
@Var& I hear you - there has been some interest in both full table counts and query counts. One related thread with some thinking: https://discord.com/channels/1019350475847499849/1166106601543962715 One thing worth considering: when you do a count of a query in SQL, unless it's a full table count or something the database specifically keeps statistics on, it has to scan all the related rows to get the count. So it's just a more bandwidth-efficient version of .collect().length. However, if you have 10k+ documents in the query, SQL will just take longer, whereas Convex will currently refuse to read that much. That's why Firebase didn't have it for so long, and why their docs say
scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.
For Firebase it's especially important, since all of the data would otherwise be sent all the way to the client. In Convex, your data only makes a short hop to your function. Denormalizing as a pattern is the efficient solution, and something I want to make some user-space helpers for, so it feels less manual and safer to maintain regularly used statistics. I know the answer isn't exactly what you want, but I'm curious how workable it sounds. What is your use case, so we can better evaluate alternatives?
Dima Utkin
Dima Utkin14mo ago
@ian Firestore got count/sum/avg on the client not that long ago, and it doesn't fetch all the documents, of course 🙂 I was hoping to see triggers in Convex (which could, theoretically, be used for these kinds of problems), and was promised they're coming, but it seems like it's not a high priority
CodingWithJamal
CodingWithJamal14mo ago
Doesn't Convex store the table count in its table metadata anyway? Since adding a row means adding a new entry to the index, couldn't the count be fetched from there, or does it not work that way?
ian
ian14mo ago
We could expose a total count, but not a count for a specific query. And yes, Firestore doesn't send all documents to the client, but it does have to scan them all, judging by their docs:
scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.
So it's similar: your backend (which is close to the DB) scans the documents and sends the client the count. It just currently incurs more function <-> DB bandwidth, whereas Firestore absorbs that cost for you. I think we're agreeing, but lmk if I missed something. I've actually been working on some code to make user-space triggers easier to write - hopefully something in the next week or two. The full-on trigger feature depends on the Component workstream that's currently underway.
Dima Utkin
Dima Utkin14mo ago
I'm really interested in normal db-driven triggers - imho, they're the best feature of Firestore; basically for any event-driven flow they're a must. I'll wait, I'm patient, and I know how hard these things are to implement 🙂
erquhart
erquhart14mo ago
@Dima Utkin for what use case are you seeing the current APIs not supporting triggers well? I'm doing event triggers pretty minimally, but it's working so far. Curious what your experience has been.
Dima Utkin
Dima Utkin14mo ago
Anything that remotely resembles the CQRS pattern, where you have a flow of commands and events that trigger each other. Imagine you have a command like "create patient": after it executes, an event "patient created" is emitted and persisted to the db, and you might want to execute multiple different commands in response to that event - establishing defaults, setting up cron jobs, etc. You can definitely do that programmatically, but alternatively the db can become the source of truth and context for all the related actions you want to execute, following your app's business logic.
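That flow can be sketched in plain TypeScript (the event names and handlers are made up for illustration; db-driven triggers would replace the manual dispatch step):

```typescript
// Minimal sketch of an event-driven flow: a command produces an event,
// the event is persisted, and registered handlers react to it.
type Event = { type: string; payload: unknown };
type Handler = (event: Event) => void;

const eventLog: Event[] = [];               // stands in for an events table
const handlers = new Map<string, Handler[]>();

function on(type: string, handler: Handler): void {
  handlers.set(type, [...(handlers.get(type) ?? []), handler]);
}

function emit(event: Event): void {
  eventLog.push(event);                     // persist first (source of truth)
  for (const h of handlers.get(event.type) ?? []) h(event);
}

// Hypothetical follow-up commands triggered by a "patient created" event.
on("patientCreated", (e) => console.log("set defaults for", e.payload));
on("patientCreated", (e) => console.log("schedule cron for", e.payload));

emit({ type: "patientCreated", payload: { id: "p1" } });
```

With real db triggers, `emit` would just be the database write, and the dispatch loop would run automatically on the change event.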
erquhart
erquhart14mo ago
Gotcha, makes sense. Curious to see what Ian has cooking for user space triggers.
ian
ian14mo ago
My first pass is akin to the RLS pattern - a db wrapper that executes actions on db writes. The API I'm thinking of would let you register functions like
{
  tableName: async (ctx, { oldDoc, newDoc }) => {
    // insert: oldDoc is null
    // delete: newDoc is null
    // update: both are set
  },
  otherTable: ...
}
and expose a dbWithoutTriggers in ctx if you don't want to cascade them. From there you could build thin layers to:
- schedule a mutation/action for each doc (to not bloat the transaction if it doesn't need to be transactional)
- configure denormalization rules, including statistics
- assign auto-incrementing IDs
- do write-time zod validation
The shortcomings are:
- You'd have to enforce that all writes use the wrapped DB. Something like a lint rule that prohibits bare mutation / internalMutation definitions.
- Data inserted from the CLI or dashboard, or streamed in via Fivetran, wouldn't trigger the updates.
- Doing a backfill is complicated and out of scope for a first version - you'd need to process existing documents, then atomically switch to new events, while also capturing changes to older documents. Needs its own metadata table. Haven't given it much thought.
- Not v1, but you could add build-time cycle detection by configuring which tables you have access to write in each. Maybe extend that to capture which specific fields changed in the table, to enable finer-grain cycle detection at build or run time.
The primitive is just a db wrapper; from there it'd hopefully be easy to build your own behavior. Inserting rules at a deeper layer that capture dashboard/CLI edits would require a deeper integration. None of this is a promise - just a sketch of what I'm thinking - wdyt?
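A toy version of that wrapper idea, with a Map standing in for the real database (the actual Convex integration would wrap ctx.db; names here are illustrative only):

```typescript
// Toy db wrapper that fires per-table triggers on writes, in the shape
// sketched above. A Map stands in for the real database.
type Doc = Record<string, unknown>;
type Trigger = (change: { oldDoc: Doc | null; newDoc: Doc | null }) => void;

class TriggeredDb {
  private tables = new Map<string, Map<string, Doc>>();
  constructor(private triggers: Record<string, Trigger>) {}

  private table(name: string): Map<string, Doc> {
    if (!this.tables.has(name)) this.tables.set(name, new Map());
    return this.tables.get(name)!;
  }

  insert(tableName: string, id: string, doc: Doc): void {
    this.table(tableName).set(id, doc);
    this.triggers[tableName]?.({ oldDoc: null, newDoc: doc });
  }

  delete(tableName: string, id: string): void {
    const oldDoc = this.table(tableName).get(id) ?? null;
    this.table(tableName).delete(id);
    this.triggers[tableName]?.({ oldDoc, newDoc: null });
  }
}

// Example: keep a denormalized count in sync via a trigger.
let businessCount = 0;
const db = new TriggeredDb({
  businesses: ({ oldDoc, newDoc }) => {
    if (oldDoc === null) businessCount++;       // insert
    else if (newDoc === null) businessCount--;  // delete
  },
});
db.insert("businesses", "b1", { name: "Pizza 24/7" });
console.log(businessCount); // 1
```

This also illustrates the main shortcoming listed above: any write that bypasses the wrapper silently skips the trigger.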
Var&
Var&OP14mo ago
Thank you very much everyone for your thoughts and ideas. @erquhart the use cases for getting a query count are limitless. For example, I'd like to know the count of businesses that are open 24/7 and serve pizza. I don't need the data of the qualifying records because 1) I may want to paginate, 2) the UI doesn't need to show all the records - just the total count is enough for the user, or 3) it may cost me to read each record (this is the case in Firebase, not sure about Convex). So doing something like this would be ideal:
...
  .query("businesses")
  .filter((q) => q.eq(q.field("isOpen247"), args.isOpen247))
  .filter((q) => q.eq(q.field("pizzeria"), args.pizzeria))
  .count(); // <<---
...
From the thread, I'm getting the feeling that it's either not possible atm or doesn't have priority - hopefully in the future then?
ian
ian14mo ago
If you have fewer than thousands of records, I'd try implementing this with collect and see if it ends up being an issue. Queries that don't change data are cached, so all users will see the same results and you only pay db bandwidth once. So you could make a query specifically for this, if it's <8k records. But yes, it's not currently implemented and not in active development, though we're thinking about it.
ian
ian11mo ago
You'd have to enforce that all writes use the wrapped DB. Something like a lint rule that prohibits bare mutation / internalMutation definitions.
This can be achieved like this: https://stack.convex.dev/eslint-setup
