Record counts for large numbers of records are non-trivial
I've handled counting for large numbers of records (anything more than I can reasonably list just to get a count, so over 500 or so) by keeping count fields and incrementing/decrementing them at write time.
I'm now noticing failures because that counter changes too much during batch mutations, so I'm having to rework things and add special handling for those cases. That's making me realize there are other such cases in my codebase that are probably one unlucky timing away from a random failure.
I'm thinking through other approaches, such as updating the count daily or on certain events by paginating through the records and recomputing the count field as a whole. But that's a lot of data moving out of the database just to get a count.
I know a number of questions about getting counts have come up in this support forum. Are there better solutions on the way? This is one of those few problems that are actually harder with Convex than with a traditional db; it would be nice to eventually check it off the list.
For now I'm moving toward avoiding the need for counts altogether, but that's not a fun way to steer product decisions. Thoughts/suggestions appreciated!
16 Replies
One platform-level solution would be to allow querying records and returning just a count without the data, which should be somewhat faster and would get rid of much of the cost factor.
Can you describe the kinds of failures you're seeing in the batch mutations? I can imagine two ways this could surface: one is OCCs from all mutations touching the same document, and the other is parallel functions that cause races when incrementing the value.
Yep it's OCC
Was debugging an issue with Plaid and some missing data, found this in Sentry from a few days ago, likely culprit
Interesting. Counts may be possible to get more efficiently, but I have to think about it more. I'm working on a library, https://github.com/ldanilek/aggregate, that handles the denormalization for you, but I think it would still have OCC issues.
The OCC issue in my case was incrementing/decrementing the counter from a lot of parallel writes; would this be doing something along those lines?
Yes, this would still store the global count in a single document, so you have the same OCC issue. But if the problem is the count getting out of sync because of a Promise.all in a single mutation, that would be fixed by this library.
Gotcha
I do suspect there are parallelization sync issues in a lot of places in my backend that I just haven't run up against at low scale, so interested to see where this goes. But yeah, would love a first class count query.
There may be modifications to this library that do some of the aggregation at read time, which would result in fewer OCCs.
There are eventually-consistent approaches, like appending deltas to a log that is periodically summed up by a cron or something, similar to your pagination idea.
I was gonna ask, since you mentioned this is harder in Convex: how would you tackle the same problem with a traditional database? Would you count at read time?
Also would be helpful to understand the concrete user experience example (what are we counting, why does the user need to know it).
I just meant compared to a SQL COUNT. The use case is not load-bearing; it's to give the user an approximation of remaining time, along with records touched/remaining, for a long process that involves paginating through a bunch of their data on the server.
That COUNT would presumably be pretty slow. Definitely a case where the typical Convex way of ensuring snappy reads from the DB is not what you need (and you want some eventually-consistent approach as Ian noted).
This is exactly what I'm looking for - and I love the "convex approach" to this.
Totally good with eventually consistent for count cases, agreed.
But you can do an indexed query count in Postgres (and presumably others) and have it take maybe a couple of seconds for the number of records I'm working with (~50k rows in the table, ~10k in the index).
If you aren't reading them all that frequently, you could make a bunch of paginated requests that just count, and sum them up client-side. Only one page would get updated and the other pages would be cached, so only the first calculation would be slow, and it'd stay up to date transactionally.
It feels like overkill but I wonder how performant it’d be
Hmm that's worth trying, I'll circle back when I do