Record counts for large numbers of records are non-trivial
I've handled counting for large numbers of records (anything more than I can reasonably list just to get a count, so over 500 or so) by keeping count fields and incrementing/decrementing them at write time.
I'm now noticing failures because that counter changes too much during batch mutations, so I'm having to rework things and add special handling for those cases. That's making me realize there are other such cases in my codebase that are probably one unlucky timing away from a random failure.
I'm thinking through other approaches, such as updating the count daily or on certain events by paginating through the records and recomputing the count field as a whole. But that's a lot of data moving out of the database just to get a count.
I know a number of questions about getting counts have come up in this support forum. Are there better solutions on the way? This is one of those few problems that are actually harder with Convex than with a traditional db; it would be nice to eventually check it off the list.
For now I'm moving toward avoiding the need for counts altogether, but that's not a fun way to steer product decisions. Thoughts/suggestions appreciated!
16 Replies
One platform-level solution would be to allow querying records and returning just a count without the data, which should be somewhat faster and would get rid of much of the cost factor.
Can you describe the kinds of failures you're seeing in the batch mutations? I can imagine two ways this could surface: one is OCCs from all mutations touching the same document, and the other is parallel functions that cause races when incrementing the value.
Yep it's OCC
Was debugging an issue with Plaid and some missing data, found this in Sentry from a few days ago, likely culprit
Interesting. Counts may be possible to get more efficiently, but I have to think about it more. I'm working on a library, https://github.com/ldanilek/aggregate, that handles the denormalization for you, but I think it would still have OCC issues.
The OCC issue in my case was incrementing/decrementing the counter from a lot of parallel writes; would this be doing something along those lines?
Yes, this would still store the global count in a single document, so you have the same OCC issue. But if the problem is the count getting out of sync because of a Promise.all in a single mutation, that would be fixed by this library.
Gotcha
I do suspect there are parallelization sync issues in a lot of places in my backend that I just haven't run up against at low scale, so interested to see where this goes. But yeah, would love a first class count query.
There may be modifications to this library that do some of the aggregation at read time, which would result in fewer OCCs.
There are eventually-consistent approaches, like appending deltas to a log that is periodically summed up by a cron or something, similar to your pagination idea.
I was gonna ask, since you mentioned this is harder in Convex: how would you tackle the same problem with a traditional database? Would you count at read time?
Also would be helpful to understand the concrete user experience example (what are we counting, why does the user need to know it).
I just meant compared to a SQL COUNT. The use case is not load-bearing; it's to give the user an approximation of remaining time, along with records touched/remaining, for a long process that involves paginating through a bunch of their data on the server.
That COUNT would presumably be pretty slow. Definitely a case where the typical Convex way of ensuring snappy reads from the DB is not what you need (and you want some eventually-consistent approach as Ian noted).
This is exactly what I'm looking for - and I love the "convex approach" to this.
Totally good with eventually consistent for count cases, agreed.
But you can do an indexed query count in Postgres (and presumably others) and have it take maybe a couple of seconds for the number of records I'm working with (~50k rows in the table, ~10k in the index).
If you aren't reading them all that frequently, you could make a bunch of paginated requests that just count, and sum them up client-side. Only one page would get updated and the other pages would be cached, so only the first calculation would be slow, and it'd stay up to date transactionally.
It feels like overkill but I wonder how performant it’d be
Hmm that's worth trying, I'll circle back when I do