erquhart
erquhart•14mo ago

Aggregation stuff

Yeah, as opposed to heavy queries, which I did a ton of formerly, I'm now leaning toward heavier mutations so the queries are fast. Especially given how indexing and ordering work in convex, if I want to paginate on any kind of derived value, storing computed values is required. Just checking to make sure I'm not holding it wrong.
16 Replies
jamwt
jamwt•14mo ago
guidelines I'd say: (1) Start with doing things query side; (2) if that's too slow [and the cache isn't saving you], and you're producing values derived from less than, say, a few hundred rows, you can do this mutation side; (3) if you need to depend on even more data than that, that's kind of an aggregation process, and you probably don't want that in a transactional window. you can do a batch aggregation. What's batch aggregation look like? Today, background actions using paginated queries with a final mutation to write out the aggregate; some point soon, that action will hit the SQL OLAP interface we'll likely have which runs relatively fast aggregations on top of a column store like DuckDB. still use a final mutation to write out the aggregate This actually isn't really much different than a traditional database (convex is actually a traditional database under the hood), convex just "fails fast" when you do large queries rather than failing slow by bogging down your transactional database and causing request timeouts for unrelated traffic this is consistent with optimistic concurrency control vs. pessimistic -- convex works a little harder to make sure that problematic queries stay out of your system and don't affect site-wide performance over time, especially as your codebase and traffic grows
erquhart
erquhartOP•14mo ago
Thanks for the breakdown Jamie, this makes sense.
winsoroaks
winsoroaks•14mo ago
i wish im a 100x engineer like jamie 🫡
jamwt
jamwt•14mo ago
LOL
erquhart
erquhartOP•14mo ago
We all get to be 10x because Convex 🚀 If it were possible to make paginated queries more complex, eg., constructed in a query function rather than from a single ctx.db.query call, that would be a game changer for this. Not sure if that's even in the realm of possibility with how Convex works under the hood (I suspect not), but thought I'd mention. Eg., foo has many bars, and I want to paginate (and order) foo based on some calculation of it's bars, being able to query both tables and return a uniform result. I guess that would require multiple paginated queries in a single query, so maybe some of what I'm imagining will be possible when that comes along.
jamwt
jamwt•14mo ago
yeah. we can explore pagination more, and there are some really tricky parts to get good performance across multiple pages etc if they're not sequential in an index this is why IMO these are perfect use cases for the SQL engine running on slightly stale data
erquhart
erquhartOP•14mo ago
any idea how stale
jamwt
jamwt•14mo ago
you're gonna be able to write something really expressive, it's gonna run really fast on column-oriented data, and you'll just need a lot less code rather than basically writing a complex query plan by hand in JS probably less than a few seconds in the steady state case. but we haven't finalized an architecture, so I don't want to commit to anything too crazy. the key property though is, unlikely a mutation context, it's not ACID it wouldn't have consistency guarantees for Read/Update/Write cycles or anything
erquhart
erquhartOP•14mo ago
for sure, I never take future talk as a commitment, just wanted your best approximation ah okay
jamwt
jamwt•14mo ago
thanks! I just like to be super clear for any reader coming by that I don't want to tie the eng team's hands once they dig into the project more and discover what things really take
erquhart
erquhartOP•14mo ago
the other thing that changes when we move our complexity to sql is the testing story. I don't miss the (lack of) testing story around complex sql
jamwt
jamwt•14mo ago
yeah, I thing the SQL interface will be really good for "periodic expensive aggregates", and exploratory analytics in e.g. the dashboard (e.g. debugging) but ideally shouldn't be used for anything else
erquhart
erquhartOP•14mo ago
okay that's good to know
jamwt
jamwt•14mo ago
maybe it could be useful for lightweight integrations with BI tools and other 3rd party things that want to do SELECT on your convex data but core application logic should still be queries/mutations 99% of the time
ian
ian•14mo ago
Tactically, the approach that's been successful for the join-based pagination ends up looking like denormalization - keeping the data needed for sorting / pagination from the bars on foo - pushing the work to the writers instead of complex reads. the SQL path I see as the slower, less frequent reads (e.g. an admin interface or periodic rollups)
erquhart
erquhartOP•14mo ago
Yeah that's where I've landed, started introducing it and it works well. Helps to have a single function each for patch/insert, reduces concerns about keeping data in sync if you only have to write denormalized data in one place.

Did you find this page helpful?