Aggregation stuff
Yeah, as opposed to the heavy queries I used to write a ton of, I'm now leaning toward heavier mutations so the queries stay fast. Especially given how indexing and ordering work in Convex, if I want to paginate on any kind of derived value, storing computed values is required.
Just checking to make sure I'm not holding it wrong.
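A minimal sketch of what that looks like, assuming a `foos` table whose schema declares a denormalized `barTotal` field and an index `by_barTotal` on it; all names here are hypothetical:

```ts
// Minimal sketch: paginating `foos` by a stored computed field.
// Assumes the schema declares `barTotal` on `foos` and an index
// `by_barTotal` on that field; all names are hypothetical.
import { query } from "./_generated/server";
import { paginationOptsValidator } from "convex/server";

export const foosByBarTotal = query({
  args: { paginationOpts: paginationOptsValidator },
  handler: async (ctx, { paginationOpts }) => {
    return await ctx.db
      .query("foos")
      .withIndex("by_barTotal")
      .order("desc")
      .paginate(paginationOpts);
  },
});
```

Because the sort key is a stored, indexed field, the query does no per-row computation at read time; the cost moves to the writers that maintain `barTotal`.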
guidelines I'd say: (1) start by doing things query-side; (2) if that's too slow [and the cache isn't saving you], and you're producing values derived from fewer than, say, a few hundred rows, you can do this mutation-side; (3) if you need to depend on even more data than that, it's really an aggregation process, and you probably don't want that inside a transactional window; you can do a batch aggregation instead. What does batch aggregation look like? Today: background actions using paginated queries, with a final mutation to write out the aggregate. At some point soon, that action will hit the SQL OLAP interface we'll likely have, which runs relatively fast aggregations on top of a column store like DuckDB. Still use a final mutation to write out the aggregate.
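A minimal sketch of that batch-aggregation shape, assuming an `orders` table with a numeric `amount` field and that this lives in a hypothetical `convex/aggregates.ts`:

```ts
// Minimal sketch of batch aggregation: an action pages through a table
// via a paginated query, then writes the aggregate in a final mutation.
// The `orders`/`aggregates` tables and `amount` field are assumptions.
import { v } from "convex/values";
import { PaginationResult } from "convex/server";
import {
  internalAction,
  internalMutation,
  internalQuery,
} from "./_generated/server";
import { internal } from "./_generated/api";
import { Doc } from "./_generated/dataModel";

// One page of rows; each call runs in its own consistent transaction.
export const pageOfOrders = internalQuery({
  args: { cursor: v.union(v.string(), v.null()) },
  handler: async (ctx, { cursor }) => {
    return await ctx.db.query("orders").paginate({ cursor, numItems: 200 });
  },
});

// The final mutation that writes out the aggregate.
export const writeAggregate = internalMutation({
  args: { total: v.number() },
  handler: async (ctx, { total }) => {
    await ctx.db.insert("aggregates", { name: "orderTotal", total });
  },
});

// Background action: pages through the table, sums, writes the result.
// Pages come from separate transactions, so the total is only
// approximately consistent across pages; fine for a background rollup.
export const aggregateOrders = internalAction({
  args: {},
  handler: async (ctx) => {
    let cursor: string | null = null;
    let total = 0;
    for (;;) {
      const result: PaginationResult<Doc<"orders">> = await ctx.runQuery(
        internal.aggregates.pageOfOrders,
        { cursor }
      );
      for (const order of result.page) total += order.amount;
      if (result.isDone) break;
      cursor = result.continueCursor;
    }
    await ctx.runMutation(internal.aggregates.writeAggregate, { total });
  },
});
```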
This actually isn't much different from a traditional database (Convex is a traditional database under the hood); Convex just "fails fast" when you do large queries, rather than failing slow by bogging down your transactional database and causing request timeouts for unrelated traffic.
this is consistent with optimistic concurrency control vs. pessimistic: Convex works a little harder to make sure that problematic queries stay out of your system and don't affect site-wide performance over time, especially as your codebase and traffic grow
Thanks for the breakdown Jamie, this makes sense.
i wish i was a 100x engineer like jamie 🫡
LOL
We all get to be 10x because Convex 🚀
If it were possible to make paginated queries more complex, e.g., constructed in a query function rather than from a single `ctx.db.query` call, that would be a game changer for this. Not sure if that's even in the realm of possibility with how Convex works under the hood (I suspect not), but thought I'd mention it.
E.g., foo has many bars, and I want to paginate (and order) foo based on some calculation of its bars, being able to query both tables and return a uniform result. I guess that would require multiple paginated queries in a single query, so maybe some of what I'm imagining will be possible when that comes along.
yeah. we can explore pagination more, and there are some really tricky parts to getting good performance across multiple pages, etc., if they're not sequential in an index
this is why IMO these are perfect use cases for the SQL engine running on slightly stale data
any idea how stale
you're gonna be able to write something really expressive, it's gonna run really fast on column-oriented data, and you'll just need a lot less code rather than basically writing a complex query plan by hand in JS
probably less than a few seconds in the steady-state case. but we haven't finalized an architecture, so I don't want to commit to anything too crazy. the key property though is, unlike a `mutation` context, it's not ACID
it wouldn't have consistency guarantees for Read/Update/Write cycles or anything
for sure, I never take future talk as a commitment, just wanted your best approximation
ah okay
thanks! I just like to be super clear for any reader coming by that I don't want to tie the eng team's hands once they dig into the project more and discover what things really take
the other thing that changes when we move our complexity to sql is the testing story. I don't miss the (lack of) testing story around complex sql
yeah, I think the SQL interface will be really good for "periodic expensive aggregates" and exploratory analytics in e.g. the dashboard (e.g. debugging)
but ideally shouldn't be used for anything else
okay that's good to know
maybe it could be useful for lightweight integrations with BI tools and other 3rd party things that want to do SELECT on your convex data
but core application logic should still be queries/mutations 99% of the time
Tactically, the approach that's been successful for the join-based pagination ends up looking like denormalization: keeping the data needed for sorting/pagination from the `bars` on `foo`, pushing the work to the writers instead of complex reads. the SQL path I see as the slower, less frequent reads (e.g. an admin interface or periodic rollups)
Yeah, that's where I've landed; I started introducing it and it works well. Helps to have a single function each for patch/insert, since it reduces concerns about keeping data in sync if you only have to write denormalized data in one place.
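A minimal sketch of that single-write-path pattern, assuming a schema where `foos` carries a denormalized `barTotal` and `bars` has an index `by_foo` on `fooId`; all names are hypothetical:

```ts
// Minimal sketch of the single-write-path pattern: every write to
// `bars` goes through one helper that refreshes the derived value on
// the parent `foo`. Table, field, and index names are assumptions.
import { MutationCtx } from "./_generated/server";
import { Id } from "./_generated/dataModel";

// Recompute the derived value on the parent after any write to `bars`.
// (For very large numbers of bars you'd update incrementally instead.)
async function recomputeBarTotal(ctx: MutationCtx, fooId: Id<"foos">) {
  const bars = await ctx.db
    .query("bars")
    .withIndex("by_foo", (q) => q.eq("fooId", fooId))
    .collect();
  const barTotal = bars.reduce((sum, bar) => sum + bar.value, 0);
  await ctx.db.patch(fooId, { barTotal });
}

// The one insert path for bars.
export async function insertBar(
  ctx: MutationCtx,
  fooId: Id<"foos">,
  value: number
) {
  await ctx.db.insert("bars", { fooId, value });
  await recomputeBarTotal(ctx, fooId);
}

// The one patch path for bars.
export async function patchBar(
  ctx: MutationCtx,
  barId: Id<"bars">,
  value: number
) {
  const bar = await ctx.db.get(barId);
  if (bar === null) throw new Error("bar not found");
  await ctx.db.patch(barId, { value });
  await recomputeBarTotal(ctx, bar.fooId);
}
```

With that in place, the write helpers are the only code that touches `barTotal`, and a paginated query can order on it directly via an index, as in the earlier sketch.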