Aggregation stuff
Yeah, as opposed to the heavy queries I used to write a ton of, I'm now leaning toward heavier mutations so the queries stay fast. Especially given how indexing and ordering work in Convex, if I want to paginate on any kind of derived value, storing computed values is required.
Just checking to make sure I'm not holding it wrong.
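A minimal sketch of what that looks like, assuming a `foos` table whose schema declares a denormalized `barTotal` field and an index `by_barTotal` on it; all names here are hypothetical:

```ts
// Minimal sketch: paginating `foos` by a stored computed field.
// Assumes the schema declares `barTotal` on `foos` and an index
// `by_barTotal` on that field; all names are hypothetical.
import { query } from "./_generated/server";
import { paginationOptsValidator } from "convex/server";

export const foosByBarTotal = query({
  args: { paginationOpts: paginationOptsValidator },
  handler: async (ctx, { paginationOpts }) => {
    return await ctx.db
      .query("foos")
      .withIndex("by_barTotal")
      .order("desc")
      .paginate(paginationOpts);
  },
});
```

Because the sort key is a stored, indexed field, the query does no per-row computation at read time; the cost moves to the writers that maintain `barTotal`.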
guidelines I'd say: (1) start by doing things query-side; (2) if that's too slow [and the cache isn't saving you], and you're producing values derived from fewer than, say, a few hundred rows, you can do this mutation-side; (3) if you need to depend on even more data than that, it's really an aggregation process, and you probably don't want that inside a transactional window; you can do a batch aggregation instead. What does batch aggregation look like? Today: background actions using paginated queries, with a final mutation to write out the aggregate. At some point soon, that action will hit the SQL OLAP interface we'll likely have, which runs relatively fast aggregations on top of a column store like DuckDB. Still use a final mutation to write out the aggregate.
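A minimal sketch of that batch-aggregation shape, assuming an `orders` table with a numeric `amount` field and that this lives in a hypothetical `convex/aggregates.ts`:

```ts
// Minimal sketch of batch aggregation: an action pages through a table
// via a paginated query, then writes the aggregate in a final mutation.
// The `orders`/`aggregates` tables and `amount` field are assumptions.
import { v } from "convex/values";
import { PaginationResult } from "convex/server";
import {
  internalAction,
  internalMutation,
  internalQuery,
} from "./_generated/server";
import { internal } from "./_generated/api";
import { Doc } from "./_generated/dataModel";

// One page of rows; each call runs in its own consistent transaction.
export const pageOfOrders = internalQuery({
  args: { cursor: v.union(v.string(), v.null()) },
  handler: async (ctx, { cursor }) => {
    return await ctx.db.query("orders").paginate({ cursor, numItems: 200 });
  },
});

// The final mutation that writes out the aggregate.
export const writeAggregate = internalMutation({
  args: { total: v.number() },
  handler: async (ctx, { total }) => {
    await ctx.db.insert("aggregates", { name: "orderTotal", total });
  },
});

// Background action: pages through the table, sums, writes the result.
// Pages come from separate transactions, so the total is only
// approximately consistent across pages; fine for a background rollup.
export const aggregateOrders = internalAction({
  args: {},
  handler: async (ctx) => {
    let cursor: string | null = null;
    let total = 0;
    for (;;) {
      const result: PaginationResult<Doc<"orders">> = await ctx.runQuery(
        internal.aggregates.pageOfOrders,
        { cursor }
      );
      for (const order of result.page) total += order.amount;
      if (result.isDone) break;
      cursor = result.continueCursor;
    }
    await ctx.runMutation(internal.aggregates.writeAggregate, { total });
  },
});
```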
This actually isn't much different from a traditional database (Convex is a traditional database under the hood); Convex just "fails fast" when you do large queries, rather than failing slow by bogging down your transactional database and causing request timeouts for unrelated traffic.
this is consistent with optimistic concurrency control vs. pessimistic: Convex works a little harder to make sure that problematic queries stay out of your system and don't affect site-wide performance over time, especially as your codebase and traffic grow
Thanks for the breakdown Jamie, this makes sense.
i wish i was a 100x engineer like jamie 🫡
LOL
We all get to be 10x because Convex 🚀
If it were possible to make paginated queries more complex, e.g., constructed in a query function rather than from a single `ctx.db.query` call, that would be a game changer for this. Not sure if that's even in the realm of possibility with how Convex works under the hood (I suspect not), but thought I'd mention it.
E.g., foo has many bars, and I want to paginate (and order) foo based on some calculation of its bars, being able to query both tables and return a uniform result. I guess that would require multiple paginated queries in a single query, so maybe some of what I'm imagining will be possible when that comes along.
yeah. we can explore pagination more, and there are some really tricky parts to getting good performance across multiple pages, etc., if they're not sequential in an index
this is why IMO these are perfect use cases for the SQL engine running on slightly stale data
any idea how stale
you're gonna be able to write something really expressive, it's gonna run really fast on column-oriented data, and you'll just need a lot less code rather than basically writing a complex query plan by hand in JS
probably less than a few seconds in the steady-state case. but we haven't finalized an architecture, so I don't want to commit to anything too crazy. the key property though is, unlike a `mutation` context, it's not ACID
it wouldn't have consistency guarantees for Read/Update/Write cycles or anything
for sure, I never take future talk as a commitment, just wanted your best approximation
ah okay
thanks! I just like to be super clear for any reader coming by that I don't want to tie the eng team's hands once they dig into the project more and discover what things really take
the other thing that changes when we move our complexity to sql is the testing story. I don't miss the (lack of) testing story around complex sql
yeah, I think the SQL interface will be really good for "periodic expensive aggregates" and exploratory analytics in e.g. the dashboard (e.g. debugging)
but ideally shouldn't be used for anything else
okay that's good to know
maybe it could be useful for lightweight integrations with BI tools and other 3rd party things that want to do SELECT on your convex data
but core application logic should still be queries/mutations 99% of the time
Tactically, the approach that's been successful for the join-based pagination ends up looking like denormalization: keeping the data needed for sorting/pagination from the `bars` on `foo`, pushing the work to the writers instead of complex reads. the SQL path I see as the slower, less frequent reads (e.g. an admin interface or periodic rollups)
Yeah, that's where I've landed; I started introducing it and it works well. Helps to have a single function each for patch/insert, since it reduces concerns about keeping data in sync if you only have to write denormalized data in one place.
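A minimal sketch of that single-write-path pattern, assuming a schema where `foos` carries a denormalized `barTotal` and `bars` has an index `by_foo` on `fooId`; all names are hypothetical:

```ts
// Minimal sketch of the single-write-path pattern: every write to
// `bars` goes through one helper that refreshes the derived value on
// the parent `foo`. Table, field, and index names are assumptions.
import { MutationCtx } from "./_generated/server";
import { Id } from "./_generated/dataModel";

// Recompute the derived value on the parent after any write to `bars`.
// (For very large numbers of bars you'd update incrementally instead.)
async function recomputeBarTotal(ctx: MutationCtx, fooId: Id<"foos">) {
  const bars = await ctx.db
    .query("bars")
    .withIndex("by_foo", (q) => q.eq("fooId", fooId))
    .collect();
  const barTotal = bars.reduce((sum, bar) => sum + bar.value, 0);
  await ctx.db.patch(fooId, { barTotal });
}

// The one insert path for bars.
export async function insertBar(
  ctx: MutationCtx,
  fooId: Id<"foos">,
  value: number
) {
  await ctx.db.insert("bars", { fooId, value });
  await recomputeBarTotal(ctx, fooId);
}

// The one patch path for bars.
export async function patchBar(
  ctx: MutationCtx,
  barId: Id<"bars">,
  value: number
) {
  const bar = await ctx.db.get(barId);
  if (bar === null) throw new Error("bar not found");
  await ctx.db.patch(barId, { value });
  await recomputeBarTotal(ctx, bar.fooId);
}
```

With that in place, the write helpers are the only code that touches `barTotal`, and a paginated query can order on it directly via an index, as in the earlier sketch.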