Order by weighted sum (or weighted average) of two indexes
Hi 🙂 I was implementing a feed item sorting algorithm and wondering if it is possible to order rows by weighted sum or somehow dynamically scored standard of two indexes.
For example, weighting recency (_createdAt) and popularity (numLikes, etc) together is very common for sorting a trending item.
One idea I've thought myself is to pre-calculate weights for ordering table with cron job or every popularity event invocation but it felt sub-optimal.
3 Replies
The only way to can combine indexes at query time is "sort first by indexed field 1, then (only within ties on field 1) sort by indexed field 2."
If there's a consistent weighted average of two existing fields you want then I'd denormalize a bit: every time you insert or modify one of these records, also write a third field that combines the first two however you like (and write a helper function to do this for you). But this doesn't work well for recency (
now - _createdAt
) because what "now" is is always changing — so just like you suggest, you need to update these scores with a cron. There might be a math trick here to make it necessary to update less often but I don't know it. Something like (_createdAt - minCreatedAt) + popularityScore * 10000000
is broken because it weights recency more heavily over time, but I bet folks have characterized the tradeoffs and found something useful.
Instead of this the common approach here (you may be familiar with this) that I saw at Dropbox creating feeds was querying candidates based on properties that could be indexed or pre-calculated, then re-sorting and filtering those candidates in memory. This pattern works well in Convex: grab say 100 items based on popularity or search term and then apply the more complex dynamic ranking to rerank them.@daun in general, standard database indexes are just btrees, so they order exactly one way. so using some of tom's tricks, you can either do this in memory after the "fast" index-based fetch of a smaller batch of records, or you can materialize these scores in an index periodically or whatever if it's a lot of data and you want global "accuracy" about rankings
For AI Town where we need to get a weighted ranking of recency, importance, and semantic relevance (vector search), we fetch the most relevant results, and maybe also a set of recent results, then do the prioritization in memory. We do this to fetch each character's memories to prioritize them for which to use as context for what they say next, relevant to the current conversation