Search order use case
Wanted to share my search use case, which requires ordering, just to add to the conversation. And to learn of any workarounds you might recommend.
My primary search is across financial transactions, so results still have to be ordered by date. It really needs to be paginated.
My best current workaround is not paginating and instead using take(). This has a few issues:
- Most search terms (like "Apple" or "Walmart") turn up lots of results. I'm searching on a combined field that includes subtly different values, so I don't expect many perfect ties in relevance where _creationTime would be the tiebreaker.
- Because of this, I could be missing the most recent and obvious matches to a query.
- A higher take() number sort of improves the chances of matching the right (recent) documents, but there's no guarantee.
So big vote from me on ordering for search results.
Any thoughts on how to better approach this for now?
Actually - just checked this more closely and it seems to be returning the oldest records first in case of a tie.
So if I search for "Apple", use take() to limit results, and sort the result by date, the newest result is from June, and it misses the ones from a few days ago.
Going to refactor to use a regular index and build pages in the client with filtering for now.
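To make the difference concrete, here is a minimal in-memory sketch of the two behaviours described above. It is not the Convex API: the `Txn` shape, the substring `matches` helper, and the numeric cursor are all assumptions made for illustration. `takeThenSort` models limiting results before sorting, and `filteredPage` models walking records in date order and keeping only the matches.

```typescript
// Hypothetical in-memory model of the scenario in this thread; not the Convex API.
interface Txn {
  id: number;
  description: string;
  date: number; // epoch millis
}

// Stand-in for a text match: case-insensitive substring test.
function matches(txn: Txn, term: string): boolean {
  return txn.description.toLowerCase().includes(term.toLowerCase());
}

// Limit first, sort second: keeps the first `limit` matches in the order the
// search returned them, then sorts only that slice by date (newest first).
// Any newer match beyond the limit is silently lost.
function takeThenSort(searchOrdered: Txn[], term: string, limit: number): Txn[] {
  return searchOrdered
    .filter((t) => matches(t, term))
    .slice(0, limit)
    .sort((a, b) => b.date - a.date);
}

// "Search as a filter": walk records newest first, keep only matches, and stop
// once the page is full. `cursor` is the number of records already scanned.
function filteredPage(
  all: Txn[],
  term: string,
  pageSize: number,
  cursor = 0,
): { page: Txn[]; nextCursor: number | null } {
  const newestFirst = [...all].sort((a, b) => b.date - a.date);
  const page: Txn[] = [];
  let scanned = cursor;
  for (const txn of newestFirst.slice(cursor)) {
    if (page.length === pageSize) break;
    scanned++;
    if (matches(txn, term)) page.push(txn);
  }
  return { page, nextCursor: scanned < newestFirst.length ? scanned : null };
}
```

The trade-off is that `filteredPage` may scan many non-matching records to fill one page, which is the cost of keeping the date order intact.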
I guess the model I'm looking for here is effectively search as a filter. I want indexing to work the same, I don't want the order to change at all, but I want to exclude documents that don't meet search criteria, paginated.
I would say that you're running into the fact that search is a very open-ended problem. Google search is also "search", but clearly takes into account more than just textual relevancy.
So the more you want from search, the harder it will get.
To the specific use case you have in mind, I wonder if you could add less-granular time as a field to your documents, and then run multiple queries, filtering for that time field.
So for example you could add a "date" (24h granularity). And then search first today, then yesterday and so on.
Ofc that's a lot of queries, so you could use multiple granularities, week, then month. So then you could search today, yesterday, this week, this month, and everything, deduping.
You'd have to implement the pagination manually, and it might be tricky as the search results could shift around, but since it's search an approximate result might be good enough for you.
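A rough sketch of the multi-granularity idea, under stated assumptions: `SearchFn` is a hypothetical stand-in for whatever search call is available (it is not a Convex function), and the `day` and `month` fields are the extra, less-granular time fields suggested above. Buckets are queried from narrowest to widest and results are deduplicated by id.

```typescript
// Hypothetical document shape with coarse time fields added for filtering.
interface Txn {
  id: string;
  description: string;
  date: number; // epoch millis
  day: string; // e.g. "2024-06-03"
  month: string; // e.g. "2024-06"
}

// An equality filter on one coarse time field, or null for "everything".
type BucketFilter = { field: "day" | "month"; value: string } | null;

// Assumed search function: returns every match for `term` within the bucket.
type SearchFn = (term: string, filter: BucketFilter) => Txn[];

// Query buckets from narrowest to widest (today, yesterday, this month,
// everything), skipping documents already returned by a narrower bucket.
function searchNewestFirst(
  search: SearchFn,
  term: string,
  buckets: BucketFilter[],
  limit: number,
): Txn[] {
  const seen = new Set<string>();
  const results: Txn[] = [];
  for (const bucket of buckets) {
    // Order each bucket's hits by date so newer matches come first.
    const hits = [...search(term, bucket)].sort((a, b) => b.date - a.date);
    for (const txn of hits) {
      if (seen.has(txn.id)) continue;
      seen.add(txn.id);
      results.push(txn);
      if (results.length >= limit) return results;
    }
  }
  return results;
}
```

If the underlying search caps the number of hits per bucket, the wider buckets can still miss documents, so the result is approximate, as noted above.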
Yeah, fair. Not trying to move past textual relevance - what Convex has is exactly what I want, just an ordering issue. The granularity approach could work, I'll look at that 👍
@sujayakar I think specifying the ordering for conflicts, such that newest results show up first, is a valid feature request?
yep, makes sense!
🙌 🙌
Just to clarify - in this case it would need to be ordered even without conflicts. I'm seeing this as a basic pattern for apps that work in dates. If you search on your bank app, or if you search in any email app, results aren't by relevance, they're by date/time. Relevance basically has no representation, either something is included in the results or it isn't. But ordering is the same as it is without a search term.
One approach could be ordering the same way we do with regular indexes, so search relevance is the main ordering field, being first, followed by filter fields in order as ties occur, and order() is supported as well. From there, we'd just need a way to allow relevance to be optionally ignored in the ordering.
mm yeah, I can see how this doesn't quite fit into our system yet.
can you say more about ignoring relevance in the ordering? would the ideal behavior be something like...
1. find the K (say, 1024) most relevant transactions, filtering out transactions that don't match the query terms at all,
2. reorder those 1024 transactions by date descending, and then
3. paginate through those results in date order?
or, going the other direction, would it be something like...
1. find the most relevant transactions from some fixed time period in the past (say, the last week)
2. re-sort those results by date descending, and then
3. paginate through these results in date order?
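The two options above could be sketched roughly as follows. This is an in-memory illustration only: the `Scored` shape, its `score` field (0 meaning no match), and the page-index pagination are assumptions, not how Convex search represents results.

```typescript
// Hypothetical search hit: `score` is a relevance value, 0 meaning "no match".
interface Scored {
  id: string;
  date: number; // epoch millis
  score: number;
}

// Option 1: keep the K most relevant matches, then order them by date descending.
function topKByDate(hits: Scored[], k: number): Scored[] {
  return hits
    .filter((h) => h.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .sort((a, b) => b.date - a.date);
}

// Option 2: keep the matches from a fixed time window, ordered by date descending.
function windowByDate(hits: Scored[], since: number): Scored[] {
  return hits
    .filter((h) => h.score > 0 && h.date >= since)
    .sort((a, b) => b.date - a.date);
}

// Step 3 in both options: page through the re-sorted list.
function pageOf<T>(items: T[], pageSize: number, pageIndex: number): T[] {
  return items.slice(pageIndex * pageSize, (pageIndex + 1) * pageSize);
}
```

Both variants bound the result set (by count or by time range), so neither lets a user keep loading until every match is exhausted.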
I think the only difference in those two scenarios is limiting the results to a static number vs limiting to a range. I would hope for regular pagination that doesn't require a range, so the user can keep loading more results until they get to the end of the index.
Gmail is a great example (or any email app) - if you go and search in your inbox, you'll see relevant results, but they will not be ordered by relevance. You can keep scrolling through relevant results until they're exhausted.
Gmail doesn't even offer a sort option, even in advanced search. It's only ever by date/time received
Curious, are you all using elastic behind the scenes or something else? Totally cool if you'd rather not share.
@sujayakar I think from @erquhart's perspective a search query splits the table into two sets: Included (has any overlap) and not included (has no overlap). Then the ask is to return the "included" set sorted by _creationTime (or some other field), instead of by search relevance. This made even more sense before we added fuzzy search, and still makes sense if there's a relevance cut-off.
(we're not using Elastic)
100%, ideally ordering would work exactly as it does in a regular index, by all included fields from left to right w/ creation time as final tiebreaker.
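As an illustration of the requested behaviour (not of anything Convex implements today), the sketch below splits records into included and not included, then orders the included set by the listed fields from left to right with _creationTime as the final tiebreaker. The `Txn` shape, the substring match, and the `SortField` names are assumptions.

```typescript
// Hypothetical document shape; only _creationTime mirrors a real Convex field.
interface Txn {
  id: string;
  description: string;
  date: number;
  amount: number;
  _creationTime: number;
}

type SortField = "date" | "amount";

// Keep only documents that match the term at all, then order them the way a
// regular index would: by each field left to right, _creationTime last.
function includedInIndexOrder(
  all: Txn[],
  term: string,
  fields: SortField[],
  direction: "asc" | "desc" = "asc",
): Txn[] {
  const sign = direction === "asc" ? 1 : -1;
  const keys: (SortField | "_creationTime")[] = [...fields, "_creationTime"];
  return all
    .filter((t) => t.description.toLowerCase().includes(term.toLowerCase()))
    .sort((a, b) => {
      for (const key of keys) {
        const diff = a[key] - b[key];
        if (diff !== 0) return sign * diff;
      }
      return 0;
    });
}
```

Relevance plays no part in the ordering here; it only decides membership in the result set.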
ah perfect, that makes sense!
yeah, our implementation currently only supports sorting on relevance (w/other fields as tiebreakers) for the result from withSearchIndex, but it totally makes sense to want to sort on other fields. will write this one down on our search workstream and keep you posted!
Thank you!