erquhart, 14mo ago

Search order use case

Wanted to share my search use case, which requires ordering, just to add to the conversation. And to learn of any workarounds you might recommend.

My primary search is across financial transactions, so results still have to be ordered by date. It really needs to be paginated. My best current workaround is not paginating and instead using take(). This has a few issues:

- Most search terms (like "Apple" or "Walmart") turn up lots of results. I'm searching on a combined field that includes subtly different values, so I don't expect perfect ties in relevance for _creationTime to be the tiebreaker.
- Because of this, I could be missing the most recent and obvious matches to a query.
- A higher take() number sort of improves chances of matching the right (recent) documents, but there's no guarantee.

So big vote from me on ordering for search results. Any thoughts on how to better approach this for now?
12 Replies
erquhart (OP), 14mo ago
Actually - just checked this more closely and it seems to be returning the oldest records first in case of a tie. So if I search for "Apple", use take() to limit results, and sort the result by date, the newest result is from June, and it misses the ones from a few days ago. Going to refactor to use a regular index and build pages in the client with filtering for now.

I guess the model I'm looking for here is effectively search as a filter. I want indexing to work the same, I don't want the order to change at all, but I want to exclude documents that don't meet the search criteria, paginated.
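A minimal sketch of the "search as a filter" model described above, in plain TypeScript rather than against the Convex API. The `Transaction` shape, the `searchText` field, and the `filteredPage` helper are all assumptions made for illustration. It walks documents that are already in date-descending order (as a regular index would return them), drops the ones that don't contain the term, and stops once a page is full.

```typescript
// Hypothetical document shape; the field names are assumptions.
interface Transaction {
  id: string;
  date: number; // epoch milliseconds
  searchText: string; // combined field the search term is matched against
}

interface Page {
  items: Transaction[];
  nextCursor: number | null; // position to resume scanning from; null when exhausted
}

// Scan documents that are already sorted by date descending, keep only the
// ones containing the term, and stop as soon as the page is full. The date
// order of the underlying list is never changed.
function filteredPage(
  sortedByDateDesc: Transaction[],
  term: string,
  cursor: number,
  pageSize: number,
): Page {
  const needle = term.toLowerCase();
  const items: Transaction[] = [];
  let i = cursor;
  while (i < sortedByDateDesc.length && items.length < pageSize) {
    const doc = sortedByDateDesc[i];
    if (doc.searchText.toLowerCase().includes(needle)) {
      items.push(doc);
    }
    i++;
  }
  return { items, nextCursor: i < sortedByDateDesc.length ? i : null };
}
```

The substring match stands in for whatever matching rule the real search uses. The trade-off is that a rare term can force a long scan before a page fills up.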
Michal Srb, 14mo ago
I would say that you're running into the fact that search is a very open-ended problem. Google search is also "search", but clearly takes into account more than just textual relevancy. So the more you want from search, the harder it will get.

To the specific use case you have in mind, I wonder if you could add less-granular time as a field to your documents, and then run multiple queries, filtering for that time field. So for example you could add a "date" (24h granularity). And then search first today, then yesterday and so on. Ofc that's a lot of queries, so you could use multiple granularities, week, then month. So then you could search today, yesterday, this week, this month, and everything, deduping.

You'd have to implement the pagination manually, and it might be tricky as the search results could shift around, but since it's search an approximate result might be good enough for you.
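The multi-granularity idea above could be sketched roughly as follows. This is plain TypeScript, not Convex code: `BucketSearch` is a hypothetical callback standing in for one search query restricted to a single time bucket, and the bucket names are placeholders. Buckets are expected in order from most recent and narrowest to widest, and results are deduplicated by id as wider buckets overlap narrower ones.

```typescript
interface Hit {
  id: string;
  date: number; // epoch milliseconds
}

// Hypothetical: stands in for one search query filtered to a single time bucket.
type BucketSearch = (term: string, bucket: string) => Hit[];

// Query buckets from narrowest/most recent to widest, skipping ids that an
// earlier bucket already returned, until `limit` results are collected.
function searchNewestFirst(
  search: BucketSearch,
  term: string,
  buckets: string[],
  limit: number,
): Hit[] {
  const seen = new Set<string>();
  const results: Hit[] = [];
  for (const bucket of buckets) {
    // Copy before sorting so the caller's array is left untouched.
    const hits = [...search(term, bucket)].sort((a, b) => b.date - a.date);
    for (const hit of hits) {
      if (seen.has(hit.id)) continue;
      seen.add(hit.id);
      results.push(hit);
      if (results.length >= limit) return results;
    }
  }
  return results;
}
```

The result is only approximately newest-first: if a bucket's query truncates its matches by relevance, recent documents can still be missed, which is the caveat about shifting results mentioned above.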
erquhart (OP), 14mo ago
Yeah, fair. Not trying to move past textual relevance - what Convex has is exactly what I want, just an ordering issue. The granularity approach could work, I'll look at that 👍
Michal Srb, 14mo ago
@sujayakar I think specifying the ordering for conflicts, such that newest results show up first, is a valid feature request?
sujayakar, 14mo ago
yep, makes sense!
erquhart (OP), 14mo ago
🙌 🙌 Just to clarify - in this case it would need to be ordered even without conflicts. I'm seeing this as a basic pattern for apps that work in dates. If you search on your bank app, or if you search in any email app, results aren't by relevance, they're by date/time. Relevance basically has no representation, either something is included in the results or it isn't. But ordering is the same as it is without a search term.

One approach could be ordering the same way we do with regular indexes, so search relevance is the main ordering field, being first, followed by filter fields in order as ties occur, and order() is supported as well. From there, we'd just need a way to allow relevance to be optionally ignored in the ordering.
sujayakar, 14mo ago
mm yeah, I can see how this doesn't quite fit into our system yet. can you say more about ignoring relevance in the ordering? would the ideal behavior be something like...

1. find the K (say, 1024) most relevant transactions, filtering out transactions that don't match the query terms at all,
2. reorder those 1024 results by date descending, and then
3. paginate through those results in date order?

or, going the other direction, would it be something like...

1. find the most relevant transactions from some fixed time period in the past (say, the last week),
2. re-sort those results by date descending, and then
3. paginate through these results in date order?
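The first option above (top K by relevance, then date order) can be sketched in plain TypeScript. `ScoredHit`, `topKByDate`, and `pageOf` are hypothetical names, and the numeric `score` is an assumed stand-in for whatever relevance value a search backend produces, with 0 meaning "does not match at all".

```typescript
interface ScoredHit {
  id: string;
  date: number; // epoch milliseconds
  score: number; // assumed relevance score; 0 means no match
}

// Step 1: drop non-matches and keep the K most relevant hits.
// Step 2: reorder those K hits by date descending.
function topKByDate(hits: ScoredHit[], k: number): ScoredHit[] {
  const topK = hits
    .filter((h) => h.score > 0) // filter() returns a new array, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
  return topK.sort((a, b) => b.date - a.date);
}

// Step 3: paginate through the date-ordered results.
function pageOf<T>(items: T[], pageIndex: number, pageSize: number): T[] {
  return items.slice(pageIndex * pageSize, (pageIndex + 1) * pageSize);
}
```

Note the limitation this makes visible: a recent hit with a low score can fall outside the top K and never appear, no matter how far the user pages.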
erquhart (OP), 14mo ago
I think the only difference in those two scenarios is limiting the results to a static number vs limiting to a range. I would hope for regular pagination that doesn't require a range, so the user can keep loading more results until they get to the end of the index.

Gmail is a great example (or any email app) - if you go and search in your inbox, you'll see relevant results, but they will not be ordered by relevance. You can keep scrolling through relevant results until they're exhausted. Gmail doesn't even offer a sort option, even in advanced search. It's only ever by date/time received.

Curious, are you all using elastic behind the scenes or something else? Totally cool if you'd rather not share.
Michal Srb, 14mo ago
@sujayakar I think from @erquhart's perspective a search query splits the table into two sets: Included (has any overlap) and not included (has no overlap). Then the ask is to return the "included" set sorted by _creationTime (or some other field), instead of by search relevance. This made even more sense before we added fuzzy search, and still makes sense if there's a relevance cut-off.

(we're not using Elastic)
erquhart (OP), 14mo ago
100%, ideally ordering would work exactly as it does in a regular index, by all included fields from left to right w/ creation time as final tiebreaker.
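As an illustration of that ordering rule, here is a small comparator sketch in plain TypeScript. It is not Convex's implementation: `Doc` and `compareByFields` are hypothetical, and field values are assumed to be numeric to keep the example short. Documents are compared on the listed fields from left to right, and `_creationTime` settles any remaining tie.

```typescript
// Hypothetical document shape: every field is numeric for simplicity.
type Doc = { _creationTime: number; [field: string]: number };

// Build a comparator that orders by each field in turn (ascending) and
// falls back to _creationTime when all listed fields are equal.
function compareByFields(fields: string[]): (a: Doc, b: Doc) => number {
  return (a, b) => {
    for (const field of fields) {
      const diff = a[field] - b[field];
      if (diff !== 0) return diff;
    }
    return a._creationTime - b._creationTime;
  };
}
```

For example, `docs.sort(compareByFields(["accountId", "date"]))` groups by a hypothetical `accountId`, then orders by `date`, then by creation time.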
sujayakar, 14mo ago
ah perfect, that makes sense! yeah, our implementation currently only supports sorting on relevance (w/other fields as tiebreakers) for the result from withSearchIndex, but it totally makes sense to want to sort on other fields. will write this one down on our search workstream and keep you posted!
erquhart (OP), 14mo ago
Thank you!
