Allan
Allan3w ago

Using custom query with typescript filtering and pagination. Performance and limitation issues.

Hello together, I have a question regarding quering with streams and .filterWith. We have this table structure
report: defineTable({
title: v.optional(v.string()),
content: v.string(),
content_summarized: v.string(),
link: v.optional(v.string()),
updated_at: v.number(), // UTC timestamp in miliseconds
date: v.number(), // UTC timestamp in miliseconds,
tags: v.optional(v.array(v.string())), // TODO linked table
})
.index("title", ["title"])
.index("content", ["content"])
.index("content_summarized", ["content_summarized"])
report: defineTable({
title: v.optional(v.string()),
content: v.string(),
content_summarized: v.string(),
link: v.optional(v.string()),
updated_at: v.number(), // UTC timestamp in miliseconds
date: v.number(), // UTC timestamp in miliseconds,
tags: v.optional(v.array(v.string())), // TODO linked table
})
.index("title", ["title"])
.index("content", ["content"])
.index("content_summarized", ["content_summarized"])
We want to perform a search by title and content` query, which does a include filtering based on JavaScript. We use this mutation:
export const searchByTitleSimple = query({
args: { paginationOpts: paginationOptsValidator, searchTermTitle: v.string(), searchTermContent: v.string() },
handler: async (ctx, args) => {
let reports_filtered = stream(ctx.db, schema)
.query("report")
.withIndex("title")
.filterWith(async (report) => !!report.title?.toLocaleLowerCase()?.includes(args.searchTermTitle.toLocaleLowerCase()))
.filterWith(async (report) => report.content.toLocaleLowerCase()?.includes(args.searchTermContent.toLocaleLowerCase()))


return await reports_filtered.paginate(args.paginationOpts);
},
});
export const searchByTitleSimple = query({
args: { paginationOpts: paginationOptsValidator, searchTermTitle: v.string(), searchTermContent: v.string() },
handler: async (ctx, args) => {
let reports_filtered = stream(ctx.db, schema)
.query("report")
.withIndex("title")
.filterWith(async (report) => !!report.title?.toLocaleLowerCase()?.includes(args.searchTermTitle.toLocaleLowerCase()))
.filterWith(async (report) => report.content.toLocaleLowerCase()?.includes(args.searchTermContent.toLocaleLowerCase()))


return await reports_filtered.paginate(args.paginationOpts);
},
});
I set the index "title" but there are problems which I need to figure out and would like to ask you. First, Convex blocks if the size of report table is bigger then 16MB Second, the performance. I was thinking that there most be a way that filterings needs to be performed in batches so that the search and looping is stopped until the pagination limiting is finished. What would be the correct convex way to solve this problem? Thank you
7 Replies
Convex Bot
Convex Bot3w ago
Thanks for posting in <#1088161997662724167>. Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets. - Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.) - Use search.convex.dev to search Docs, Stack, and Discord all at once. - Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI. - Avoid tagging staff unless specifically instructed. Thank you!
ian
ian3w ago
To make this more efficient, I would suggest having a field title_content_lowercase where you store ${title.toLocaleLowerCase()} ${content.toLocaleLowerCase()} and then define a text search over that field and use the lowercase of the query as the search term. In fact you likely don't need to do case changes, as I believe it's already case-insensitive Doing the filtering the way you're doing it is scanning every row of the database unfortunately.
Allan
AllanOP3w ago
Hi @ian thanks. Ok cool. So basically my approch is not wrong, right? With your proposal it could be done better. I saw, that when I use a search index, then I only can use one "withSearchIndex" statement per query, so when I want to have a search in "title" and "content" (because of multisearch and I will have up to 30 filters on my website) then I need to merge these query streams together, right? Would that be the convax-way?
ian
ian3w ago
Your approach is not technically wrong, provided you are ok paginating over the whole table from the client. FYI .index("title", ["title"]) and .index("content", ["content"]) are not affecting your query at all right now - they aren't automatically used in any way by filterWith For searches you might find other approaches useful: - Move all the search terms into one field so there's one search index - Do separate queries for each and combine it on the client - Kick off multiple one-off queries (convex.query) for different results from the client - Have the search be an action that calls multiple search queries (also enables vector search) and combines the results in a clever way (e.g. https://github.com/get-convex/rag/blob/main/src/client/hybridRank.ts#L1-L6)
GitHub
rag/src/client/hybridRank.ts at main · get-convex/rag
Document search component to aid RAG. Contribute to get-convex/rag development by creating an account on GitHub.
Allan
AllanOP3w ago
Nice, I will have a look at searchIndex and the multiple search. But I would like to dig a bit deeper regarding the streaming concept, if that's ok. Considering my app is about some "contact" / "adress book" management. Having report table a bit extended now, where also zip and city as fields are included.
report: defineTable({
title: v.optional(v.string()),
content: v.string(),
content_summarized: v.string(),
zip: v.array(v.string()),
city: v.string(),
})
report: defineTable({
title: v.optional(v.string()),
content: v.string(),
content_summarized: v.string(),
zip: v.array(v.string()),
city: v.string(),
})
My query would look in SQL something like
SELECT * FROM report
WHERE title LIKE '%searchTerm%' AND
content LIKE '%searchTerm%' AND
content_summarized LIKE '%searchTerm%' AND
tags in ('searchTerm') AND
zip LIKE '%zipTermn%'
SELECT * FROM report
WHERE title LIKE '%searchTerm%' AND
content LIKE '%searchTerm%' AND
content_summarized LIKE '%searchTerm%' AND
tags in ('searchTerm') AND
zip LIKE '%zipTermn%'
My focus is here that I do a AND condition. Because the user of my app want to specify a range to reports by these conditions. I want to intentionally not use a searchIndex because for zip code and city I want to match all reports where the match is not fuzzy but exactly. (title and content can be fuzzy). My question is now, if I set up a stream query for every search term. I can merge them together
let reports1= stream(ctx.db, schema)
.query("report")
.filterWith(async (report) => !!report.title?.toLocaleLowerCase()?.includes(args.searchTermTitle.toLocaleLowerCase()))

let reports2= stream(ctx.db, schema)
.query("report")
.filterWith(async (report) => !!report.zip?.toLocaleLowerCase()?.includes(args.searchTermZip.toLocaleLowerCase()))

const reports = mergedStream([reports1, reports2], ["_creationTime"]);
let reports1= stream(ctx.db, schema)
.query("report")
.filterWith(async (report) => !!report.title?.toLocaleLowerCase()?.includes(args.searchTermTitle.toLocaleLowerCase()))

let reports2= stream(ctx.db, schema)
.query("report")
.filterWith(async (report) => !!report.zip?.toLocaleLowerCase()?.includes(args.searchTermZip.toLocaleLowerCase()))

const reports = mergedStream([reports1, reports2], ["_creationTime"]);
which represents an OR conjunction. I unfortunately did not found the representative Stream operation in the helper tools regarding a intersection, like the result are these reports which are in all of the streams. Did I was silly to found it or can you give me some hint how to solve that problem :)? Thank you (The code is more or less pseudo code as you will see, but I hope I descriped my problem well)
ian
ian3w ago
Gotcha - the equivalent to that SQL query is a table scan (likely what you'd see with describe on that query but lmk if not). You have two options: - Make a field with `${title} ${content} ${content_summarized} ${tags.join(" ")} ` and do a text search with a filter field on zip & city - Do what you did initially with a paginated query over the whole table doing a custom js function to filter. If you do that, look into the options for paginator / stream to limit how many bytes you read per pagination round. Worst case you'd try to query the whole table in the first page if there's no matches.
Allan
AllanOP3w ago
ok nice, Thank you, Ian

Did you find this page helpful?