Jordan
Jordan2w ago

filter index based on multiple possible values?

consider the following query, where we want to select all posts made by a user, with a particular status of pending. due to lots of users, and lots of posts per user, we have an index set up to index posts by ["userId", "status"]. therefore, we query like so:
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending')
.collect();
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending')
.collect();
now consider we want to allow querying based off multiple status values. lets, select all posts made by a given user that are pending or in_progress. we would have to write the following:
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending').eq('status', 'in_progress))
.collect();
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending').eq('status', 'in_progress))
.collect();
but what if we wanted to make list of statuses an argument, that could be passed from the FE (i.e. a user filters their own posts down to any combination of pending, in_progress completed or cancelled in order to do this, we currently have to use the convex-helper filter or a basic TS array-filter after we query the results from the index. but cannot actively filter out within withIndex , which defeats the purpose of a by_userId_status index. it would look something like this:
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id)
.collect();

// where args.status is of type: v.array(v.string())
return userPosts.filter(({status})=>args.status.includes(status)))
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id)
.collect();

// where args.status is of type: v.array(v.string())
return userPosts.filter(({status})=>args.status.includes(status)))
how might we achieve the desired performance with the approach of using arguments to index off of?
10 Replies
Convex Bot
Convex Bot2w ago
Thanks for posting in <#1088161997662724167>. Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets. - Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.) - Use search.convex.dev to search Docs, Stack, and Discord all at once. - Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI. - Avoid tagging staff unless specifically instructed. Thank you!
lee
lee2w ago
const userPosts = [];
for (const status of args.status) {
const userPostsWithStatus = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', status))
.collect();
userPosts.push(...userPostsWithStatus);
}
const userPosts = [];
for (const status of args.status) {
const userPostsWithStatus = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', status))
.collect();
userPosts.push(...userPostsWithStatus);
}
Jordan
JordanOP2w ago
lets say i knew the status array, how might these two methods differ in performance? I feel like if the given index is still large, then re-querying it n-times over would still be concerning option 1:
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending').eq('status', 'in_progress))
.collect();
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', 'pending').eq('status', 'in_progress))
.collect();
option 2:
const userPosts = ['status', 'in_progress'];
for (const status of args.status) {
const userPostsWithStatus = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', status))
.collect();
userPosts.push(...userPostsWithStatus);
}
const userPosts = ['status', 'in_progress'];
for (const status of args.status) {
const userPostsWithStatus = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id).eq('status', status))
.collect();
userPosts.push(...userPostsWithStatus);
}
moreover, this pattern wouldn't pair well with pagination either, which is sort of a deal breaker for my given implementation
lee
lee2w ago
option 1 doesn't work. it shouldn't typecheck (if it does, that's a bug) and at runtime it would throw an error
lee
lee2w ago
i agree it's hard to get this working with pagination. you can use https://www.npmjs.com/package/convex-helpers#paginator-manual-pagination-with-familiar-syntax but it's got some caveats
npm
convex-helpers
A collection of useful code to complement the official convex package.. Latest version: 0.1.67, last published: a month ago. Start using convex-helpers in your project by running npm i convex-helpers. There are 8 other projects in the npm registry using convex-helpers.
lee
lee2w ago
the two options that work are your option (2) and the post-filter solution
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id)
.collect();

// where args.status is of type: v.array(v.string())
return userPosts.filter(({status})=>args.status.includes(status)))
const userPosts = await ctx.db
.query("posts")
.withIndex("by_userId_status", (q) => q.eq("userId", user?._id)
.collect();

// where args.status is of type: v.array(v.string())
return userPosts.filter(({status})=>args.status.includes(status)))
from your original message
Jordan
JordanOP2w ago
oh i see now after trying to chain multiple .eq() in the callback of .withIndex() that thats not allowed since you can only have one equality check per field convex has been a helpful technology, but combining dynamic filters + indexing for performance + pagination is the biggest pain point I'm struggling to solve. there are solutions to solve each individually, but no good solution when wanting to use a combination of them together. i feel like its common logic on an application to fetch given a data-set index with user defined filters and paginate the results. curious if others have examples of how they implement this FWIW, this is the approach i will likely take for now:
return await ctx.db
.query("requests")
.withIndex("by_userId", (q) => q.eq("userId", user._id))
.filter(createStatusCallback(args.statuses))
.paginate(args.paginationOpts);
return await ctx.db
.query("requests")
.withIndex("by_userId", (q) => q.eq("userId", user._id))
.filter(createStatusCallback(args.statuses))
.paginate(args.paginationOpts);
function createStatusCallback(statuses: string[]) {
return (q: any) =>
q.or(...statuses.map((status) => q.eq(q.field("status"), status)));
}
function createStatusCallback(statuses: string[]) {
return (q: any) =>
q.or(...statuses.map((status) => q.eq(q.field("status"), status)));
}
I know convex docs recommend to avoid using filter however, the above is the only way i know to combine indexes/filter/pagination. the by_userId index at least helps with performance some, but then .filter allows us to filter out based on args.statuses. and then we still get pagination after all of that.
lee
lee2w ago
This seems like a good solution to me. As long as you're aware that the function walks the requests for the user, and only stops when it finds enough that fit the filter criteria. In particular, a single query might read a lot of data if there are lots of documents that don't match the filter the solution that doesn't read a lot of data would be using the convex-helper filter, which is what i would personally use (but i'm biased 😛 ) it would be great if convex could support unions of index ranges for paginated queries. it's not infeasible, but would take some design
Jordan
JordanOP2w ago
As long as you're aware that the function walks the requests for the user, and only stops when it finds enough that fit the filter criteria.
yes, i think i follow.
return await ctx.db
.query("requests")
.withIndex("by_userId", (q) => q.eq("userId", user._id))
.filter(createStatusCallback(args.statuses))
.paginate(args.paginationOpts);
return await ctx.db
.query("requests")
.withIndex("by_userId", (q) => q.eq("userId", user._id))
.filter(createStatusCallback(args.statuses))
.paginate(args.paginationOpts);
for this above query i expect it do the following: 1. lets assume i am querying for 10 initial items with our paginate 2. using the by_userid index on requests table, we get all requests by a given user (this is performant due to index) 3. then we iterate or "walk" through all the requests of that given user only to match the filter until we get the correct number of results to return for pagination 4. once we have the correct number of results for the pagination, then we STOP walking the remainder of the user requests I wanted to confirm that (4) is accurate and how the cursor pagination is working. if so, the above is good enough performance IMO
lee
lee2w ago
yes that's correct 👍

Did you find this page helpful?