Starlord
Starlord•4w ago

Join tables query / paginate

An important limitation in Convex that affects how we need to structure queries, especially for paginated results with joins. Let me explain the problem and show hacky workarounds: The Problem In my current code, i am trying to: 1. Paginate products 2. Then fetch variations for those products 3. Then filter based on those variations (colors, sizes) This approach is problematic because: 1. The pagination happens BEFORE the joins and filters on variations 2. This means I might get fewer results than requested after filtering 3. I can't do true SQL-like JOINs in a single query For example, if I request 10 products: - I might get 10 products from the initial pagination - After filtering by variations, I might end up with only 3-4 products that actually match - The user sees fewer items than expected Possible Solutions 1. Denormalization Approach
// Store available colors and sizes directly on the product document
interface Product {
_id: Id<"products">;
// ... other fields ...
availableColors: Id<"productColors">[];
availableSizes: Array<{
sizeCategoryId: Id<"sizeCategories">;
values: string[];
}>;
hasInventory: boolean;
}
// Store available colors and sizes directly on the product document
interface Product {
_id: Id<"products">;
// ... other fields ...
availableColors: Id<"productColors">[];
availableSizes: Array<{
sizeCategoryId: Id<"sizeCategories">;
values: string[];
}>;
hasInventory: boolean;
}
2. Materialized Views Approach
// Create a separate table that pre-joins product data
interface ProductView {
_id: Id<"productViews">;
productId: Id<"products">;
availableColors: Id<"productColors">[];
availableSizes: Array<{
sizeCategoryId: Id<"sizeCategories">;
values: string[];
}>;
// ... other fields needed for filtering
}
// Create a separate table that pre-joins product data
interface ProductView {
_id: Id<"productViews">;
productId: Id<"products">;
availableColors: Id<"productColors">[];
availableSizes: Array<{
sizeCategoryId: Id<"sizeCategories">;
values: string[];
}>;
// ... other fields needed for filtering
}
3. Batch Loading Approach
// Load more products than needed initially, then filter
const BATCH_MULTIPLIER = 3; // Load 3x more products than requested
const batchSize = paginationOpts.numItems * BATCH_MULTIPLIER;

// Modified pagination options
const batchPaginationOpts = {
...paginationOpts,
numItems: batchSize
};
// Load more products than needed initially, then filter
const BATCH_MULTIPLIER = 3; // Load 3x more products than requested
const batchSize = paginationOpts.numItems * BATCH_MULTIPLIER;

// Modified pagination options
const batchPaginationOpts = {
...paginationOpts,
numItems: batchSize
};
42 Replies
Convex Bot
Convex Bot•4w ago
Thanks for posting in <#1088161997662724167>. Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets. - Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.) - Use search.convex.dev to search Docs, Stack, and Discord all at once. - Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI. - Avoid tagging staff unless specifically instructed. Thank you!
ian
ian•4w ago
It seems the easiest way to start would be to do (3), maybe fetching 100 at a time. You can always filter the UI to only show the first 10, then have a load more / etc. button keep showing 10 until a new page is actually needed from the backend. For products, fetching a lot at once seems like a good move. They're unlikely to change, so that query won't be re-fetching frequently. In a messaging app you might query fewer, because every new message re-fetches up to the page size. Another approach to consider if there aren't a ton of products is to paginate through them all to get them client-side, then do your filtering there. it'd be crazy fast since it wouldn't need a server round-trip, and you may end up doing less DB bandwidth if the filters change frequently. Or fetch the first 1k products & metadata (ProductView), and apply filters client-side, and only fetch more if there aren't enough products after the filters
Starlord
StarlordOP•4w ago
none of this would work. those are thousands of products. fitler can be very detialed and return only few of the products. sometime i need like 100 paginate requests just to find right products. this is huge limitation.
Starlord
StarlordOP•4w ago
here also someone mentioned this problem in youtube comments
No description
Starlord
StarlordOP•4w ago
join query is pretty basic feature that is missing in convex also i cant put variation data into product table because its not possible to filter json content inside cell so basically there is 0 solution for this problem
lee
lee•4w ago
This is an interesting problem. The key question is: do you want to pre-filter or post-filter? i.e. do you want to fetch X documents that all match your filter, or do you want to fetch Y documents and then check which ones match your filter until you have X results. If the former, you'll need to encode the filter condition into an index range. This may be via a join table, so you paginate on the table that joins variations -> products, with an index on the variation. If the latter, you'll need to potentially fetch an unbounded number of results in one query. You can do this in Convex using some of the convex-helpers like https://github.com/get-convex/convex-helpers/blob/main/packages/convex-helpers/server/pagination.ts#L127 , but I agree it's not easy. A simpler version of this is to fetch a fixed page size, filter in the query, and then call loadMore on the client automatically until you get enough results. This is what @Ian suggested, i think, and it's what I would do if I had a complex filtered query to paginate.
GitHub
convex-helpers/packages/convex-helpers/server/pagination.ts at main...
A collection of useful code to complement the official packages. - get-convex/convex-helpers
Starlord
StarlordOP•4w ago
well everything of this sounds not like a real solution. join query functionality would be great. like every database has and i want to paginate and filter in the same query. like its possible in sql here is an example:
select * from (select * from users where f.mime_type = 'jpg' limit 10 offset 10) as u
left join files f
on u.id = f.user_id
select * from (select * from users where f.mime_type = 'jpg' limit 10 offset 10) as u
left join files f
on u.id = f.user_id
1. there are limits how many documents can be loaded into memory in convex. this means filtering needs to happen on the database side 2. i cant paginate and than filter. it needs to happen on one run. otherwise i will get fewer results or no results the only solution that i am using currently is huge batch size and multiple paginate requests till everything is loaded. basically what could have be done in one query requires multiple executions and requests from the client
lee
lee•4w ago
What database is this query written for? I don't understand how f can be used in a filter inside the parentheses, when it's not in scope.
Starlord
StarlordOP•4w ago
its sql
lee
lee•4w ago
Sql is a language. Mysql/postgres/etc are databases which cannot run this query
Starlord
StarlordOP•4w ago
joining two tables users and files of the user and filter by file type but its mysql query + paginate users
lee
lee•4w ago
It sounds like you fundamentally disagree with the Convex principle that OLTP queries should be fast and bounded. If you want to do large unbounded queries we recommend OLAP databases, which can execute arbitrary sql
Starlord
StarlordOP•4w ago
i dont disagree i just can find solution for simple method to paginate and filter at the same time filter based on another table i mean paginate 1 table + filter based on 2nd table
lee
lee•4w ago
I still don't understand this query. It's not valid mysql syntax
Starlord
StarlordOP•4w ago
it would be easy in js but i cant load all the table into memory SELECT * FROM (SELECT * FROM users LIMIT 10 OFFSET 10) AS u LEFT JOIN files f ON u.id = f.user_id WHERE f.mime_type = 'jpg'
lee
lee•4w ago
this query is equivalent to the convex code where it fetches a page of 10 and post-filters (potentially returning very few results)
Starlord
StarlordOP•4w ago
the query paginates users and filters bases on files table
lee
lee•4w ago
i think you might be aiming for
SELECT * FROM users u
LEFT JOIN files f
ON u.id = f.user_id
WHERE f.mime_type = 'jpg'
LIMIT 10 OFFSET 10
SELECT * FROM users u
LEFT JOIN files f
ON u.id = f.user_id
WHERE f.mime_type = 'jpg'
LIMIT 10 OFFSET 10
Starlord
StarlordOP•4w ago
yes
lee
lee•4w ago
how would you feel about a plain-javascript .filter function you could attach to a paginated query, that would filter out all of the results before reaching the page size. the downside would be if the filter is sparse, it could read too much data & be slow or throw error
Starlord
StarlordOP•4w ago
this will not work if there are thouthands of documents
lee
lee•4w ago
correct
Starlord
StarlordOP•4w ago
and thats the case 🙂
lee
lee•4w ago
nothing will work if there are thousands od documents because then it's an OLAP query -- can't be made fast or efficient
Starlord
StarlordOP•4w ago
there are 1k products. each of product has 1-10 variations means 10k variations at max
lee
lee•4w ago
the limit for convex queries is 16k documents, so this isn't unreasonable https://docs.convex.dev/production/state/limits#transactions
Limits | Convex Developer Hub
We’d love for you to have unlimited joy building on Convex but engineering
Starlord
StarlordOP•4w ago
well i may not hit limits now but in future probably well if you would allow to have cached queries without those limits it would be possible to prefilter in js this would mean each filter result would be stored in cache and if it requires too much cache just pay for extra storage so basically filtering and joining data would be done in js and stored in cache but need to disable limits for this
RJ
RJ•4w ago
I have almost literally the exact same scenario that you've got @Starlord (same domain, even, with products and variations thereof), and I solved it with the "materialized view" approach. You can use the convex-helpers database triggers feature to keep this up-to-date. If you have array or object fields which you need to filter on, you can flatten those in your "view" table.
Starlord
StarlordOP•4w ago
so basically secondary table that updates based on triggers
RJ
RJ•4w ago
I'm not the database expert that many Convex employees are, but I think the frustrating part here is basically that SQL will let you run a very inefficient query, and it will get slower and slower as your dataset grows, and eventually it may become pathological (by which I mean, cause your whole database to fall down). Convex imposes limits so that you are forced to architect your code such that you can't fall into this trap. So like Lee said, you can either use an OLAP database for this kind of query, or you can put in a little more architectural effort to ensure that it remains fast and efficient. Yeah exactly
Starlord
StarlordOP•4w ago
with secondary table i would also have a problem. i have variations table existing exactly for filtering. because if the data would be stored in json inside products table i would not be able to use index or filter based on this
productVariations: defineTable({
amount: v.float64(),
brandArticle: v.optional(v.string()),
color: v.string(),
productColor: v.id("productColors"),
productId: v.id("products"),
sizeId: v.id("sizes"),
variationIndex: v.float64(),
})
productVariations: defineTable({
amount: v.float64(),
brandArticle: v.optional(v.string()),
color: v.string(),
productColor: v.id("productColors"),
productId: v.id("products"),
sizeId: v.id("sizes"),
variationIndex: v.float64(),
})
thats why i have 2 tables products and products variations. so queries can use variations for filter
RJ
RJ•4w ago
I don't think I follow what the problem is, exactly. Where are the JSON fields? You should be able to just join the variations with the products in your "view" table, and thus every field will be flattened (and indexable). I agree that if you just tried to merge your products table with your productVariations table such that you end up with 1 row per product, and stored each variation as an array field on this new "view" document, your problem won't be solved.
Starlord
StarlordOP•4w ago
as far i know its not possible to filter based on array of numbers / strings inside field. if flatten table would have a list of colors for example in one field that still cant be used as filter
RJ
RJ•4w ago
Right, you would need to flatten those colors, too. Like so:
type Product = {
colors: v.array(v.string()),
};
type ProductsTable = Product[];

// should become

type SearchableProduct = {
productId: v.id("products"),
color: v.string()
};
type SearchableProductsTable = SearchableProduct[];

// so if you have

const myProducts: ProductsTable = [{
_id: "abc123",
colors: ["blue", "green"]
}]

// you need to turn this into

const mySearchableProducts: SearchableProductsTable = [
{
productId: "abc123",
color: "blue"
},
{
productId: "abc123",
color: "green"
},
]
type Product = {
colors: v.array(v.string()),
};
type ProductsTable = Product[];

// should become

type SearchableProduct = {
productId: v.id("products"),
color: v.string()
};
type SearchableProductsTable = SearchableProduct[];

// so if you have

const myProducts: ProductsTable = [{
_id: "abc123",
colors: ["blue", "green"]
}]

// you need to turn this into

const mySearchableProducts: SearchableProductsTable = [
{
productId: "abc123",
color: "blue"
},
{
productId: "abc123",
color: "green"
},
]
Starlord
StarlordOP•4w ago
how did you solve filtering based on those flatten data?
RJ
RJ•4w ago
Like normal, you just add an index for the fields you want to filter on (color in this case) and use withIndex to filter on them
Starlord
StarlordOP•4w ago
but how do you filter by one value on array of values? do you use js filter or database filter?
RJ
RJ•4w ago
When you create this new table ("view"), you need to flatten (distribute, join) every field containing nested data or reference to a foreign table (these are functionally the same thing for our purposes here). That's what I was trying to show above—if you have a field with an array, you need to distribute each element of the array across a new field (colors -> color). So if you have 1 product with 2 colors, you need to have 2 rows in your "view" table, each representing a different variation (different color) of the same product.
Starlord
StarlordOP•4w ago
hm what if color amount is dynamic? just pregenerate like 10 colors?
RJ
RJ•4w ago
I don't think that matters, does it? You're just filtering on a string field, color, like above Let me try to show you what it would look like if you modeled colors differently
products: defineTable({
...
}),
productColors: defineTable({
color: v.string(),
productId: v.id("products")
})
products: defineTable({
...
}),
productColors: defineTable({
color: v.string(),
productId: v.id("products")
})
This is the same thing from a cardinality (I think that's the right term) perspective As storing each color in a colors field on the products table There's the same number of possible combinations of product and color So just think about how you would join products and productColors to get a "view" table That will look the same as if you instead model color variations with a colors field on the products table At the end of the day, your "view" table will look like this:
searchableProducts: defineTable({
productId: v.id("products"),
color: v.string(),
})
searchableProducts: defineTable({
productId: v.id("products"),
color: v.string(),
})
Which is just the result of joining products and productColors
Starlord
StarlordOP•4w ago
ok undersstand thanks pretty hacky 😄
RJ
RJ•4w ago
No problem! I don't think this is any different from how you'd want to do this in a relationship database that you're querying with SQL, in order to achieve comparable performance

Did you find this page helpful?