Join tables query / paginate
An important limitation in Convex that affects how we need to structure queries, especially for paginated results with joins. Let me explain the problem and show hacky workarounds:
The Problem
In my current code, i am trying to:
1. Paginate products
2. Then fetch variations for those products
3. Then filter based on those variations (colors, sizes)
This approach is problematic because:
1. The pagination happens BEFORE the joins and filters on variations
2. This means I might get fewer results than requested after filtering
3. I can't do true SQL-like JOINs in a single query
For example, if I request 10 products:
- I might get 10 products from the initial pagination
- After filtering by variations, I might end up with only 3-4 products that actually match
- The user sees fewer items than expected
Possible Solutions
1. Denormalization Approach
2. Materialized Views Approach
3. Batch Loading Approach
42 Replies
Thanks for posting in <#1088161997662724167>.
Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets.
- Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.)
- Use search.convex.dev to search Docs, Stack, and Discord all at once.
- Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI.
- Avoid tagging staff unless specifically instructed.
Thank you!
It seems the easiest way to start would be to do (3), maybe fetching 100 at a time. You can always filter the UI to only show the first 10, then have a load more / etc. button keep showing 10 until a new page is actually needed from the backend.
For products, fetching a lot at once seems like a good move. They're unlikely to change, so that query won't be re-fetching frequently. In a messaging app you might query fewer, because every new message re-fetches up to the page size.
Another approach to consider if there aren't a ton of products is to paginate through them all to get them client-side, then do your filtering there. it'd be crazy fast since it wouldn't need a server round-trip, and you may end up doing less DB bandwidth if the filters change frequently. Or fetch the first 1k products & metadata (ProductView), and apply filters client-side, and only fetch more if there aren't enough products after the filters
none of this would work. those are thousands of products. fitler can be very detialed and return only few of the products. sometime i need like 100 paginate requests just to find right products. this is huge limitation.
here also someone mentioned this problem in youtube comments
data:image/s3,"s3://crabby-images/0d8e2/0d8e2a9f809956714aefc01b92263d2d05a6e28b" alt="No description"
join query is pretty basic feature that is missing in convex
also i cant put variation data into product table because its not possible to filter json content inside cell
so basically there is 0 solution for this problem
This is an interesting problem. The key question is: do you want to pre-filter or post-filter? i.e. do you want to fetch X documents that all match your filter, or do you want to fetch Y documents and then check which ones match your filter until you have X results.
If the former, you'll need to encode the filter condition into an index range. This may be via a join table, so you paginate on the table that joins variations -> products, with an index on the variation.
If the latter, you'll need to potentially fetch an unbounded number of results in one query. You can do this in Convex using some of the convex-helpers like https://github.com/get-convex/convex-helpers/blob/main/packages/convex-helpers/server/pagination.ts#L127 , but I agree it's not easy.
A simpler version of this is to fetch a fixed page size, filter in the query, and then call
loadMore
on the client automatically until you get enough results. This is what @Ian suggested, i think, and it's what I would do if I had a complex filtered query to paginate.GitHub
convex-helpers/packages/convex-helpers/server/pagination.ts at main...
A collection of useful code to complement the official packages. - get-convex/convex-helpers
well everything of this sounds not like a real solution. join query functionality would be great. like every database has
and i want to paginate and filter in the same query. like its possible in sql
here is an example:
1. there are limits how many documents can be loaded into memory in convex. this means filtering needs to happen on the database side
2. i cant paginate and than filter. it needs to happen on one run. otherwise i will get fewer results or no results
the only solution that i am using currently is huge batch size and multiple paginate requests till everything is loaded. basically what could have be done in one query requires multiple executions and requests from the client
What database is this query written for? I don't understand how f can be used in a filter inside the parentheses, when it's not in scope.
its sql
Sql is a language. Mysql/postgres/etc are databases which cannot run this query
joining two tables users and files of the user and filter by file type
but its mysql query
+ paginate users
It sounds like you fundamentally disagree with the Convex principle that OLTP queries should be fast and bounded. If you want to do large unbounded queries we recommend OLAP databases, which can execute arbitrary sql
i dont disagree i just can find solution for simple method to paginate and filter at the same time
filter based on another table i mean
paginate 1 table + filter based on 2nd table
I still don't understand this query. It's not valid mysql syntax
it would be easy in js but i cant load all the table into memory
SELECT * FROM
(SELECT * FROM users LIMIT 10 OFFSET 10) AS u
LEFT JOIN files f
ON u.id = f.user_id
WHERE f.mime_type = 'jpg'
this query is equivalent to the convex code where it fetches a page of 10 and post-filters (potentially returning very few results)
the query paginates users and filters bases on files table
i think you might be aiming for
yes
how would you feel about a plain-javascript
.filter
function you could attach to a paginated query, that would filter out all of the results before reaching the page size. the downside would be if the filter is sparse, it could read too much data & be slow or throw errorthis will not work if there are thouthands of documents
correct
and thats the case 🙂
nothing will work if there are thousands od documents
because then it's an OLAP query -- can't be made fast or efficient
there are 1k products. each of product has 1-10 variations
means 10k variations at max
the limit for convex queries is 16k documents, so this isn't unreasonable https://docs.convex.dev/production/state/limits#transactions
Limits | Convex Developer Hub
We’d love for you to have unlimited joy building on Convex but engineering
well i may not hit limits now but in future probably
well if you would allow to have cached queries without those limits it would be possible to prefilter in js
this would mean each filter result would be stored in cache
and if it requires too much cache just pay for extra storage
so basically filtering and joining data would be done in js and stored in cache but need to disable limits for this
I have almost literally the exact same scenario that you've got @Starlord (same domain, even, with products and variations thereof), and I solved it with the "materialized view" approach. You can use the
convex-helpers
database triggers feature to keep this up-to-date. If you have array or object fields which you need to filter on, you can flatten those in your "view" table.so basically secondary table that updates based on triggers
I'm not the database expert that many Convex employees are, but I think the frustrating part here is basically that SQL will let you run a very inefficient query, and it will get slower and slower as your dataset grows, and eventually it may become pathological (by which I mean, cause your whole database to fall down). Convex imposes limits so that you are forced to architect your code such that you can't fall into this trap. So like Lee said, you can either use an OLAP database for this kind of query, or you can put in a little more architectural effort to ensure that it remains fast and efficient.
Yeah exactly
with secondary table i would also have a problem. i have variations table existing exactly for filtering. because if the data would be stored in json inside products table i would not be able to use index or filter based on this
thats why i have 2 tables products and products variations. so queries can use variations for filter
I don't think I follow what the problem is, exactly. Where are the JSON fields?
You should be able to just join the variations with the products in your "view" table, and thus every field will be flattened (and indexable). I agree that if you just tried to merge your
products
table with your productVariations
table such that you end up with 1 row per product, and stored each variation as an array field on this new "view" document, your problem won't be solved.as far i know its not possible to filter based on array of numbers / strings inside field.
if flatten table would have a list of colors for example in one field that still cant be used as filter
Right, you would need to flatten those colors, too. Like so:
how did you solve filtering based on those flatten data?
Like normal, you just add an index for the fields you want to filter on (
color
in this case) and use withIndex
to filter on thembut how do you filter by one value on array of values? do you use js filter or database filter?
When you create this new table ("view"), you need to flatten (distribute, join) every field containing nested data or reference to a foreign table (these are functionally the same thing for our purposes here). That's what I was trying to show above—if you have a field with an array, you need to distribute each element of the array across a new field (
colors
-> color
). So if you have 1 product with 2 colors, you need to have 2 rows in your "view" table, each representing a different variation (different color) of the same product.hm what if color amount is dynamic?
just pregenerate like 10 colors?
I don't think that matters, does it? You're just filtering on a string field,
color
, like above
Let me try to show you what it would look like if you modeled colors differently
This is the same thing from a cardinality (I think that's the right term) perspective
As storing each color in a colors
field on the products
table
There's the same number of possible combinations of product and color
So just think about how you would join products
and productColors
to get a "view" table
That will look the same as if you instead model color variations with a colors
field on the products
table
At the end of the day, your "view" table will look like this:
Which is just the result of joining products
and productColors
ok undersstand thanks
pretty hacky 😄
No problem!
I don't think this is any different from how you'd want to do this in a relationship database that you're querying with SQL, in order to achieve comparable performance