Limit on number of documents read in a query
Hi friends! We were doing some pretty innocent queries and hit a limit of 16,384 documents read in a single query...
I created a standalone repro for your convenience.
https://github.com/vors/convex-load-test
Please help us understand: is that a fundamental limitation of the platform, or am I just being stupid with how I write my queries? The meat is that we do this
Which is obviously a stupid way of doing
db.get
but it will do for illustration purposes, I think. Imagine that it's some other, non-id field. UPD: a non-id field with an index on it.
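For illustration, a simplified sketch (made-up names, not the actual repro code -- that's in the repo above) of the kind of query we're doing: read the whole table, then filter on a non-id field in JS:

```ts
// convex/people.ts -- hypothetical illustration only
import { query } from "./_generated/server";
import { v } from "convex/values";

export const findByName = query({
  args: { name: v.string() },
  handler: async (ctx, args) => {
    // Reads every document in the table; each one counts toward the
    // per-query read limit, so a large table trips the ~16k cap.
    const all = await ctx.db.query("people").collect();
    return all.filter((doc) => doc.name === args.name);
  },
});
```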
Hey Sergei! Yes, we do have a limit under the hood here for how many documents a single query can read. My best suggestion here would be to switch to using a paginated query (https://docs.convex.dev/using/pagination) for use cases with really large tables (in this case, over that ~16k limit).
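If it helps, a rough sketch of what the paginated version could look like with the current pagination helpers (table and function names are hypothetical):

```ts
// convex/people.ts -- hypothetical paginated query
import { query } from "./_generated/server";
import { paginationOptsValidator } from "convex/server";

export const listPeople = query({
  args: { paginationOpts: paginationOptsValidator },
  handler: async (ctx, args) => {
    // Each page reads only `numItems` documents, staying well under the limit.
    return await ctx.db.query("people").order("desc").paginate(args.paginationOpts);
  },
});
```

On the React side, `usePaginatedQuery` hands back `results`, `status`, and a `loadMore` function; calling `loadMore(n)` repeatedly while `status === "CanLoadMore"` walks the table in chunks instead of one giant read.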
Even if you don't want to paginate on the client side, you could loop calls to
loadMore
until you exhaust the data. I'm happy to help code up the refactor in your test repository if it would be helpful!
Thank you for the lightning-fast response, Arnold. Sorry, I think I may be somewhat of a DB noob here, but shouldn't it roughly speaking translate into:
In the sense that if that were Postgres with an index on the ID column, it would handle it with no problem with 10M rows...
(again ID is a bad example name)
I think, realistically speaking, we plumb down the IDs in most places to avoid relying on indexes; however, for our BI we would need aggregated queries over the whole database.
(I was kind of hoping to build something quickly on Convex instead of exporting it into something like BigQuery)
Yeah! No worries, I'm a database noob myself
-- I just make things pretty 🙂
Conceptually, your Convex query would be pretty close to that SQL query, the biggest difference being that Convex doesn’t do query planning (intentionally) so you’d have to query indexes with
.withIndex
(I believe the index by_id
is built in by default)
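So a point lookup by id can be spelled either way -- db.get is just the shorthand (a sketch, assuming the built-in by_id index is queryable with withIndex, and a hypothetical "people" table):

```ts
// Inside a query handler; `id` is an Id<"people">.
const doc = await ctx.db.get(id);

// Roughly the same lookup, spelled out against the built-in by_id index:
const same = await ctx.db
  .query("people")
  .withIndex("by_id", (q) => q.eq("_id", id))
  .unique();
```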
I can’t speak to this concretely as I mostly work on the UI, but the ~16k limit is a hardcoded value on our end as a safety precaution. Someone who works on the backend here can give you a better answer about why the limit is there and about support for scaling. In general, though, paginating a query where you expect millions of results would likely be important for both database and client-side perf.
Just to double check: my current alarmism is not about the case where I want to return 20,000 messages to the client -- that makes perfect sense to paginate. It's about the case where I have a single table with 20,000 entries that have a "Name" field and I need to find "Boris" in this table (and there are only about 10 people like that).
This seems like something that should not require pagination...
I should look more into pagination, I guess, but ergonomics-wise I'm concerned 🙂
@vors are you familiar with Indexes in Convex?
Probably not! 😆
If we're imagining this isn't ID then in Postgres I'd use an Index — and same in Convex
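For the "find Boris" case, a minimal sketch (hypothetical table/field names): define the index in the schema, then query through it:

```ts
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  people: defineTable({
    name: v.string(),
  }).index("by_name", ["name"]),
});
```

```ts
// In a query handler: only the ~10 matching documents are read,
// so the ~16k read limit is no longer in play even on a large table.
const borises = await ctx.db
  .query("people")
  .withIndex("by_name", (q) => q.eq("name", "Boris"))
  .collect();
```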
ok let me try to change the example to use index
But re the philosophy of query planning that Arnold mentioned — this is something we'd love to discuss, it's definitely an opinionated part of Convex! An opinion formed by DB folks here over a long time (some of it at Dropbox on infra teams) that is worth discussing
Yep generally you don’t want to be doing
SELECT *
(edit: I mean a full table scan) from a large table in the live request path on any database, Convex included. Seems indexes will solve your problem here, but happy to chat more if you have a use-case that falls outside of this!
@ari I think ~50k documents in a single table makes the dashboard halt
hmmm

ok that's all nits sorry
Working on an example with indexes (hopefully it just works!)
Ok I think that real-time part explains it 🙂 I should just keep the limitations in mind about what the database can and cannot do
I was kind of just treating Convex like a magic tool that solves everything 🙂
Almost everything 😉
but yeah, Convex is explicitly an OLTP database, meaning it’s designed for high performance on the kind of efficient queries you want to have in user-facing code. If there’s ever a need for a non-indexed scan of a full database table (usually just used for analytics), then this is the domain of an OLAP database, and for now we’d recommend streaming Convex into Snowflake etc. via Airbyte for these analytics-style queries.
makes sense.
Ok so for the indexes part -- I pushed changes to https://github.com/vors/convex-load-test
I'm using the
And my schema is
Hitting the same problem....
Am I doing something wrong?
ohhhh
There is
withIndex
sorry trying now
ah yeah, this is part of the "no query planning"; you know you are (or aren't) using an index based on the query
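i.e. defining the index in the schema isn't enough on its own -- the query has to opt in with .withIndex, because .filter still scans (sketch reusing the hypothetical by_name index from above):

```ts
// Still a table scan: .filter() does not use the schema index,
// so it hits the same read limit on a big table.
const scanned = await ctx.db
  .query("people")
  .filter((q) => q.eq(q.field("name"), "Boris"))
  .collect();

// Uses the index: only the matching documents are read.
const indexed = await ctx.db
  .query("people")
  .withIndex("by_name", (q) => q.eq("name", "Boris"))
  .collect();
```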
got it sorry
cool it works
Minor suggestion: maybe we can surface the
withIndex
in the error message on the Convex side
something like "please RTFM about indexes"
Hahah. Yep, good idea!
and the UI limits of a few thousand entries in a table would be good to improve 🙂 we are already over them.
yeah just checked, one of our prod tables is already failing to open in the UI
Curious on how you're using the dashboard UI. The UI wasn't really designed to be an important part of a long term workflow with a "big" project. We always thought it'd be far more useful for early in your project.
We should definitely not break though 😉
yeah I didn't notice that it's unresponsive until I tried it out today
we do plan to build some BI; I was hoping that Convex could just do it, but based on James' comments it seems like a separate BI tool is the way to go from the very beginning
which is a little bit disappointing to be honest, because I feel like a light-weight BI solution would be nice to have 🙂
Yea, Convex wasn't really designed for big BI queries. This is our escape hatch: https://docs.convex.dev/using/integrations/airbyte
even if it works against prod to begin with
I guess the wish-list item is to have a small-scale BI integrated into Convex
maybe I should just get over it and buy snowflake
but our stack is already like 5 different vendors
Yea thanks for the feedback on expectations here.
The question really becomes what "lightweight BI" really means, because in my experience working with data eng folks, very quickly we'd drop into custom queries, where nothing "lightweight" would truly work.
true
And now something completely different (but I think it may be related to me trying to open the big table): all other queries to prod are stuck 😅
it took my normal prod page with "projects" minutes to load now
this is in the dashboard?
no on prod
the web
ok maybe it's unrelated
let me ask my teammates to test
My recommendation for simple BI is hosted clickhouse. Airbyte can hook up to it pretty easily.
I think since we are on GCP we would probably go with BigQuery
That works too. More complex but definitely capable