vors
vors•2y ago

Limit on number of documents read in a query

Hi friends! We were doing some pretty innocent queries and hit a 16384-documents-read-in-a-query limit... I created a standalone repro for your convenience: https://github.com/vors/convex-load-test. Please help us understand: is that a fundamental limitation of the platform, or am I just stupid with how I write my queries? The meat is that we do this:
await db
.query("messages")
.filter((q) => q.eq(q.field("_id"), id))
.collect();
Which is obviously a stupid way of doing db.get, but it will do for illustration purposes, I think. Imagine that it's some other, non-id field. UPD: a non-id field with an index on it.
32 Replies
ari
ari•2y ago
Hey Sergei! Yes, we do have a limit under the hood here for how many documents a single query can read. My best suggestion would be to switch to a paginated query (https://docs.convex.dev/using/pagination) for use cases with really large tables (in this case, over that ~16k limit).
ari
ari•2y ago
Even if you don't want to paginate on the client side, you could loop calls to loadMore until you exhaust the data. I'm happy to help code up the refactor in your test repository if it would be helpful!
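To sketch what that looks like inside the query function itself (loadMore being the client-side half of the loop): a minimal sketch, assuming the "messages" table from this thread and an illustrative page size of 1000. Each call reads only one page, and the caller keeps calling, passing continueCursor back in until isDone is true; the exact argument style may differ by Convex version.
import { query } from "./_generated/server";

// Hypothetical paginated query: each invocation reads one page,
// so a single execution stays well under the document-read limit.
export default query(async ({ db }, cursor) => {
  const { page, isDone, continueCursor } = await db
    .query("messages")
    .paginate({ numItems: 1000, cursor });
  return { page, isDone, continueCursor };
});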
vors
vorsOP•2y ago
Thank you for the lightning-fast response, Arnold. Sorry, I may be somewhat of a DB noob here, but shouldn't it, roughly speaking, translate into:
SELECT * FROM my_table WHERE ID=my_id
In the sense that if this were Postgres with an index on the ID column, it would handle it with no problem even at 10M rows... (again, ID is a bad example name). Real-world speaking, I think we plumb the ids down in most places to avoid relying on indexes; however, for our BI we would need aggregated queries over the whole database. (I was kind of hoping to build something quickly on Convex instead of exporting into something like BigQuery.)
ari
ari•2y ago
Yeah! No worries, I'm a database noob myself -- I just make things pretty 🙂 Conceptually, your Convex query would be pretty close to that SQL query; the biggest difference is that Convex doesn't do query planning (intentionally), so you'd have to query indexes with .withIndex (I believe the index by_id is built in by default). I can't speak to this concretely, as I mostly work on the UI, but the ~16k limit is a hardcoded value on our end as a safety precaution. Someone who works on the backend here can give you a better answer about why the limit is there and about support for scaling. In general, though, paginating a query where you expect millions of results would likely be important for both database and client-side perf.
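For reference, an index lookup with .withIndex looks roughly like this; a sketch assuming the built-in by_id index, which makes it effectively a roundabout db.get(id):
const message = await db
  .query("messages")
  .withIndex("by_id", (q) => q.eq("_id", id))
  .unique();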
vors
vorsOP•2y ago
Just to double check: my current alarmism is not about the case where I want to return 20,000 messages to the client -- that makes perfect sense to paginate. It's about the case where I have a single table with 20,000 "Name" entries and I need to find "Boris" in this table (and there are only 10 people like that). This seems like something that should not require pagination... I should look more into pagination, I guess, but ergonomics-wise I'm concerned 🙂
ballingt
ballingt•2y ago
@vors are you familiar with Indexes in Convex?
vors
vorsOP•2y ago
Probably not! 😆
ballingt
ballingt•2y ago
If we're imagining this isn't an ID, then in Postgres I'd use an index, and the same goes for Convex
vors
vorsOP•2y ago
ok, let me try to change the example to use an index
ballingt
ballingt•2y ago
Indexes | Convex Developer Hub
Indexes are a data structure that allow you to speed up your document queries by telling Convex how to organize your documents.
ballingt
ballingt•2y ago
But re: the philosophy of no query planning that Arnold mentioned: this is something we'd love to discuss; it's definitely an opinionated part of Convex! An opinion formed by the DB folks here over a long time (some of it at Dropbox on infra teams), and one that's worth discussing.
james
james•2y ago
Yep, generally you don't want to be doing SELECT * (edit: I mean a full table scan) on a large table in the live request path on any database, Convex included. It seems indexes will solve your problem here, but happy to chat more if you have a use case that falls outside of this!
vors
vorsOP•2y ago
@ari I think ~50k documents in a single table make the dashboard halt
vors
vorsOP•2y ago
hmmm
[screenshot attached]
vors
vorsOP•2y ago
ok, those are all nits, sorry. Working on an example with indexes (hopefully it just works!). Ok, I think that real-time part explains it 🙂 I should just keep in mind the limitations of what the database can and cannot do. I've kind of been treating Convex like a magic tool that solves everything 🙂
james
james•2y ago
Almost everything 😉 But yeah, Convex is explicitly an OLTP database, meaning it's designed for high performance on the kind of efficient queries you want to have in user-facing code. If there's ever a need for a non-indexed scan of a full database table (usually only for analytics), that's the domain of an OLAP database, and for now we'd recommend streaming Convex into Snowflake etc. via Airbyte for these analytics-style queries.
vors
vorsOP•2y ago
Makes sense. Ok, so for the indexes part: I pushed changes to https://github.com/vors/convex-load-test. I'm using this:
export default query(async ({ db }) => {
const messages = await db
.query("messages")
.filter((q) => q.eq(q.field("randomNumber"), 1))
.collect();
return messages.length;
});
And my schema is
import { defineSchema, defineTable, s } from "convex/schema";

export default defineSchema({
messages: defineTable({
body: s.string(),
author: s.string(),
randomNumber: s.number(),
}).index("by_randomNumber", ["randomNumber"]),
});
Hitting the same problem... Am I doing something wrong? Ohhh, there is withIndex, sorry, trying now
ballingt
ballingt•2y ago
ah yeah, this is part of the "no query planning": you know whether you are (or aren't) using an index from the query itself
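Concretely, for the schema above, swapping filter for withIndex is what tells Convex to use the by_randomNumber index; a sketch adapted from the snippets earlier in this thread:
export default query(async ({ db }) => {
  // Index range lookup: only documents with randomNumber === 1 are read.
  const messages = await db
    .query("messages")
    .withIndex("by_randomNumber", (q) => q.eq("randomNumber", 1))
    .collect();
  return messages.length;
});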
vors
vorsOP•2y ago
got it, sorry. Cool, it works! Minor suggestion: maybe you could surface withIndex in the error message on the Convex side, something like "please RTFM about indexes"
Indy
Indy•2y ago
Hahah. Yep good idea!
vors
vorsOP•2y ago
and the UI limits of a few thousand entries in a table would be good to improve 🙂 we are already over them. Yeah, just checked: one of our prod tables is already failing to open in the UI
Indy
Indy•2y ago
Curious how you're using the dashboard UI. The UI wasn't really designed to be an important part of a long-term workflow on a "big" project; we always thought it'd be far more useful early in a project. We definitely shouldn't break, though 😉
vors
vorsOP•2y ago
yeah, I didn't notice that it was unresponsive until I tried it out today. We do plan to build some BI; I was hoping Convex could just do it, but based on James' comments it seems a separate BI tool is the way to go from the very beginning, which is a little disappointing to be honest, because I feel like a lightweight BI solution would be nice to have 🙂
Indy
Indy•2y ago
Yeah, Convex wasn't really designed for big BI queries. This is our escape hatch: https://docs.convex.dev/using/integrations/airbyte
Using Convex with Airbyte | Convex Developer Hub
Analyze your Convex data by exporting via Airbyte.
vors
vorsOP•2y ago
even if it works against prod to begin with, I guess. The wish-list item is to have small-scale BI integrated into Convex. Maybe I should just get over it and buy Snowflake, but our stack is already like 5 different vendors
Indy
Indy•2y ago
Yeah, thanks for the feedback on expectations here. The question really becomes what "lightweight BI" really means, because in my experience working with data eng folks, we'd very quickly drop into custom queries, where nothing "lightweight" would truly work.
vors
vorsOP•2y ago
true. And now for something completely different (though I think it may be related to me trying to open the big table): all other queries to prod are stuck 😅 it took my normal prod page with "projects" minutes to load just now
Indy
Indy•2y ago
this is in the dashboard?
vors
vorsOP•2y ago
no, on prod, the web. Ok, maybe it's unrelated; let me ask my teammates to test
jamwt
jamwt•2y ago
My recommendation for simple BI is hosted ClickHouse. Airbyte can hook up to it pretty easily.
vors
vorsOP•2y ago
I think since we are on GCP we would probably go with BigQuery
jamwt
jamwt•2y ago
That works too. More complex, but definitely capable
