burnstony#1975 · 2mo ago

pagination

Trying to find a way to randomly access a different page. getPage doesn't seem to work. How can I go straight to the 100th page? How can I filter some documents out so pages only include the desired results? There are too many bytes for me to collect the whole table.
Clever Tagline · 2mo ago
Could you describe your use case in more detail? How many documents are in your table? (I'll be offline the rest of the night, but I'll check back tomorrow)
lee · 2mo ago
getPage is useful if you know which index keys you want to look at, so if you have an incrementingIndex: v.number() on your documents (or something similar) it can be used for that. If you want to jump to a certain place in your table with just an offset (number of documents to skip), you'll need to add some structure. For example, https://www.convex.dev/components/aggregate#offset-based-pagination
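A minimal sketch of the incrementingIndex approach lee mentions, assuming documents are numbered once at insert time; the table, index, and function names here are hypothetical:

import { query } from "./_generated/server";
import { v } from "convex/values";

// Sketch only: assumes the schema adds incrementingIndex: v.number() to the
// table and declares .index("by_position", ["incrementingIndex"]).
export const pageAt = query({
  args: { offset: v.number(), pageSize: v.number() },
  handler: async (ctx, { offset, pageSize }) => {
    // Jump straight to any offset via the index, reading only one page.
    return await ctx.db
      .query("messages")
      .withIndex("by_position", (q) => q.gte("incrementingIndex", offset))
      .take(pageSize);
  },
});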
burnstony#1975 (OP) · 2mo ago
I have this schema:

timeline: defineTable({
  podcast_id: v.id("podcast"),
  episode_id: v.id("episode"),
  episode_number: v.number(),
  start: v.string(),
  end: v.string(),
  geoname: v.string(),
  chart: v.string(),
  rank: v.number(),
})
  .index("start", ["start", "chart", "rank", "episode_number"]),

with 30,000-40,000 records. I need to be able to filter on a range for start and rank, and also filter on 1-3 values for chart. The tools are starting to make me doubt the scalability of Convex for large data sets. For instance:

await ctx.db
  .query("timeline")
  .withIndex("start", (q) => q.gte([cursor.start, cursor.chart, cursor.rank, cursor.episode_number]))

doesn't work. I can't actually filter on everything after a particular record on an index, which makes paginating difficult.
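For context on why the tuple-style gte fails: a Convex index range is built from zero or more .eq() conditions on a prefix of the index fields, followed by at most one range condition on the next field. A sketch of the closest legal query against the index above (the page size is illustrative):

// Inside a query handler, with `cursor` as in the message above.
// Pinning start and chart exactly and ranging over rank is legal, but it is
// NOT "everything after the cursor tuple" -- which is exactly the gap
// discussed in the rest of this thread.
const page = await ctx.db
  .query("timeline")
  .withIndex("start", (q) =>
    q.eq("start", cursor.start).eq("chart", cursor.chart).gte("rank", cursor.rank)
  )
  .take(50);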
lee · 2mo ago
this is a great point. it should be possible, but we don't expose a nice interface for it. the closest we have is streamQuery here https://github.com/get-convex/convex-helpers/blob/b69982431f592cb3405326e9122054ad66d10adb/packages/convex-helpers/server/pagination.ts#L127, but you would have to set up the pagination cursor yourself.
burnstony#1975 (OP) · 2mo ago
Everything I've tried with streamQuery seems to have the same problem as anything else: just counting the records pulls everything into memory and gets an error
lee · 2mo ago
Oh right, there's no way to get >16k records in a single query or mutation. You can call it repeatedly from the client, with loadMore(), or you can call it repeatedly from an action
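A hedged sketch of the "call it repeatedly from an action" option, assuming a paginated query api.timeline.page that wraps ctx.db.query("timeline").paginate(paginationOpts); the function and file names are illustrative:

import { action } from "./_generated/server";
import { api } from "./_generated/api";

// Count the whole table without ever holding more than one page in a single
// query call; each ctx.runQuery reads at most numItems documents.
export const countTimeline = action({
  handler: async (ctx) => {
    let cursor: string | null = null;
    let total = 0;
    do {
      const result = await ctx.runQuery(api.timeline.page, {
        paginationOpts: { cursor, numItems: 1000 },
      });
      total += result.page.length;
      cursor = result.isDone ? null : result.continueCursor;
    } while (cursor !== null);
    return total;
  },
});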
burnstony#1975 (OP) · 2mo ago
I don't want all the records. I need a way to skip ahead, and to get the indexes of different pages
lee · 2mo ago
https://www.convex.dev/components/aggregate
burnstony#1975 (OP) · 2mo ago
This is a little light on details. Seems like it should be supported directly in the schema. None of these answers is very satisfying. Aggregate is promising but not well documented. The fact that I can't do withIndex(?).gt([x, y, z]) (i.e., greater than the cursor) is frustrating. Also, skip seems necessary, as well as take.
jamwt · 2mo ago
you want the section labeled "offset-based pagination". this actually can't be built in b/c it requires a special index. normal databases can't do it either
burnstony#1975 (OP) · 2mo ago
I think I get why it can't be default, but it would be nice to be able to configure it from the schema.
@Jamie I haven't implemented this yet, but it sounds like it should work correctly:

const { key } = await photos.at(ctx, offset, { namespace: album });

but then this part doesn't work how I would like. My table has a multiple-column index, and I would like to just get everything between two offsets from the table:

return await ctx.db.query("photos")
  .withIndex("by_album_creation_time", (q) => q.eq("album", album).gte("_creationTime", key))
  .take(numItems);

something like q.gt(key)
@Jamie What is the index actually implemented as? I am confused why it would be difficult to do gt(key). Also confused why it would be difficult to do skip(?) or startingAtOffset(?).
@Jamie On most platforms I'd probably just dive into the aggregate, but it seems like a lot of mess and maintenance headache that Convex so far has helped me avoid.
lee · 2mo ago
it sounds like you want to call streamQuery to do the gt(key) query. can you share what you have tried for streamQuery? i'm considering making a more intuitive helper, but would like to know what you would find intuitive.

the idea would be: you first do const { key } = await photos.at(ctx, offset, { namespace: album });, then call streamQuery on that key to get as many photos as you want, starting at that offset
jamwt · 2mo ago
@burnstony#1975 for some background on why this is hard, just two main things to know: (1) in databases, indexes are represented as btrees; and (2) databases attempt to be efficient, so [by default] indexes only store the minimum possible information -- that is, how to find the position of the indexed value within the tree in O(log n), not its rank within the index
jamwt · 2mo ago
claude is better at typing faster than me, so here is a bit more expansion on the basis of this: https://claude.ai/share/a53e3de5-a627-4e63-843b-e82bfa646329
jamwt · 2mo ago
basically, you need a tree structure that keeps track of how many children each subnode has, and then you can find the nth quickly. that's what the aggregate component is helping with
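A toy counted tree (order-statistics tree) illustrating jamwt's point; this is not the component's actual implementation, just the idea of caching subtree sizes so the nth element is found in O(log n):

type CountedNode<K> = {
  key: K;
  left: CountedNode<K> | null;
  right: CountedNode<K> | null;
  size: number; // 1 + size(left) + size(right), maintained on insert/delete
};

const size = <K>(n: CountedNode<K> | null): number => (n ? n.size : 0);

// Return the key at 0-based position `offset` in sorted order.
function at<K>(node: CountedNode<K> | null, offset: number): K {
  if (node === null || offset < 0 || offset >= node.size) {
    throw new RangeError("offset out of bounds");
  }
  const leftSize = size(node.left);
  if (offset < leftSize) return at(node.left, offset); // nth is in the left subtree
  if (offset === leftSize) return node.key; // this node is the nth element
  return at(node.right, offset - leftSize - 1); // skip left subtree and this node
}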
jamwt · 2mo ago
re OFFSET with traditional databases: it's kind of sneaky, but they have the same limitation, so OFFSET-based pagination is really, really expensive ( https://claude.ai/chat/858340c6-38b9-470f-932b-ac45f99fdd7b )
burnstony#1975 (OP) · 2mo ago
@Jamie will @lee's idea work?
@Jamie the second Claude link doesn't work.
Properly setting up the aggregate still seems a little tricky, and being able to tell that it is up to date seems unclear. This does make me understand the issue.
jamwt · 2mo ago
I'm sure it will; Lee understands this stuff really well. He built the aggregate component
burnstony#1975 (OP) · 2mo ago
@lee let me find it
@lee I ran this:

export const processLargeDataset = query({
  handler: async (ctx) => {
    console.log("processLargeDataset")
    let count = 0;
    const stream = await streamQuery(ctx, {
      table: "timeline",
      index: "start_index",
      order: "asc",
      schema: schema, // Your schema definition
    })
    for await (const [] of stream) {
      count++;
    }
    return count;
  },
});
lee · 2mo ago
cool. but i thought you were trying to read documents between two offsets. i would expect it to look like:
const { key } = await aggregate.at(ctx, offset);
const stream = streamQuery(ctx, {
  table: "timeline",
  index: "start_index",
  order: "asc",
  startIndexKey: key,
  schema,
});
let count = 0;
for await (const [doc] of stream) {
  if (count >= 10) break;
  // handle doc
  count += 1;
}
burnstony#1975 (OP) · 2mo ago
this runs and seems correct
export const processLargeDataset = query({
  handler: async (ctx) => {
    console.log("processLargeDataset")
    const { key } = await timeline_aggregate.at(ctx, 10);
    let count = 0;
    const stream = await streamQuery(ctx, {
      table: "timeline",
      index: "start_index",
      order: "asc",
      startIndexKey: key,
      schema: schema, // Your schema definition
    })
    const response: Array<Doc<"timeline">> = []
    console.log("key", key)
    for await (const [doc] of stream) {
      if (count >= 10) break;
      count++;
      response.push(doc)
    }
    return response;
  },
});
but I don't have the aggregate populated yet.
Thank you lee, this looks like it should work: basically allowing a take starting at a given index, the equivalent of .gt(key).take(n). The AI basically thinks you can do that directly using withIndex; it also thinks skip is supported.
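On populating the aggregate: a hedged backfill sketch, assuming timeline_aggregate is the TableAggregate instance used above and that the component exposes insertIfDoesNotExist for backfills as its docs describe; the batch size and the internal.timeline.backfillAggregate reference are illustrative:

import { internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const backfillAggregate = internalMutation({
  args: { cursor: v.union(v.string(), v.null()) },
  handler: async (ctx, { cursor }) => {
    // Walk the source-of-truth table one page at a time.
    const { page, isDone, continueCursor } = await ctx.db
      .query("timeline")
      .paginate({ cursor, numItems: 100 });
    for (const doc of page) {
      // Idempotent insert, so re-running the backfill is safe.
      await timeline_aggregate.insertIfDoesNotExist(ctx, doc);
    }
    // Schedule the next batch so no single mutation reads too many bytes.
    if (!isDone) {
      await ctx.scheduler.runAfter(0, internal.timeline.backfillAggregate, {
        cursor: continueCursor,
      });
    }
  },
});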
lee · 2mo ago
great that it's working! ai hallucinations are a problem for sure
lee · 2mo ago
btw there's also getPage (described here https://stack.convex.dev/pagination ) which is like streamQuery but returns an array instead of an async iterable
burnstony#1975 (OP) · 2mo ago
both seem to work the same, and be reactive, which is pretty cool. wondering how ctx.db.query, getPage, and streamQuery are related
export const pageOfTimeline = query({
  args: {
    page: v.number(),
    pageSize: v.number()
  },
  handler: async (ctx, { page, pageSize }) => {
    const count = await timeline_aggregate.count(ctx);
    console.log("count", count);
    const { key } = await timeline_aggregate.at(ctx, (page - 1) * pageSize);
    console.log(key);
    let pageData = getPage(ctx, {
      table: "timeline",
      index: "start_index",
      startIndexKey: key,
      targetMaxRows: pageSize,
      schema: schema
    });
    return (await pageData).page;
  },
});

export const streamPageOfTimeline = query({
  args: {
    page: v.number(),
    pageSize: v.number()
  },
  handler: async (ctx, { page, pageSize }) => {
    console.log("processLargeDataset")
    const { key } = await timeline_aggregate.at(ctx, (page - 1) * pageSize);
    let count = 0;
    const stream = await streamQuery(ctx, {
      table: "timeline",
      index: "start_index",
      order: "asc",
      startIndexKey: key,
      schema: schema, // Your schema definition
    })
    const response: Array<Doc<"timeline">> = []
    console.log("key", key)
    for await (const [doc] of stream) {
      if (count >= pageSize) break;
      count++;
      response.push(doc)
    }
    return response;
  },
});
lee · 2mo ago
nice! getPage just calls streamQuery under the hood. streamQuery calls ctx.db.query multiple times to piece together the interval that you want to query.
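A hedged sketch of the mechanism lee describes, not the helper's real code: issue bounded index queries repeatedly, using the last document seen as the next lower bound. This sketch cheats by cursoring on the first index field alone, which only works if start values are unique; the real helper tracks full composite index keys:

import { QueryCtx } from "./_generated/server";
import { Doc } from "./_generated/dataModel";

// Yields timeline docs in index order, reading them in batches of 100.
async function* streamTimeline(ctx: QueryCtx, fromStart?: string) {
  let cursor = fromStart;
  for (;;) {
    const batch: Doc<"timeline">[] =
      cursor === undefined
        ? await ctx.db.query("timeline").withIndex("start").take(100)
        : await ctx.db
            .query("timeline")
            .withIndex("start", (q) => q.gt("start", cursor!))
            .take(100);
    for (const doc of batch) yield doc;
    if (batch.length < 100) return; // index range exhausted
    cursor = batch[batch.length - 1].start; // resume after the last doc seen
  }
}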
burnstony#1975 (OP) · 5w ago
@lee I've accidentally inserted everything in the aggregate twice. How do I clear everything from the aggregate, or otherwise fix it?
lee · 5w ago
The docs page has a "Repair incorrect aggregates" section
burnstony#1975 (OP) · 5w ago
@lee thank you (there are no anchor-tag links to the sections on this page, which would be nice when sharing them) https://www.convex.dev/components/aggregate

Repair incorrect aggregates
If some mutation or direct write in the Dashboard updated the source of truth data without writing to the aggregate, they can get out of sync and the returned aggregates may be incorrect.
The simplest way to fix is to start over. Either call await aggregate.clear(ctx) or rename the component like app.use(aggregate, { name: "newName" }), which will reset it to be empty. Then follow the instructions from above.
There is an alternative which doesn't clear the aggregates: compare the source of truth to the aggregate table. You can use db.query("mytable").paginate() on your Convex table and aggregate.paginate() on the aggregate. Update the aggregates based on the diff of these two paginated data streams.
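A minimal sketch of the "start over" option from the quoted docs, assuming the timeline_aggregate instance used earlier in the thread:

import { internalMutation } from "./_generated/server";

// Resets the aggregate to empty so it can be rebuilt from the timeline table.
export const resetTimelineAggregate = internalMutation({
  handler: async (ctx) => {
    await timeline_aggregate.clear(ctx);
  },
});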
lee · 5w ago
They do work, but i agree they're not discoverable https://www.convex.dev/components/aggregate#repair-incorrect-aggregates
burnstony#1975 (OP) · 5w ago
thanks @lee
I'm confused about where the aggregate lives. Where is the data stored? If I rename it, how do I know the old one is gone?
@lee if I move it from one file to a different file, is it a different aggregate?
erquhart · 5w ago
In your project dashboard in the data view, you'll see a dropdown above the tables list. "App" is selected by default, but you can select a component from that list. If you select your Aggregate component you'll see its tables. I'm not certain of this, but I believe moving aggregate-related code around doesn't change the underlying data, as the component is configured centrally in convex.config.ts. If you rename it I suspect you'll start a new aggregate. Again, haven't tried this, but I believe if you do this, the old one will show in the dropdown as a greyed-out "unmounted" component. You can't do anything with data from an unmounted component, but you can delete the unmounted component from project settings in the Components section.
lee · 5w ago
All that ^ is correct. Also note that if you call new TableAggregate(components.myaggregate, ...) multiple times from different places, it's referring to the same aggregate. You can create multiple aggregates in convex.config.ts with different names, and then they would be instantiated separately with different new TableAggregate(components.<name>) calls
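A sketch of what lee describes, in convex/convex.config.ts; the name option is confirmed by the docs quote above, and the component names here are illustrative:

import { defineApp } from "convex/server";
import aggregate from "@convex-dev/aggregate/convex.config";

const app = defineApp();
// Two independent aggregates; each is instantiated separately, e.g.
// new TableAggregate(components.timelineByStart, ...) and
// new TableAggregate(components.photosByAlbum, ...).
app.use(aggregate, { name: "timelineByStart" });
app.use(aggregate, { name: "photosByAlbum" });

export default app;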
burnstony#1975 (OP) · 3w ago
Thank you
