pez
pez•2w ago

Fetching a random subset of documents without having to collect() over the entire set?

Hi team, I'm implementing an app where a core user flow is to be able to answer a subset of questions from a collection of documents I have stored in Convex. Right now, I have it implemented in a way where I fetch ALL of the documents (this can bloat to thousands eventually), and then apply filters related to which questions I'd like to exclude, and then return only a random subset of these documents. This makes my db bandwidth usage skyrocket over time and I'd really like to optimize this. Is there any simpler way to get a random subset of docs from Convex? If not, is there a more efficient way of implementing this in general? I've attached a code snippet below.
export const getRandomQuestions = query({
  args: {
    quantity: v.number(),
    exclusionIds: v.optional(v.array(v.id("templateQuestions"))),
  },
  handler: async (ctx, args) => {
    const { exclusionIds, quantity } = args;

    const questions = await ctx.db.query("templateQuestions").collect();
    const filteredQuestions = exclusionIds
      ? questions.filter((q) => !exclusionIds.includes(q._id))
      : questions;

    // Use current timestamp as seed for reproducible randomness within same millisecond
    const seed = Date.now();
    const random = mulberry32(seed);
    for (let i = filteredQuestions.length - 1; i > 0; i--) {
      const j = Math.floor(random() * (i + 1));

      // Modern swap syntax
      [filteredQuestions[i], filteredQuestions[j]] = [
        filteredQuestions[j],
        filteredQuestions[i],
      ];
    }

    return filteredQuestions.slice(0, quantity);
  },
});
25 Replies
jamwt
jamwt•2w ago
hi! you can use the aggregates component to fetch random documents in O(log n) time and without reading the whole table. Here's the component: https://www.convex.dev/components/aggregate
Convex
Aggregate
Keep track of sums and counts in a denormalized and scalable way.
jamwt
jamwt•2w ago
here's an example of it in use, to shuffle songs: https://github.com/get-convex/aggregate/blob/main/example/convex/shuffle.ts
GitHub
aggregate/example/convex/shuffle.ts at main · get-convex/aggregate
Component for aggregating counts and sums of Convex documents - get-convex/aggregate
pez
pezOP•2w ago
@jamwt oooh cool, will try that!
@jamwt this looks great, but I'd also like to insert a step before (or after) the fetching of random indices to ensure that I'm not fetching any ids within the list of exclusionIds I provide (essentially I don't want to show users questions they've already answered). Is there a way to support that out of the box?
thanks for the speedy replies btw! convex is great
jamwt
jamwt•2w ago
if the list of questions is stable, you can just keep using the same seed and move the offset along, and you'll know you won't show the same question again.
if the list of questions is dynamic, things get trickier...
pez
pezOP•2w ago
hmm, the list of questions can change over time (we add more on a weekly basis). they aren't expected to change too often though
"and you'll know you won't show the same question again"
how would i know this?
jamwt
jamwt•2w ago
if you're paging through the questions in a stable order, and each question only exists once in the shuffled list, then (until you hit the end of the list and wrap around?) you wouldn't show the same question again.
this is actually the behavior the "shuffle songs" example does
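To make the seed-and-offset idea concrete, here is a minimal standalone sketch (plain TypeScript, no Convex calls). The names `mulberry32`, `shuffledIndexes`, and `pageOfIndexes` are illustrative assumptions, not part of the aggregate component; the linked shuffle example uses its own `Rand` and `shuffle` helpers. The point is only that a fixed seed gives one fixed permutation, so pages taken at different offsets can't overlap as long as the count stays the same.

```typescript
// Small deterministic PRNG (mulberry32): the same seed always yields the same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle of the positions [0, count), driven by the seeded PRNG.
function shuffledIndexes(count: number, seed: number): number[] {
  const rand = mulberry32(seed);
  const indexes = Array.from({ length: count }, (_, i) => i);
  for (let i = indexes.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [indexes[i], indexes[j]] = [indexes[j], indexes[i]];
  }
  return indexes;
}

// One "page" of the shuffled order. Same seed + same count => same permutation,
// so two pages with non-overlapping offsets never share a position.
function pageOfIndexes(
  count: number,
  seed: number,
  offset: number,
  quantity: number,
): number[] {
  return shuffledIndexes(count, seed).slice(offset, offset + quantity);
}
```

For example, `pageOfIndexes(100, 42, 0, 30)` and `pageOfIndexes(100, 42, 30, 30)` are disjoint slices of the same permutation. If the count changes between calls (questions added or removed), the permutation changes too, which is the "dynamic list" caveat mentioned above.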
pez
pezOP•2w ago
ah wait i think i may have miscommunicated. the idea here is that users will have answered questions on previous days (this is a daily question challenge app), and each day we give them a random list of 30 questions they can mouse through (but answer only 3 of them). I just wanna ensure that if a user has answered a question already (I have these ids), I don't show those as an option to the user again. I'm using a list of exclusion ids for this.
export const getRandomTemplateQuestionsNew = query({
  args: {
    offset: v.number(),
    quantity: v.number(),
    seed: v.string(),
    exclusionIds: v.optional(v.array(v.id("templateQuestions"))),
  },
  handler: async (ctx, { offset, quantity, seed, exclusionIds }) => {
    const count = await randomize.count(ctx);

    const rand = new Rand(seed);
    const allIndexes = Array.from({ length: count }, (_, i) => i);
    shuffle(allIndexes, rand);
    const indexes = allIndexes.slice(offset, offset + quantity);

    const atIndexes = await Promise.all(
      indexes.map((i) => randomize.at(ctx, i))
    );

    return await Promise.all(
      atIndexes.map(async (atIndex) => {
        const doc = (await ctx.db.get(atIndex.id))!;
        return doc;
      })
    );
  },
});
jamwt
jamwt•2w ago
gotcha. yeah, I think you'll need to keep maintaining your exclusion list then
pez
pezOP•2w ago
cool, would this random aggregate approach help me limit db bandwidth usage though? just wanna make sure I'm not sending back and forth the entire list each time :/
jamwt
jamwt•2w ago
yes, it would still help. you can fetch a random 30 + N records, where you have N excluded, and then just take the first 30 that aren't excluded.
to make the exclusion list not grow without bound, if you "EOL" some questions (after a month? year?) you could also run migrations to remove those ids from the exclusion lists, since they're no longer needed. but you probably don't have to worry about that for a long time, if this is a daily question 🙂
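A rough sketch of the "fetch 30 + N, keep the first 30 that aren't excluded" step, as a standalone TypeScript helper. `takeUnanswered` is a hypothetical name, and `candidates` is assumed to be the list of distinct ids already fetched in random order (for instance by looking up `quantity + exclusionIds.length` shuffled positions in the aggregate). Since at most N of those 30 + N distinct candidates can be on the exclusion list, at least 30 are left over, provided the table holds that many documents.

```typescript
// Hypothetical helper: keep the first `quantity` candidates that are not excluded.
// `candidates` is assumed to be distinct ids, already in random order.
function takeUnanswered<Id>(
  candidates: Id[],
  exclusionIds: Id[],
  quantity: number,
): Id[] {
  // A Set makes each exclusion check O(1) instead of scanning the array.
  const excluded = new Set(exclusionIds);
  const picked: Id[] = [];
  for (const id of candidates) {
    if (picked.length >= quantity) break; // already have enough
    if (excluded.has(id)) continue; // user has answered this one
    picked.push(id);
  }
  return picked;
}
```

With this shape, only 30 + N documents are read per request instead of the whole table, so bandwidth grows with the size of the exclusion list and not with the size of the collection.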
pez
pezOP•2w ago
solid idea! lemme try that sir
btw, just applied for the YC deal and have been loving my experience so far
jamwt
jamwt•2w ago
which company is this?
pez
pezOP•2w ago
thank you for building this
jamwt
jamwt•2w ago
no problem, glad to have you on. keep the feedback coming!
pez
pezOP•2w ago
we were building www.shopencore.ai during the batch
jamwt
jamwt•2w ago
ah yep... welcome!
pez
pezOP•2w ago
but recently pivoted to https://trycandle.app
candle | the app designed to be kept.
Candle is an app for modern couples looking to stay connected and grow closer.
pez
pezOP•2w ago
haha thank you! feel free to download the app too, we just launched. would love feedback!
jamwt
jamwt•2w ago
nice. as someone in a 29 year relationship, very cool to see building happening in this space.
pez
pezOP•2w ago
i am in a 2 year one :p but would love feedback from you (seriously). haven't talked to too many ppl in very long term relationships
jamwt
jamwt•2w ago
for sure, I usually try convex apps if I'm allowed in!
pez
pezOP•2w ago
would be a valuable perspective
jamwt
jamwt•2w ago
so I'll do it
pez
pezOP•2w ago
you're for sure allowed in!!!
hey Jamie, noticing that randomize.count(ctx) is returning 0 for some reason. I did the following:
const randomize = new TableAggregate<{
  DataModel: DataModel;
  TableName: "templateQuestions";
  Key: null;
}>(components.templateQuestions, {
  sortKey: () => null,
});
at the top of my functions file. and added
import { defineApp } from "convex/server";
import aggregate from "@convex-dev/aggregate/convex.config";

const app = defineApp();
app.use(aggregate, { name: "templateQuestions" });

export default app;
to convex.config.ts
