Florian (3mo ago)

Vector Search with relational filter

Given this schema, how do I retrieve only the embeddings for notes that belong to a particular user? Do I need to put the userId into the embeddings table as well?
import { authTables } from "@convex-dev/auth/server";
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

const schema = defineSchema({
  ...authTables,
  notes: defineTable({
    title: v.string(),
    body: v.string(),
    userId: v.id("users"),
  }).index("by_userId", ["userId"]),

  embeddings: defineTable({
    content: v.string(),
    embedding: v.array(v.float64()),
    noteId: v.id("notes"),
  })
    .index("by_noteId", ["noteId"])
    .vectorIndex("by_embedding", {
      vectorField: "embedding",
      dimensions: 1536,
      filterFields: ["noteId"],
    }),
});

export default schema;
erquhart (3mo ago)
Embeddings tables are usually accessed through the parent table: e.g., you get the user's notes and map over the embeddings for those notes. But if you want only the embeddings in a query, without the notes, adding the userId to the embeddings table is the best way. Still, I'd generally recommend treating an embeddings table as an extension of the data it represents. The point of having a separate embeddings table is to let the non-embedding data be fetched without the embeddings, for bandwidth efficiency.
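A minimal sketch of the "map over the parent table" approach, written against plain in-memory arrays rather than the Convex API so it runs on its own. The `Note` and `Embedding` types mirror the schema in the question (ids are plain strings here), and the `Map` plays the role of the `by_noteId` index. In a real Convex query, each lookup would instead be something like `ctx.db.query("notes").withIndex("by_userId", (q) => q.eq("userId", userId)).collect()`, followed by a `by_noteId` lookup per note.

```typescript
// In-memory stand-ins for the Convex tables (ids are plain strings here).
type Note = { _id: string; title: string; body: string; userId: string };
type Embedding = { _id: string; content: string; embedding: number[]; noteId: string };

// Parent-first lookup: find the user's notes, then the embeddings of each note.
function embeddingsForUser(notes: Note[], embeddings: Embedding[], userId: string): Embedding[] {
  // Group embeddings by noteId, standing in for the "by_noteId" index.
  const byNoteId = new Map<string, Embedding[]>();
  for (const e of embeddings) {
    const bucket = byNoteId.get(e.noteId) ?? [];
    bucket.push(e);
    byNoteId.set(e.noteId, bucket);
  }
  // Equivalent of querying "notes" through the "by_userId" index.
  const userNotes = notes.filter((n) => n.userId === userId);
  // Map over the user's notes and flatten their chunk embeddings into one list.
  return userNotes.flatMap((n) => byNoteId.get(n._id) ?? []);
}

// Usage: notes n1 and n3 belong to user "u1", n2 to "u2".
const notes: Note[] = [
  { _id: "n1", title: "A", body: "body A", userId: "u1" },
  { _id: "n2", title: "B", body: "body B", userId: "u2" },
  { _id: "n3", title: "C", body: "body C", userId: "u1" },
];
const embeddings: Embedding[] = [
  { _id: "e1", content: "chunk 1 of A", embedding: [0.1, 0.2], noteId: "n1" },
  { _id: "e2", content: "chunk 2 of A", embedding: [0.3, 0.4], noteId: "n1" },
  { _id: "e3", content: "chunk 1 of B", embedding: [0.5, 0.6], noteId: "n2" },
  { _id: "e4", content: "chunk 1 of C", embedding: [0.7, 0.8], noteId: "n3" },
];
console.log(embeddingsForUser(notes, embeddings, "u1").map((e) => e._id)); // [ 'e1', 'e2', 'e4' ]
```

This works for ordinary queries, but not for a vector search, which can only filter on fields stored on the embeddings rows themselves.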
Florian (OP, 3mo ago)
How would I change my schema to do that, given that I split notes into text chunks before embedding them, so there is a one-to-many relationship?
erquhart (3mo ago)
Ah, you're just searching; I don't know why I thought you were querying embeddings directly, apart from search. At any rate, yes, you'll want to add the user id to the embeddings table and filter on it in your search query.
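A rough in-memory sketch of what "filter on userId in the search" means, assuming each embeddings row now carries a `userId` and the vector index lists `"userId"` in its `filterFields`. In Convex itself this would be a single `ctx.vectorSearch("embeddings", "by_embedding", { vector, limit, filter: (q) => q.eq("userId", userId) })` call inside an action, which returns ids and scores that you then load. The brute-force cosine scan below only illustrates the semantics (keep rows whose field matches exactly, rank the rest by similarity, return the top `limit`); it is not how Convex implements the index.

```typescript
// Embeddings row with the denormalized userId copied from the parent note.
type EmbeddingRow = { _id: string; noteId: string; userId: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force stand-in for a vector search filtered on userId.
function searchForUser(rows: EmbeddingRow[], vector: number[], userId: string, limit: number) {
  return rows
    .filter((r) => r.userId === userId) // exact-match filter, like q.eq("userId", userId)
    .map((r) => ({ _id: r._id, _score: cosine(vector, r.embedding) }))
    .sort((x, y) => y._score - x._score) // most similar first
    .slice(0, limit);
}

const rows: EmbeddingRow[] = [
  { _id: "e1", noteId: "n1", userId: "u1", embedding: [1, 0] },
  { _id: "e2", noteId: "n1", userId: "u1", embedding: [0, 1] },
  { _id: "e3", noteId: "n2", userId: "u2", embedding: [1, 0] }, // best match, but another user's
  { _id: "e4", noteId: "n3", userId: "u1", embedding: [0.7, 0.7] },
];
console.log(searchForUser(rows, [1, 0], "u1", 2).map((r) => r._id)); // [ 'e1', 'e4' ]
```

Note that `e3` is never considered even though it is the closest vector: the filter decides which rows are eligible, and similarity only orders the survivors.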
Florian (OP, 3mo ago)
So the userId has to be in both the note and the note embedding? i.e.
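notes: defineTable({
  title: v.string(),
  body: v.string(),
  userId: v.id("users"),
}).index("by_userId", ["userId"]),

embeddings: defineTable({
  content: v.string(),
  embedding: v.array(v.float64()),
  noteId: v.id("notes"),
  userId: v.id("users"), // copied from the parent note
})
  .index("by_noteId", ["noteId"])
  .vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 1536,
    filterFields: ["noteId", "userId"], // userId must be listed here to filter on it in a vector search
  }),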
That kind of duplication feels a bit ugly compared to relational databases.
jamwt (3mo ago)
As Convex is a relational database (and runs on top of one), would you be willing to show the Postgres (or equivalent) schema you'd use on those systems? It would help me understand what you're not getting with Convex right now.
Florian (OP, 3mo ago)
With Prisma, I would do something like this:
prismadb.embeddings.findMany({
  where: {
    note: {
      userId,
    },
  },
});
Not sure what SQL query this translates to, but the userId is only stored in the note, not in the embeddings. @jamwt, can you help me with this? I'm preparing a tutorial for YouTube but I'm stuck here. What I need is a relation query.
erquhart (3mo ago)
Adding userId to both tables is how you do a relation query here. It seems clunky, but that's just because Convex's APIs are low level. Prisma is doing something very similar under the hood and exposing it as findMany through its ORM. Honestly, this is probably a good chance to point out this low-level aspect to viewers in your video; it comes up in a number of places. Convex isn't an ORM, but an ORM could be built on top of it. Ents (in maintenance mode) is a good example of this: https://labs.convex.dev/convex-ents
This "low level" concept is mentioned explicitly in the Architecture section of Zen of Convex: https://docs.convex.dev/understanding/zen
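To make the Prisma comparison concrete, here is a hypothetical ORM-style helper (the name `findManyWhereParent` is made up; it is not part of Convex, Ents, or Prisma) that takes child rows, a foreign-key getter, and a predicate on the parent, and resolves the relation in application code. It sketches the kind of work an ORM layer does for you; which SQL Prisma actually emits isn't established in this thread, so treat it as the semantics of `where: { note: { userId } }`, not its implementation.

```typescript
// Minimal row shape: every document has an _id.
type Row = { _id: string };

// Hypothetical helper: keep child rows whose parent satisfies `where`.
function findManyWhereParent<C extends Row, P extends Row>(
  children: C[],
  parents: P[],
  parentIdOf: (child: C) => string,
  where: (parent: P) => boolean,
): C[] {
  // Index parents by _id so each child resolves its parent in one lookup.
  const parentById = new Map(parents.map((p) => [p._id, p] as const));
  return children.filter((c) => {
    const parent = parentById.get(parentIdOf(c));
    return parent !== undefined && where(parent);
  });
}

type Note = Row & { userId: string };
type Embedding = Row & { noteId: string; content: string };

const notes: Note[] = [
  { _id: "n1", userId: "u1" },
  { _id: "n2", userId: "u2" },
];
const embeddings: Embedding[] = [
  { _id: "e1", noteId: "n1", content: "chunk of n1" },
  { _id: "e2", noteId: "n2", content: "chunk of n2" },
  { _id: "e3", noteId: "n1", content: "another chunk of n1" },
];

// Roughly what prismadb.embeddings.findMany({ where: { note: { userId } } }) expresses.
const userId = "u1";
const mine = findManyWhereParent(embeddings, notes, (e) => e.noteId, (n) => n.userId === userId);
console.log(mine.map((e) => e._id)); // [ 'e1', 'e3' ]
```

The catch for this thread's use case is that such a helper filters rows after they have been loaded. A vector search can only filter on fields of the embeddings row itself that are listed in `filterFields`, which is why the answer here is to copy `userId` into that table.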
Florian (OP, 3mo ago)
Thank you for the clarification! Good idea to mention that. I'm fine with the double userId; I just wanted to make sure I wasn't missing anything.
erquhart (3mo ago)
It's a bit disorienting for sure. But yeah, a little denormalization can go a long way with Convex.
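One way to keep this kind of denormalization manageable is to write the duplicated field in exactly one place: when a note's chunks are embedded, copy `userId` from the parent note into every row. A sketch assuming a hypothetical `embed` function standing in for the embedding model call; in Convex, each returned row would then be passed to `ctx.db.insert("embeddings", row)`.

```typescript
type Note = { _id: string; userId: string; body: string };
type NewEmbedding = { content: string; embedding: number[]; noteId: string; userId: string };

// Build one embeddings row per chunk, copying userId from the parent note
// so the duplicated field always matches its source at insert time.
function embeddingRowsForNote(
  note: Note,
  chunks: string[],
  embed: (text: string) => number[], // hypothetical stand-in for the embedding model
): NewEmbedding[] {
  return chunks.map((content) => ({
    content,
    embedding: embed(content),
    noteId: note._id,
    userId: note.userId, // denormalized copy
  }));
}

// Toy embedding: [character count, space count]. The real vectors have 1536 dimensions.
const fakeEmbed = (text: string) => [text.length, text.split(" ").length - 1];

const note: Note = { _id: "n1", userId: "u1", body: "first chunk. second chunk." };
const rows = embeddingRowsForNote(note, ["first chunk.", "second chunk."], fakeEmbed);
console.log(rows.map((r) => `${r.noteId}/${r.userId}: ${r.embedding}`)); // [ 'n1/u1: 12,1', 'n1/u1: 13,1' ]
```

If a note's owner could ever change, its embeddings rows would need their `userId` patched in the same mutation.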