jamalsoueidan
jamalsoueidan•2mo ago

Search for subdomains in string field

Hi I have encountered an issue with the full-text search functionality. When searching for emails by subdomain, the system seems to remove special characters (like @ and .) and splits the subdomain into multiple words. This behaviour results in incorrect matches and makes it impossible to accurately find all emails with a specific subdomain. For example: Searching for @email.norwegianreward.com incorrectly splits it into email, norwegianreward and com This is problematic for use cases where exact subdomain matches are required. Accurate full-text search for subdomains is a very basic and essential feature for applications handling email data. Could you advise on how to resolve this issue or if there are any planned updates to improve the full-text search capabilities? search: { from: "@email.norwegianreward.com" } results: [ { From: "newsletter@email.norwegianreward.com" }, { From: "noreply@email.openai.com" }, //not interested { From: "info@email.home.saxo" },//not interested { From: "info@email.home.saxo" },//not interested ]
12 Replies
Convex Bot
Convex Bot•2mo ago
Thanks for posting in <#1088161997662724167>. Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets. - Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.) - Use search.convex.dev to search Docs, Stack, and Discord all at once. - Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI. - Avoid tagging staff unless specifically instructed. Thank you!
jamalsoueidan
jamalsoueidanOP•2mo ago
I tried to use import { filter } from "convex-helpers/server/filter", to mix with withSearchIndex didn't work:
// doesnt work fulltext split words
export const list = query({
args: { from: v.string() },
handler: (ctx, args) => {
return ctx.db
.query("emails")
.withSearchIndex("search_from", (q) => q.search("From", args.from))
.take(30);
},
});

// doesnt work since it will only look at the first 10 documents, doesnt do full table scan
export const search = query({
args: { from: v.string(), paginationOpts: paginationOptsValidator },
handler: async (ctx, args) => {
return filter(ctx.db.query("emails"), (post) => {
return post.From.includes(args.from);
}).paginate(args.paginationOpts);
},
});

// mix doesnt work since we cant use withSearchIndex with filter
export const mix = query({
args: { from: v.string(), paginationOpts: paginationOptsValidator },
handler: async (ctx, args) => {
return filter(ctx.db
.query("emails")
.withSearchIndex("search_from", (q) => q.search("From", args.from)), (post) => {
return post.From.includes(args.from);
}).paginate(args.paginationOpts);
},
});
// doesnt work fulltext split words
export const list = query({
args: { from: v.string() },
handler: (ctx, args) => {
return ctx.db
.query("emails")
.withSearchIndex("search_from", (q) => q.search("From", args.from))
.take(30);
},
});

// doesnt work since it will only look at the first 10 documents, doesnt do full table scan
export const search = query({
args: { from: v.string(), paginationOpts: paginationOptsValidator },
handler: async (ctx, args) => {
return filter(ctx.db.query("emails"), (post) => {
return post.From.includes(args.from);
}).paginate(args.paginationOpts);
},
});

// mix doesnt work since we cant use withSearchIndex with filter
export const mix = query({
args: { from: v.string(), paginationOpts: paginationOptsValidator },
handler: async (ctx, args) => {
return filter(ctx.db
.query("emails")
.withSearchIndex("search_from", (q) => q.search("From", args.from)), (post) => {
return post.From.includes(args.from);
}).paginate(args.paginationOpts);
},
});
Any ideas how to solve this? select * from emails where email like '%@email.norewgiean.com'
lee
lee•2mo ago
can you elaborate on what breaks when you do
export const search = query({

args: { from: v.string() },

handler: async (ctx, args) => {

return filter(ctx.db.query("emails"), (post) =>

post.From.includes(args.from)

).first();

},

});
export const search = query({

args: { from: v.string() },

handler: async (ctx, args) => {

return filter(ctx.db.query("emails"), (post) =>

post.From.includes(args.from)

).first();

},

});
(this is the code from your support thread, which is slightly different from the code you pasted here)
jamalsoueidan
jamalsoueidanOP•2mo ago
@lee That would work, since its going to do full scan, but if I use paginate, it will only go through first few documents, and thats where it fails 🙂 I am working on a system that manages a list of emails and groups them into collections. Collections and Rules - Collections: These are groups that contain specific rules to determine which emails belong to them. - Rules: A rule can either specify: - A full email address (e.g., user@example.com) or - A subdomain or part of an email address (e.g., @example.com). Workflow 1. User Interaction: In the UI, the user clicks on a collection to view all emails belonging to it. 2. Retrieval: I need to search the emails table, using the rules defined in the selected collection to retrieve matching emails. Right now im doing something like this:
const email = await ctx.db
.query("emails")
.filter((q) => {
const emailConditions = collection.emails.map((email) =>
q.eq(q.field("from"), email)
);

const emailDomain = collection.emails.map((email) =>
q.eq(q.field("domain"), email)
);

return q.or(...emailConditions, ...emailDomain);
})
.order("desc")
.paginate(args.paginationOpts);
const email = await ctx.db
.query("emails")
.filter((q) => {
const emailConditions = collection.emails.map((email) =>
q.eq(q.field("from"), email)
);

const emailDomain = collection.emails.map((email) =>
q.eq(q.field("domain"), email)
);

return q.or(...emailConditions, ...emailDomain);
})
.order("desc")
.paginate(args.paginationOpts);
erquhart
erquhart•2mo ago
If you have a limited set of specific criteria, which it seems like you might, you could use calculated fields, such as a field that only contains the @ symbol and domain, and then index against that
lee
lee•5w ago
here's an example of keeping a calculated field https://stack.convex.dev/triggers#denormalizing-a-field . you could use a trigger or just keep it updated whenever you insert/patch/replace the email field.
Database Triggers
Triggers automatically run code whenever data in a table changes. A library in the convex-helpers npm package allows you to attach trigger functions t...
jamalsoueidan
jamalsoueidanOP•5w ago
I can not use index, as you can see in my code example its O R ! At first I tried with withSearch, didn't work since it split asd@test.blah.com into 3 words and give me all emails that have any word, didnt work. then I tried to split the email in 2 fields (nickname,domain) asd as nickname, and test.blah.com as domain, and just used filter, i'm afraid this would be very slow when i have alot of emails.
lee
lee•5w ago
Can you elaborate on why you can't do an index? You can do an OR with multiple queries
jamalsoueidan
jamalsoueidanOP•5w ago
@lee because the rules are dynamic, and is or...if you im wrong, please show me code example, many thanks
lee
lee•5w ago
Do you need to paginate over all matches or are you just finding the first result? And how are you using the query, in a reactive useQuery or something else?
jamalsoueidan
jamalsoueidanOP•5w ago
I need to use paginate for the results.
lee
lee•5w ago
There are two main paths you can take: 1. The slow path, equivalent to what you would get in a sql db. You can do this with filter from convex-helpers. You said it doesn't work, but I still don't understand why not. It may return small or empty pages, but as long as you keep calling loadMore, everything should eventually work. 2. The efficient, indexed route (as suggested by @erquhart ). This is more complicated but boils down to doing multiple db.query().withIndex().paginate() calls and merging the results.