hasanaktasTR
hasanaktasTR4mo ago

Big dataset not in filter

I have asked about this topic before but there are things I can't figure out. I am setting up a structure similar to tinder. I keep the reactions of the users and the user id they reacted to in a table. My flow is working now but it is obvious that it will have problems with large data. (We can think of a scenario where 1 user has 100 thousand reactions and the total number of profiles is 1 million). When there is no more people to show this query to the client on the client side, the request is thrown. Therefore, it is not affected by table changes. It works like a rest api. I have read all the articles but I cannot find a solution where I can use the index for this scenario. I have to filter, which leads to a full table search. I am putting the sample code below. I would be glad if you help me.
18 Replies
hasanaktasTR
hasanaktasTROP4mo ago
export const getExploreAccounts = protectedProfileQuery({
args: {gender: gender, ageRange: ageRange},
handler: async (ctx, args) => {
const currentYear = new Date().getFullYear();

const minAge = args.ageRange[0];

const maxAge = args.ageRange[1];

const reactions = await ctx.db.query("reactions").withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId)).collect();

const notInUserIds = [ctx.profile.userId,...reactions.map((reaction) => reaction.targetUserId)];

let profiles: Doc<"profiles">[] = [];

const getProfilesByH3Index = async (h3Index: string) => {
return await ctx.db.query("profiles").withIndex("by_h3Index_gender_birthDate", (q) => {
let filter;

filter = q.eq("location.h3Index", h3Index).eq("gender", args.gender);

if (maxAge !== 60) {
filter = filter.gte("birthDate",formatISO(new Date(currentYear - maxAge, 0, 1)))
}

if (minAge !== 18) {
filter = filter.lte("birthDate",formatISO(new Date(currentYear - minAge, 0, 1)))
}

return filter;
})
.filter((q) =>q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))))
.take(config.exploreProfileCount);
};
const h3Indexes = gridDiskDistances(ctx.profile.location.h3Index,config.exploreGridDiskSize,).flat();

for (const h3Index of h3Indexes) {
const result = await getProfilesByH3Index(h3Index);
profiles = [...profiles, ...result];
if (profiles.length >= config.exploreProfileCount) {
break;
}
}
return profiles
},
});
export const getExploreAccounts = protectedProfileQuery({
args: {gender: gender, ageRange: ageRange},
handler: async (ctx, args) => {
const currentYear = new Date().getFullYear();

const minAge = args.ageRange[0];

const maxAge = args.ageRange[1];

const reactions = await ctx.db.query("reactions").withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId)).collect();

const notInUserIds = [ctx.profile.userId,...reactions.map((reaction) => reaction.targetUserId)];

let profiles: Doc<"profiles">[] = [];

const getProfilesByH3Index = async (h3Index: string) => {
return await ctx.db.query("profiles").withIndex("by_h3Index_gender_birthDate", (q) => {
let filter;

filter = q.eq("location.h3Index", h3Index).eq("gender", args.gender);

if (maxAge !== 60) {
filter = filter.gte("birthDate",formatISO(new Date(currentYear - maxAge, 0, 1)))
}

if (minAge !== 18) {
filter = filter.lte("birthDate",formatISO(new Date(currentYear - minAge, 0, 1)))
}

return filter;
})
.filter((q) =>q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))))
.take(config.exploreProfileCount);
};
const h3Indexes = gridDiskDistances(ctx.profile.location.h3Index,config.exploreGridDiskSize,).flat();

for (const h3Index of h3Indexes) {
const result = await getProfilesByH3Index(h3Index);
profiles = [...profiles, ...result];
if (profiles.length >= config.exploreProfileCount) {
break;
}
}
return profiles
},
});
jamalsoueidan
jamalsoueidan4mo ago
I'm not sure what you are asking, does filter not work for you? or the function is not reactive?
hasanaktasTR
hasanaktasTROP4mo ago
The flow works as I want it to. The problem is that as the number of reactions and profiles increases, query performance will decrease significantly with the current structure. I couldn't find a solution. So I wanted to get some ideas.
jamalsoueidan
jamalsoueidan4mo ago
Are you testing against 100k rows?
hasanaktasTR
hasanaktasTROP4mo ago
I haven't tested it, but when I use a filter, it will perform a full table search, so the query performance will decrease as the number of profiles increases. But I can prepare for such a scenario. I will test it during the week.
hasanaktasTR
hasanaktasTROP4mo ago
@jamalsoueidan I was able to quickly reproduce the documents. Unfortunately, it was as I expected. The query is now giving an error. Function execution timed out (maximum duration: 1s)
No description
No description
hasanaktasTR
hasanaktasTROP4mo ago
When I reduced the h3 index count from 37 to 19, the query responded in an average of 3.5 to 4 seconds. Instead of sending the queries one by one, I tried sending all the queries at the same time with Promise.all, and it still gave a timeout error.
No description
No description
ballingt
ballingt4mo ago
I have read all the articles but I cannot find a solution where I can use the index for this scenario. I have to filter, which leads to a full table search.
Can you say more here, why do you need to filter?
hasanaktasTR
hasanaktasTROP4mo ago
Schema
const reactions = defineTable({
userId: v.id("users"),
targetUserId: v.id("users"),

type: v.union(
v.literal("like"),
v.literal("dislike"),
),
})
.index("by_userId", ["userId"])
.index("by_targetUserId_userId", ["targetUserId", "userId"]);


const profiles = defineTable({
userId: v.id("users"),
gender: v.union(v.literal("male"), v.literal("female")),
birthDate: v.string(),
name: v.string(),
countryCode: v.optional(v.string()),
isInternal: v.boolean(),
h3Index: v.optional(v.string()),
location: v.optional(
v.object({
latitude: v.number(),
longitude: v.number(),
}),
),
})
.index("by_userId", ["userId"])
.index("by_isInternal_h3Index_gender_birthDate", [
"isInternal",
"h3Index",
"gender",
"birthDate",
])
.index("by_isInternal_gender_birthDate", [
"isInternal",
"gender",
"birthDate",
]);
const reactions = defineTable({
userId: v.id("users"),
targetUserId: v.id("users"),

type: v.union(
v.literal("like"),
v.literal("dislike"),
),
})
.index("by_userId", ["userId"])
.index("by_targetUserId_userId", ["targetUserId", "userId"]);


const profiles = defineTable({
userId: v.id("users"),
gender: v.union(v.literal("male"), v.literal("female")),
birthDate: v.string(),
name: v.string(),
countryCode: v.optional(v.string()),
isInternal: v.boolean(),
h3Index: v.optional(v.string()),
location: v.optional(
v.object({
latitude: v.number(),
longitude: v.number(),
}),
),
})
.index("by_userId", ["userId"])
.index("by_isInternal_h3Index_gender_birthDate", [
"isInternal",
"h3Index",
"gender",
"birthDate",
])
.index("by_isInternal_gender_birthDate", [
"isInternal",
"gender",
"birthDate",
]);
helpers
const getProfilesByH3Index = async (h3Index: string) => {
const profiles = await ctx.db
.query("profiles")
.withIndex("by_isInternal_h3Index_gender_birthDate", (q) => {
let filter;

filter = q
.eq("isInternal", false)
.eq("h3Index", h3Index)
.eq("gender", args.gender);
if (maxAge !== 60) {
filter = filter.gte(
"birthDate",
formatISO(new Date(currentYear - maxAge, 0, 1)),
);
}

if (minAge !== 18) {
filter = filter.lte(
"birthDate",
formatISO(new Date(currentYear - minAge, 0, 1)),
);
}

return filter;
})
.filter((q) =>
q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))),
)

.take(20);

return profiles;
};

const getInternalProfiles = async () => {
const internalProfiles = await ctx.db
.query("profiles")
.withIndex("by_isInternal_gender_birthDate", (q) => {
let filter;

filter = q.eq("isInternal", true).eq("gender", args.gender);

if (maxAge !== 60) {
filter = filter.gte(
"birthDate",
formatISO(new Date(currentYear - maxAge, 0, 1)),
);
}

if (minAge !== 18) {
filter = filter.lte(
"birthDate",
formatISO(new Date(currentYear - minAge, 0, 1)),
);
}

return filter;
})
.filter((q) =>
q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))),
)

.take(20);

return internalProfiles;
};
const getProfilesByH3Index = async (h3Index: string) => {
const profiles = await ctx.db
.query("profiles")
.withIndex("by_isInternal_h3Index_gender_birthDate", (q) => {
let filter;

filter = q
.eq("isInternal", false)
.eq("h3Index", h3Index)
.eq("gender", args.gender);
if (maxAge !== 60) {
filter = filter.gte(
"birthDate",
formatISO(new Date(currentYear - maxAge, 0, 1)),
);
}

if (minAge !== 18) {
filter = filter.lte(
"birthDate",
formatISO(new Date(currentYear - minAge, 0, 1)),
);
}

return filter;
})
.filter((q) =>
q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))),
)

.take(20);

return profiles;
};

const getInternalProfiles = async () => {
const internalProfiles = await ctx.db
.query("profiles")
.withIndex("by_isInternal_gender_birthDate", (q) => {
let filter;

filter = q.eq("isInternal", true).eq("gender", args.gender);

if (maxAge !== 60) {
filter = filter.gte(
"birthDate",
formatISO(new Date(currentYear - maxAge, 0, 1)),
);
}

if (minAge !== 18) {
filter = filter.lte(
"birthDate",
formatISO(new Date(currentYear - minAge, 0, 1)),
);
}

return filter;
})
.filter((q) =>
q.and(...notInUserIds.map((id) => q.neq(q.field("userId"), id))),
)

.take(20);

return internalProfiles;
};
Query
export const getExploreAccountsQuery = protectedProfileQuery({
args: {
gender: gender,
ageRange: ageRange,
},
handler: async (ctx, args) => {
const currentYear = new Date().getFullYear();
const minAge = args.ageRange[0];
const maxAge = args.ageRange[1];

const reactions = await ctx.db
.query("reactions")
.withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId))
.collect();

const notInUserIds = [
ctx.profile.userId,
...reactions.map((reaction) => reaction.targetUserId),
];

let profiles: Doc<"profiles">[] = [];



if (ctx.profile.h3Index) {
const h3Indexes = gridDiskDistances(
ctx.profile.h3Index,
3,
).flat();

for (const h3Index of h3Indexes) {
const result = await getProfilesByH3Index(h3Index);
profiles = [...profiles, ...result];
if (profiles.length >= 20) {
break;
}
}
}

if (profiles.length < 20) {
const result = await getInternalProfiles();

profiles = [...profiles, ...result];
}

const accounts = profiles.map((profile) => {
return {
profile,
};
});

return accounts;
},
});
export const getExploreAccountsQuery = protectedProfileQuery({
args: {
gender: gender,
ageRange: ageRange,
},
handler: async (ctx, args) => {
const currentYear = new Date().getFullYear();
const minAge = args.ageRange[0];
const maxAge = args.ageRange[1];

const reactions = await ctx.db
.query("reactions")
.withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId))
.collect();

const notInUserIds = [
ctx.profile.userId,
...reactions.map((reaction) => reaction.targetUserId),
];

let profiles: Doc<"profiles">[] = [];



if (ctx.profile.h3Index) {
const h3Indexes = gridDiskDistances(
ctx.profile.h3Index,
3,
).flat();

for (const h3Index of h3Indexes) {
const result = await getProfilesByH3Index(h3Index);
profiles = [...profiles, ...result];
if (profiles.length >= 20) {
break;
}
}
}

if (profiles.length < 20) {
const result = await getInternalProfiles();

profiles = [...profiles, ...result];
}

const accounts = profiles.map((profile) => {
return {
profile,
};
});

return accounts;
},
});
reactions and profiles are kept in 2 separate tables. In each API request I need to fetch other 20 profiles for which the user has not received reactions. The flow is as follows: Get reactions of the user making the request from the database. If the user making the request has an h3Index - Create a certain number of other h3indexes around it. Make a separate database query request for each h3. I will query up to 37 adjacent h3Indexes. 1 hexagon 7 hexagon (1+6) 19 hexagon (1+6+12) 37 hexagon (1+6+12+18) Stop the request when the request results are 20. --------------- if the number of users found does not reach 20 - ​​Fetch 20 internal users. As I said, my problem is filtering out users who have not received reactions when sending requests to these tables. It's fine for small numbers of users, but I started getting errors for high-profile numbers like the ones I simulated above. Above are simple table definitions and my query. I tried to make the code as clean as possible for clarity. ------- Brief summary If it finds 20 people in the first queries, there is no problem. If it cannot find them and tries to make 37 queries in order, it gives an error. If I remove the notIn filter, the problem is solved. Function execution timed out (maximum duration: 1s)
jamalsoueidan
jamalsoueidan4mo ago
@Hmza
Hmza
Hmza4mo ago
the problem seems to be the loop. you should not use loop to call two seperate functions rather you should use join to do one query. and paginate if possible rather using collect() if you want to get all reactions. read more here about join: https://docs.convex.dev/database/reading-data pagination: https://stack.convex.dev/pagination
Take Control of Pagination
Convex offers robust control over pagination with a powerful function, getPage, enabling complex edge cases. In this article, we go over how to use th...
Hmza
Hmza4mo ago
@hasanaktasTR
hasanaktasTR
hasanaktasTROP4mo ago
@Hmza Are you sure about pagination? If you have examined the sample code, I am trying to bring a maximum of 20 profiles. If you are saying to send 37 requests on the client side, it is very difficult to stay under the cost of this. Can you explain in more detail? I send a request to the client and pull 20 profiles (not websocket, useConvex().query) Then, after the user gives a reaction to all 20, I send a request to the same service again and bring 20 new profiles.
hasanaktasTR
hasanaktasTROP4mo ago
I tried to show the scenario in the video. New profile requests are not made until 20 profile reactions are given. After 20 profile reactions are given, new users are pulled, these new profiles are selected from the group that has not been reacted
Hmza
Hmza4mo ago
so the reactions "user" gave to profiles are recorded? or once they re open the app it starts from 0 again ? Instead of querying the reactions table every time to filter out users, you could maintain a separate table of filtered profiles for each user. This table would be updated whenever a reaction is made, and you'd query it instead of the reactions table.
hasanaktasTR
hasanaktasTROP4mo ago
As the user reacts, I manually delete it from the screen and write it to the reaction table. If I used websocket, it would send a new request after each deletion. That's why I send an http request. When the list is finished, the reaction table is up to date, so the new list comes correctly.
Hmza
Hmza4mo ago
const filteredUsers = defineTable({
userId: v.id("users"),
filteredUserId: v.id("users"),
}).index("by_userId", ["userId"]);
const filteredUsers = defineTable({
userId: v.id("users"),
filteredUserId: v.id("users"),
}).index("by_userId", ["userId"]);
const filteredUserIds = await ctx.db
.query("filteredUsers")
.withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId))
const filteredUserIds = await ctx.db
.query("filteredUsers")
.withIndex("by_userId", (q) => q.eq("userId", ctx.profile.userId))
yeah no need to delete but as i said above if you are seperating the concern here you'd just query the filteredTable to get the users to show to the user. also you should use index with userId in the profiles table
.index("by_isInternal_h3Index_gender_birthDate_userId", [
"isInternal",
"h3Index",
"gender",
"birthDate",
"userId"
])
.index("by_isInternal_h3Index_gender_birthDate_userId", [
"isInternal",
"h3Index",
"gender",
"birthDate",
"userId"
])
to make it more efficient when querying.
jamalsoueidan
jamalsoueidan4mo ago
Thank you @Hmza 🔥

Did you find this page helpful?