Big dataset with a notIn filter
I have asked about this topic before but there are things I can't figure out.
I am setting up a structure similar to Tinder. I keep each user's reactions, along with the id of the user they reacted to, in a table.
My flow works now, but it will obviously run into problems with large data (think of a scenario where one user has 100,000 reactions and there are 1,000,000 profiles in total).
The client only sends this query when there are no more people left to show on the client side, so it is not affected by table changes; it works like a REST API.
I have read all the articles, but I cannot find a solution where I can use an index for this scenario. I have to filter, which leads to a full table scan.
I am putting the sample code below. I would be glad if you could help me.
18 Replies
I'm not sure what you are asking. Does filter not work for you, or is the function not reactive?
The flow works as I want it to.
The problem is that as the number of reactions and profiles increases, query performance will decrease significantly with the current structure. I couldn't find a solution. So I wanted to get some ideas.
Are you testing against 100k rows?
I haven't tested it, but when I use a filter it performs a full table scan, so query performance will degrade as the number of profiles increases.
But I can set up such a scenario and test it during the week.
@jamalsoueidan
I was able to quickly generate the test documents. Unfortunately, it turned out as I expected: the query now gives an error.
Function execution timed out (maximum duration: 1s)
When I reduced the h3 index count from 37 to 19, the query responded in 3.5 to 4 seconds on average. Instead of sending the queries one by one, I also tried sending them all at once with Promise.all, and it still timed out.
"I have read all the articles, but I cannot find a solution where I can use an index for this scenario. I have to filter, which leads to a full table scan."
Can you say more here: why do you need to filter?
Schema
helpers
Query
Reactions and profiles are kept in two separate tables.
In each API request I need to fetch another 20 profiles that the user has not yet reacted to.
The flow is as follows:
Get the requesting user's reactions from the database.
If the requesting user has an h3Index:
- Generate a certain number of neighboring h3Indexes around it.
- Make a separate database query for each h3 cell.
I will query up to 37 adjacent h3Indexes, expanding ring by ring:
1 hexagon
7 hexagons (1+6)
19 hexagons (1+6+12)
37 hexagons (1+6+12+18)
Stop the queries as soon as the results reach 20 profiles.
---------------
If the number of users found does not reach 20:
- Fetch 20 internal users. (A rough sketch of this whole flow is below.)
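For illustration, here is a minimal sketch of that flow as a single Convex query. The table names (profiles, reactions), index names (by_h3Index, by_userId), field names (targetUserId, h3Index), and the idea of passing the precomputed h3 cells in as an argument are all assumptions for the sketch, not the actual code from the attachments:
```ts
// Rough sketch of the ring-expansion flow described above; not the author's code.
// Assumes profiles has .index("by_h3Index", ["h3Index"]) and
// reactions has .index("by_userId", ["userId"]).
import { query } from "./_generated/server";
import { v } from "convex/values";
import { Doc } from "./_generated/dataModel";

export const nextBatch = query({
  args: {
    userId: v.string(),           // real code may use v.id("users")
    h3Cells: v.array(v.string()), // 1 / 7 / 19 / 37 cells, computed by the caller
  },
  handler: async (ctx, { userId, h3Cells }) => {
    // Load every id this user has already reacted to. This is the part that
    // grows toward 100k rows and eventually needs pagination or denormalization.
    const reactions = await ctx.db
      .query("reactions")
      .withIndex("by_userId", (q) => q.eq("userId", userId))
      .collect();
    const reactedTo = new Set(reactions.map((r) => r.targetUserId));

    const results: Doc<"profiles">[] = [];
    for (const cell of h3Cells) {
      // Index lookup per hexagon; the "notIn" filtering happens in JS.
      const candidates = await ctx.db
        .query("profiles")
        .withIndex("by_h3Index", (q) => q.eq("h3Index", cell))
        .take(100); // cap the work per cell
      for (const profile of candidates) {
        if (results.length >= 20) break;
        if (!reactedTo.has(profile.userId)) results.push(profile);
      }
      if (results.length >= 20) break; // stop expanding rings at 20 profiles
    }
    return results;
  },
});
```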
As I said, my problem is filtering these queries so that only users who have not yet received a reaction from the requester are returned.
It's fine for small numbers of users, but I started getting errors at high profile counts like the ones I simulated above.
Above are simple table definitions and my query.
I tried to make the code as clean as possible for clarity.
-------
Brief summary
If it finds 20 people within the first few queries, there is no problem. If it cannot, and it has to work through all 37 queries in order, it gives an error. If I remove the notIn filter, the problem goes away.
Function execution timed out (maximum duration: 1s)
@Hmza
The problem seems to be the loop. You should not use a loop to call two separate functions; instead, use a join so you do one query, and paginate if possible rather than using collect() when you want to get all the reactions.
Read more about joins here:
https://docs.convex.dev/database/reading-data
Pagination:
https://stack.convex.dev/pagination
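For reference, a minimal sketch of what paginating the reactions could look like instead of collect(), using Convex's paginationOptsValidator; the table name, index name, and function name are assumptions:
```ts
// Sketch only: walk through the reactions in bounded pages instead of one huge read.
import { query } from "./_generated/server";
import { v } from "convex/values";
import { paginationOptsValidator } from "convex/server";

export const reactionsPage = query({
  args: {
    userId: v.string(),
    paginationOpts: paginationOptsValidator,
  },
  handler: async (ctx, { userId, paginationOpts }) => {
    // Returns { page, isDone, continueCursor } so the caller can iterate
    // through ~100k reactions in chunks rather than collecting them all.
    return await ctx.db
      .query("reactions")
      .withIndex("by_userId", (q) => q.eq("userId", userId))
      .paginate(paginationOpts);
  },
});
```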
@hasanaktasTR
@Hmza
Are you sure about pagination? If you look at the sample code, I am only trying to fetch a maximum of 20 profiles.
If you are saying I should send 37 requests from the client side, it would be very hard to keep the cost of that down.
Can you explain in more detail?
I send a request from the client and pull 20 profiles (not over the websocket subscription; I use useConvex().query).
Then, after the user reacts to all 20, I call the same service again and fetch 20 new profiles.
I tried to show the scenario in the video. No new profile request is made until reactions have been given to all 20 profiles.
Once all 20 reactions are given, new profiles are pulled, and they are selected from the group that has not yet been reacted to.
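A rough sketch of that client flow, assuming a hypothetical api.profiles.nextBatch function like the one sketched earlier:
```ts
// Client-side sketch; api.profiles.nextBatch and its arguments are assumptions.
import { useState } from "react";
import { useConvex } from "convex/react";
import { api } from "../convex/_generated/api";

export function useProfileBatch(userId: string, h3Cells: string[]) {
  const convex = useConvex();
  const [profiles, setProfiles] = useState<any[]>([]);

  // One-shot query (no subscription), so reacting to profiles does not
  // re-run it; it is only called again once all 20 reactions are in.
  const loadNextBatch = async () => {
    const next = await convex.query(api.profiles.nextBatch, { userId, h3Cells });
    setProfiles(next);
  };

  return { profiles, loadNextBatch };
}
```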
So the reactions the user gave to profiles are recorded? Or does it start from 0 again once they reopen the app?
Instead of querying the reactions table every time to filter out users, you could maintain a separate table of filtered profiles for each user. This table would be updated whenever a reaction is made, and you'd query it instead of the reactions table.
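One way to read that suggestion, as a sketch under assumed names (a candidates table with a compound index on userId and targetUserId): the reaction mutation removes the profile from the user's candidate queue, and the feed query reads only from that pre-filtered table, so no notIn filter is needed:
```ts
// Sketch of the separate-table idea: pre-populate a per-user "candidates"
// table, delete from it whenever a reaction is made, and have the feed query
// read only from it. All table, index, and field names are assumptions.
import { mutation, query } from "./_generated/server";
import { v } from "convex/values";

export const react = mutation({
  args: { userId: v.string(), targetUserId: v.string(), liked: v.boolean() },
  handler: async (ctx, { userId, targetUserId, liked }) => {
    await ctx.db.insert("reactions", { userId, targetUserId, liked });
    // Remove the reacted profile from this user's candidate queue.
    const candidate = await ctx.db
      .query("candidates")
      .withIndex("by_user_target", (q) =>
        q.eq("userId", userId).eq("targetUserId", targetUserId)
      )
      .unique();
    if (candidate) await ctx.db.delete(candidate._id);
  },
});

export const nextCandidates = query({
  args: { userId: v.string() },
  handler: async (ctx, { userId }) => {
    // No notIn filter needed: everything left in "candidates" is unreacted.
    return await ctx.db
      .query("candidates")
      .withIndex("by_user_target", (q) => q.eq("userId", userId))
      .take(20);
  },
});
```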
As the user reacts, I manually remove the profile from the screen and write the reaction to the reactions table. If I used the websocket subscription, it would re-run the query after each removal; that's why I send an HTTP-style request instead. By the time the list is finished, the reactions table is up to date, so the new list comes back correctly.
Yeah, there is no need to delete, but as I said above, if you separate the concerns here you would just query the filtered table to get the profiles to show to the user.
Also, you should add an index on userId in the profiles table to make querying more efficient.
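For example, the schema could declare those indexes roughly like this (a sketch; the field names are assumptions based on the discussion, not the actual schema attached above):
```ts
// convex/schema.ts sketch of index definitions.
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  profiles: defineTable({
    userId: v.string(),
    h3Index: v.string(),
  })
    .index("by_userId", ["userId"])    // look up a profile by its owner
    .index("by_h3Index", ["h3Index"]), // look up profiles by hexagon
  reactions: defineTable({
    userId: v.string(),
    targetUserId: v.string(),
    liked: v.boolean(),
  }).index("by_userId", ["userId"]),
});
```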
Thank you @Hmza 🔥