Baris•12mo ago

How can I sort a query by a field in the referenced document?

Hi, In one of my tables, there is a field named user_id that holds a user document’s ID, and in the user table, there is a string name field that holds the user’s name. I want to sort the first table by the referenced user’s name, but this doesn’t seem possible with indexes. The only thing I could do seems to be fetching all documents and sorting them with JavaScript, but this isn’t feasible because of the 16 k document scan limit. I want to do this with pagination. How can I achieve it?

16 Replies

ballingt•12mo ago

You'll want to put an index on the user's name (on the users table) and paginate over that. For each user, grab all the documents associated with them — so you'll also need an index on user_id on that other table. Does that part make sense? How many documents might a user have associated with them?

BarisOP•12mo ago

I understood the concept theoretically. There is no upper limit on either table, so there could easily be tens of thousands of documents… Maybe something like linked pagination could be done, like triggering the transition to the next user page after the users on the first query are finished… I don’t know. Also, in this case, I need to make a query for each user, which means 100 queries to show 100 results on one page.

ballingt•12mo ago

You can fetch more than one user in each paginated query, e.g. fetch 10 users and get all of their documents in each page of the paginated query. There are more complicated pagination patterns we've designed and decided not to release because they're not quite right yet — certainly we see the utitility of e.g. having two cursors makes sense here, one for the current user and one for the current document of that user. But if these are both unbounded you can denormalize: store the sort key (the username) on each document so you can create an index on it. The downside is the requirement that this stay up to date: every mutation which changes a user's name now needs to modify every document they own.

BarisOP•12mo ago

Yes, that’s correct. Unfortunately, a user can be referenced by 1000 documents. And I have to change all of them with every name change. Also I need 4-5 other fields as well. I cannot add all of them to every document and keep them updated. I feel like there must be another solution for a simple problem like this. Anyway, thanks for the help Tom 🙏 Does anyone have any other suggestions for this? Currently, I'm retrieving all the documents from two tables, and I'm combining them with JavaScript like an SQL join. But to display 100 results, I'm querying 10,000 items…

lee•12mo ago

i would do this suggestion of Tom's:

But if these are both unbounded you can denormalize: store the sort key (the username) on each document so you can create an index on it. The downside is the requirement that this stay up to date: every mutation which changes a user's name now needs to modify every document they own.

denormalizing is a pretty standard way of getting the field into the table so you can order by it. This is how I would do it in a database like postgres or mysql, for example.

BarisOP•12mo ago

I guess you meant to say that you wouldn’t do it this way in SQL? Since table joins can be done in SQL, there is no need to denormalize. If I understood your suggestion for doing it Convex-style correctly, in a scenario where I have a ‘task’ and ‘assignee’ table: I should save the assignee to each task as ‘assignee_id’ and ‘assignee_name’ and make mutations in thousands of tasks whenever an assignee’s name is edited. Is this the standart accepted way of doing it?

jamwt•12mo ago

table joins can be done in convex too sql databases are doing nothing different here -- if you want a fast sort, you need an index to Lee's point, a high performance version of this would always denormalize and then index the denormalized field, including in e.g. postgresql so you don't need to pull all the records into memory to sort them on a table scan (where "you" would be the database server in the case of a complex SQL statement)

BarisOP•12mo ago

I understand, thank you for the answer, Jamie. I think I need to step away from the SQL logic a bit and adjust my schemas accordingly.

Michal Srb•12mo ago

@Baris if possible, I'd be curious what the actual product use case is (what does this feature look like to the end user?)

BarisOP•12mo ago

We are working on a work management & collaboration tool, a product that will compete with Notion and Asana. Naturally, users should be able to sort their tasks by assignee. And in a project where a maximum of 10,000 tasks are allowed, we still haven't found a way to sort all the tasks without collecting them all first. In many ways, Convex could be the best backend service we've tried so far, but such shortcomings are making us reconsider returning to SQL. Of course, these issues are not unique to Convex but are limitations of document-based databases.

Michal Srb•12mo ago

Convex works very much like SQL does under the hood (in its ideal state, the performance right now might not be optimal, as Convex is new and we have many optimizations ahead of us). The one difference you might run into is that Convex enforces your code is good for OLTP workloads. This is why we limit query execution to 1 second. If your users need to wait more than 1 second, it's probably no longer a realtime workload. Thanks for providing the use case. Looking at it from a UX perspective, having a list of 10000 tasks sorted by assignee is probably not helpful. I imagine that either: - The list of tasks is relatively short (say 100), and then it makes sense to sort (group) them by assignee - The list of tasks is actually long, in that case filtering by assignee is probably what the user needs. So given this, your concern might be mostly that the UI doesn't break when the list is long but the user chooses to sort by assignee. Given how the list is produced (is it a search result? or listing in a project?), this could be a challenge. One thing you could do, if you can determine the list of assignees without loading all the tasks, is to then sort the list of assignees and paginate through their tasks. I think this is the main difference between Convex and trandional SQL database right now, the traditional SQL database will let you throw any workload at it, and this might lead to extremely slow performance for the end user. Whereas Convex right now makes you think about what will happen when the data scales, and address it. I would say that denormalization is still an option in this case. User renames will be infrequent, and you can implement them non-transactionally so that they scale over many tasks assigned to a single user.

BarisOP•12mo ago

To clarify, we don't want to display 10 thousand tasks. Only 100 tasks will be paginated, but we need the entire list to sort them. That's why I mentioned that. We thought about creating a separate table named task_assignee, but it doesn't change anything. ctx.db.get allows a maximum of 4 thousand runs. Also, a task can be in multiple projects. There are many other little things like this.

Michal Srb•12mo ago

You're not gonna show 10k tasks on a single screen, but you're trying to paginate through it, ordered by assignee. So for the user, that would mean scrolling through 400 tasks assigned to Baris, then another 600 assigned Michal, then a 1000 to Presley, and so on. Or 2 to Baris, 2 to BarisB, 4 to Michal, 3 to Nina, and so on for a thousand+ team members. Right? Either way doesn't seem to be a very useful UI.

BarisOP•12mo ago

It’s a matter of preference, and we offer endless possibilities for selecting how to view your tasks. Just because it doesn’t align with your preferences doesn’t mean it’s a ‘useless UI’ Thank you once again for your time and support, Michal. After considerations, we decided to denormalize all sortable fields, and it turned out really well. 🙏

v•12mo ago

When in doubt denormalize it out

jamwt•12mo ago

re: "useless UIs," agree there is a true Convex feature gap right now, especially for reporting kinds of interfaces where users are used to waiting a bit longer for a render. it's true there are times you really want to query things in a completely open fashion. right now, you need to stream convex out to e.g. a data warehouse for those types of queries. that works, but it's heavyweight. we do have plans to ship a lightweight OLAP engine on convex (probably based on duckdb or clickhouse or something), just built-in, which I suspect would be about what you would want for these use cases, if you want to display things in arbitrary orders with arbitrary predicates. so yeah, one day there will be a SQL engine you can use for arbitrary queries, with the following caveats: (1) it will be slightly (few seconds)? delayed behind your transactional data; and (2) it can't be used as part of the transactional layer--it is SELECT only there are definitely apps that have one kind of dashboard-y interface that want this from us, so we want to be able to help more here without asking you all to integrate a whole 'nother system. I would guess us shipping this is 3-4 months away. but we haven't firmed up what roadmaps look like in Aug/Sep/Oct yet, so we'll have updates soon

How can I sort a query by a field in the referenced document?

Did you find this page helpful?