Varun Singh•17mo ago

Database Design: Intersection of three fields

This is more of a database design question for my Convex app than a Convex-specific question. I am building a platform for hosting and participating in hackathons, and I have three tables to store participation, in addition to the users table: - teams, which takes a competition ID - competition - participants, which takes a team ID and a user ID I designed it this way to adhere to database normalization, but I am trying to use Convex indexes to - quickly get the user's team for a particular competition - quickly get all the teams for a particular competition But I can't index teams with a user field, so I am stumped. Any ideas?

18 Replies

erquhart•17mo ago

My personal rule of thumb with convex is don't optimize for performance until there's a performance problem. So often I assume something will take too long and I'll optimize ahead of time, only to find out the unoptimized approach would have been fine. If you take the performance concern out of the mind, the right approach is usually straightforward. In your case, you can get the competition, get the users, and map over the users to retrieve their participants and teams. Get it working and then decide if you need to optimize, which often just means parallelizing things, and maybe fetching some reused data in advance before mapping/looping. More specific to your question: - add a user+competition index to the participants table, so retrieving the participant for a given user/competition is effectively a point query, which the convex team says they consider to be ~1ms for "back of the napkin math" - once you have the participant, getting the team by id is another pointer query, so another ~1ms - getting teams for a competition seems like a non-problem, you have the teams table indexed by competition id so query away

Varun SinghOP•17mo ago

That’s a good point. If in the future, it does need to be optimized to a point query, how would I add a user+competition index if the user isn’t a column? I wanted to do it this way but the way I designed it, I don’t know how to account for that even with a different design. The most efficient way I can use is the users since the number of competitions a user enters is going to be much lower than the total number of teams or participants.

jamwt•17mo ago

so, it looks like this is a junction (aka join) table, which is the right way to do membership (many-to-many) and you should absolutely be able to make those indexes on participants... can you say more about what didn't work?

Varun SinghOP•17mo ago

I know I can use a query on participants to get the rows that I need but I am wondering if I can do it with indexes so that it doesn’t require any iteration erquhart had suggested user+competition index which is what I wanted to do but I can’t when these two fields are in two different tables

jamwt•17mo ago

ah. yeah, you need iteration. a JOIN in a RDBMS also does this. it needs to fan out and do N point queries for every secondary key. each query is fast, but is at a different index location and is not adjacent. to speed this up, just run them in parallel right now in convex. this article has all the details: https://stack.convex.dev/functional-relationships-helpers

Functional Relationships: Helpers

In this post, we’ll look at some helper functions to help write code to traverse relationships in a readable, predictable, and debuggable way.

jamwt•17mo ago

https://stack.convex.dev/functional-relationships-helpers#many-to-many

Functional Relationships: Helpers

In this post, we’ll look at some helper functions to help write code to traverse relationships in a readable, predictable, and debuggable way.

jamwt•17mo ago

getManyVia

Varun SinghOP•17mo ago

I’ll take a look, thank you

erquhart•17mo ago

@Varun Singh I'll add the direct route in addition to the helpers that Jamie linked. You said you're trying to get the user's team for a particular competition, right? I assume that means you have a user id and a competition id handy for this query. You'll add an index to your participants table:

.index('by_user_competition', ['user', 'competition'])

.index('by_user_competition', ['user', 'competition'])

You can then query against that index to get the join:

const participant = await ctx.db
  .query('participants')
  .withIndex('by_user_competition', q =>
    q.eq('user', <user_id>).eq('competition', <competition_id>)
  )
  .unique()

const participant = await ctx.db
  .query('participants')
  .withIndex('by_user_competition', q =>
    q.eq('user', <user_id>).eq('competition', <competition_id>)
  )
  .unique()

Varun SinghOP•17mo ago

The problem with this approach is that the participants table does not have a competition column, only the teams table does. Participants is essentially a "team members" table so I'd be storing duplicate information if I put competition as a column in this table.

erquhart•17mo ago

hmm so you need to: 1. get the user's teams 2. get the team for the specific competition

Varun SinghOP•17mo ago

The second one: Get the team associated with a user and a competition I ended up implementing it by using the by_user index on participants, because I figure the number of hackathons a single user enters is much less than the number of competitions there will be or the number of teams across competitions Then I loop through this checking the team for that row, and eventually finding the competition

erquhart•17mo ago

const participants = await ctx.db
  .query('participants')
  .withIndex('by_user', q => q.eq('user', <user_id>))
  .collect()
const teams = await Promise.all(participants.map(async participant => {
  return ctx.db.get(participant.team)
}))
const team = teams.find(t => t.competition === <competition_id>)

const participants = await ctx.db
  .query('participants')
  .withIndex('by_user', q => q.eq('user', <user_id>))
  .collect()
const teams = await Promise.all(participants.map(async participant => {
  return ctx.db.get(participant.team)
}))
const team = teams.find(t => t.competition === <competition_id>)

That's how to do it concurrently I have found with convex, like other nosql databases, you'll likely end up duplicating some id fields across tables to make this sort of join more straightforward. I have a table that, through various ways, is related to most other tables. I ended up adding a field for that table's id to almost every table in convex a while back. Have not regretted it.

Varun SinghOP•17mo ago

I've only avoided that approach because I want to maintain a single source of truth and I like following database normalization religiously, for better or for worse

jamwt•17mo ago

I might be missing the details here, but convex is an ACID RDBMs under the covers -- the same principles about denormalization and normalization apply to convex just as with e.g. postgres or mysql while convex is "nosql" it's not "no-object-relational", which is really the part about relationships, joins, normalization, indexing, etc that's relevant--not the query language

erquhart•17mo ago

Yeah that's fair, it's not that relationships aren't supported in any way. But I never in postgres had a need to put a table's id into multiple related tables to expedite joins, and I'm fairly certain it was suggested by someone on the team. The distinction here being that with nosql it's a requirement for some relationships, whereas with convex it's more of a performance/expedience thing.

jamwt•17mo ago

How did you do joins without putting ids into each others’ tables in Postgres?

erquhart•17mo ago

I mean adding superfluous related ids directly to tables to simplify joins. Like adding a competition id to the participants table in the example above to avoid some of the complexity in the join logic. I have relations like this that are 4 times removed, so I let normalization slide a bit for expedience.

Database Design: Intersection of three fields

Did you find this page helpful?