Database - Best way to model tags associated with an entity
I am storing a bunch of articles, each of which I want to store some tags against. I then want to be able to query for articles associated with one or more tags.
I've seen the join example under https://docs.convex.dev/database/reading-data#more-complex-queries but it doesn't seem like it will work well here.
For example if I have two tables:
- article
- article_tag
If I want to find articles that have tagA and tagB it seems like I would want to do something like:
fetch article_tag entries matching tagA
for each of those, X, fetch article_tag matching X.articleId and tagB
fetch all those results, Y, fetch articles matching Y.articleId
sort and return the first page
Hopefully I'm missing something 🙂
Reading Data | Convex Developer Hub
Query and
10 Replies
Hey @yarrichar, with Convex you have full control over how the query is executed, based on the needs of your app.
First of all, do you need tags to be documents (for example to allow renaming them), or can they be just strings?
Second, probably the simplest data model is to store an array of tags on each article document. This will mean you'll have to scan all articles, and filter them in JS based on matching tags. You could do this post pagination. Whether this is OK perf wise depends on how many articles you plan to store.
You could also do the opposite, and store a list of article IDs for each tag. You could then query all tags, and perform intersection on the results in JS. Here you might run into perf-issues if you have a lot of users writing to the same tag frequently.
If these are not efficient enough, you could use an "edge" document to represent the relationship between a tag and an article. With the right indexes in place this would enable efficient fetching of all articles for given set of tags, and again doing the intersection in JS. Checkout these stack articles for more info:
https://stack.convex.dev/relationship-structures-let-s-talk-about-schemas
https://stack.convex.dev/functional-relationships-helpers
Relationship Structures: Let's Talk About Schemas
In this post we’ll look at some patterns for structuring relationships in the Convex database.
Functional Relationships: Helpers
In this post, we’ll look at some helper functions to help write code to traverse relationships in a readable, predictable, and debuggable way.
Depending on your requirements, it might even be possible to out the tags in a large string and put a search index on that field: https://docs.convex.dev/text-search
Full Text Search | Convex Developer Hub
Run search queries over your Convex documents
Your original approach of doing an indexed look up for each tag and an intersection in js sounds the most scalable if you expect to have > 10k articles.
The above approaches are more convenient if you will only have ~hundreds.
as an example, I use Convex for data related to our templates page and stack. Neither of those have thousands of entries for now, so I just grab them all and do the logic in JS.
stack.convex.dev
convex.dev/templates
Thankfully the code has super fast database access, so doing multiple calls can happen in sequence while still being fast overall, in my experience. Point lookups for me have been ~2ms or less usually, and can be parallelized
This is a new project, so there is a fair bit of flexibility in terms of the actual approach, and how the data is structured - they definitely don't need to be as separate documents. One requirement is that if it does do well then underlying database / schema can handle it. I would be hoping for more than 10k articles though 🙂
Given the simple nature of tags it sounds like storing them as a simple text column on the article document, separated by spaces.
Thanks for all the feedback guys!
One note- our text search is an OR of search terms. To get an AND you’d have to fetch in queries or search twice and intersect.
Ah, and I assume that that's a limitation of the underlying DB / won't be an easy fix?
That is a reflection of the search index being optimized for full text search: it will return the results in ranked order, based on how similar they are, and won't exclude results that only have one of the terms.
Ahh, cool. Do you know if that means results that match multiple terms will be sorted higher?
Do you guys have any database / query performance stats you can share for this type of query (or others) when a decent number of records are involved? Are you able to share what the underlying DB is?
the underlying storage engine is mysql, but the "database" is more or less a custom thing. so sometimes your results come out of caches and so on. the performance isn't 1:1 with any particular well known relational database. but here's some general guidance: indexed point queries take 1-2ms, an entire "query function" run is normally around 10ms if it only uses a small number of records (v8 overhead); if the query function is cached, it will get pulled in < 1ms out of a fast Rust-based in memory store
Cool, thanks for the info