mavix
mavix4mo ago

Searching posts that contain the "#" symbol does not work

I'm trying to write a query that returns a list of hashtags based on the current posts, so I can get the trending topics. I have the following query:
export const getHashtagWordsWithFilterAndCount = query({
  args: {
    limit: v.optional(v.number()),
  },
  handler: async (ctx, args) => {
    // Get all posts that contain hashtags
    const posts = await ctx.db
      .query("posts")
      .withSearchIndex("search_post", (q) => q.search("content", "#"))
      .collect();
    // ^ This does not return anything

    // Create a map to count hashtag occurrences
    const hashtagCountMap = new Map<string, number>();

    // Process each post to extract hashtags
    posts.forEach((post) => {
      // Use regex to find all hashtags in the content
      const hashtagRegex = /#[\w\u0590-\u05ff]+/g;
      const hashtags = post.content.match(hashtagRegex) || [];

      // Count each hashtag
      hashtags.forEach((hashtag) => {
        const normalizedHashtag = hashtag.toLowerCase();
        hashtagCountMap.set(
          normalizedHashtag,
          (hashtagCountMap.get(normalizedHashtag) || 0) + 1,
        );
      });
    });

    // Convert map to array and sort by count
    const hashtagCountArray: HashtagCount[] = Array.from(
      hashtagCountMap,
      ([hashtag, count]) => ({ hashtag, count }),
    ).sort((a, b) => b.count - a.count);

    // Apply limit if provided
    return args.limit
      ? hashtagCountArray.slice(0, args.limit)
      : hashtagCountArray;
  },
});
But it is not returning anything, even though there are posts that contain "#" in their content, as shown in the attached image. I don't know if it has something to do with the character itself, because if I change the search term to, say, the character 'b', it does work.
7 Replies
mavix
mavixOP4mo ago
This is the schema for the posts document:
posts: defineTable({
  content: v.string(),
  authorId: v.id("users"),
  attachments: v.optional(v.array(v.id("media"))),
  likes: v.optional(v.id("likes")),
  bookmarks: v.optional(v.id("bookmarks")),
  comments: v.optional(v.id("comments")),
  likedNotifications: v.optional(v.id("notifications")),
})
  .searchIndex("search_post", {
    searchField: "content",
  })
  .index("by_authorId", ["authorId"]),
erquhart
erquhart4mo ago
Can you run the search query .collect() using "f" as the query and log immediately after the collect? Should at least return "my first post with images". That'll confirm whether this is a hash symbol issue in Convex search or something else.
mavix
mavixOP4mo ago
12/5/2025, 11:46:24 a. m. [CONVEX Q(trendingTopics:getHashtagWordsWithFilterAndCount)] [LOG] 'Convex posts' [
  {
    _creationTime: 1746482158720.9988,
    _id: 'k178jgvhrppmp1vcn6v09whcjn7fbb05',
    attachments: [ 'k5706sjw9n1n3bgcqatqkdrwkx7fbs7c' ],
    authorId: 'j979tfq9hwa83mqeg3cx6904m17f7f86',
    content: 'my first post with images'
  }
]
erquhart
erquhart4mo ago
Yeah, sounds like this is specific to the hash symbol, assuming you've logged right after the collect with "#" as the query and it's empty.
mavix
mavixOP4mo ago
yes, if i do:
// Get all posts that contain hashtags
const posts = await ctx.db
  .query("posts")
  .withSearchIndex("search_post", (q) => q.search("content", "#"))
  .collect();

console.log("Convex posts", posts);
I get the log in the attached image
erquhart
erquhart4mo ago
I believe the tokenizer only handles alphanumeric characters and splits on everything else as punctuation. You may want to parse and handle hash tags and store them individually, depending on what sort of behavior you're going for. Happy to help hash out a solution (pun not intended)
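To make the tokenizer point concrete, here is a minimal sketch of what "splits on everything that is not alphanumeric" would mean for this query. This is not Convex's actual tokenizer; `tokenizeAlphanumeric` is a hypothetical helper that only imitates the behavior described above, and it is simplified to ASCII letters and digits.

```typescript
// Illustration only: NOT Convex's real tokenizer, just an imitation of the
// "alphanumeric tokens, everything else is a separator" behavior described above.
// Simplified to ASCII letters and digits.
function tokenizeAlphanumeric(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/) // every run of non-alphanumeric characters is a separator
    .filter((token) => token.length > 0); // drop the empty strings left at the edges
}

// The search term "#" contains no alphanumeric characters, so no tokens remain.
console.log(tokenizeAlphanumeric("#")); // []

// In a post, the "#" is dropped and only the word after it survives as a token.
console.log(tokenizeAlphanumeric("Loving #convex today!")); // [ 'loving', 'convex', 'today' ]
```

Under that assumption, a search for "#" has nothing to match against, which would be consistent with the empty result, while a term such as "f" or "b" still produces a token.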
mavix
mavixOP4mo ago
I do need some help with this. My application is a social network where users can post content and include hashtags like "#convex" or "#base". My intention is to track the usage of these hashtags across posts and eventually display them in a "Trending Topics" section on the feed, showing the most popular hashtags. Given that the tokenizer currently splits on non-alphanumeric characters, I was wondering if you could help me come up with a solution to properly handle hashtags and how to store them separately so I can implement this feature. Looking forward to your suggestions!
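One possible direction, sketched under assumptions: parse the hashtags out of `content` in plain TypeScript (for example when a post is created) instead of relying on the search index to find "#". The helpers below (`extractHashtags`, `countHashtags`) and the `PostLike` interface are hypothetical names, not part of Convex; only the `content` field and the regex come from the code in this thread. The sketch strips the leading "#" and counts each hashtag once per post, which differs from the original query, where every occurrence is counted.

```typescript
// Hypothetical shapes: only `content` is taken from the posts schema in this thread.
interface PostLike {
  content: string;
}

interface HashtagCount {
  hashtag: string;
  count: number;
}

// Same pattern as in the original query.
const HASHTAG_REGEX = /#[\w\u0590-\u05ff]+/g;

// Returns the distinct hashtags of one post, lowercased and without the leading "#".
function extractHashtags(content: string): string[] {
  const matches = content.match(HASHTAG_REGEX) ?? [];
  const normalized = matches.map((tag) => tag.slice(1).toLowerCase());
  return Array.from(new Set(normalized)); // count a tag once per post
}

// Counts in how many posts each hashtag appears, most used first.
function countHashtags(posts: PostLike[], limit?: number): HashtagCount[] {
  const counts = new Map<string, number>();
  for (const post of posts) {
    for (const tag of extractHashtags(post.content)) {
      counts.set(tag, (counts.get(tag) ?? 0) + 1);
    }
  }
  const sorted = Array.from(counts, ([hashtag, count]) => ({ hashtag, count }))
    .sort((a, b) => b.count - a.count);
  // An explicit undefined check, so that a limit of 0 is respected.
  return limit === undefined ? sorted : sorted.slice(0, limit);
}

const trending = countHashtags([
  { content: "Hello #Convex and #base" },
  { content: "#convex #convex rocks" },
  { content: "no tags here" },
]);
console.log(trending);
// [ { hashtag: 'convex', count: 2 }, { hashtag: 'base', count: 1 } ]
```

If this fits, the array returned by `extractHashtags` could be written to a separate table when the post is created (say, a hypothetical `hashtags` table with a regular index on the tag), so the trending query reads stored tags rather than scanning every post. That table and its index are assumptions, not something confirmed in this thread.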