aarku
aarku2mo ago

Plans to support functions returning delta updates?

I have a question about the use case where you're subscribing to a query that returns a large array of data that changes often. Are there any plans to officially support smartly returning only the changes, rather than the whole array every time? Similar to how Firestore and Realm (now called the MongoDB Atlas Device SDKs) behave. Thanks!
36 Replies
Convex Bot
Convex Bot2mo ago
Thanks for posting in <#1088161997662724167>. Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets.
- Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.)
- Use search.convex.dev to search Docs, Stack, and Discord all at once.
- Additionally, you can post your questions in the Convex Community's <#1228095053885476985> channel to receive a response from AI.
- Avoid tagging staff unless specifically instructed.
Thank you!
jamwt
jamwt2mo ago
hi! the plan is to build this as part of the local sync engine project
jamwt
jamwt2mo ago
An Object Sync Engine for Local-first Apps
Object sync engines manage a rich object graph across multiple clients and a centralized server and are a great fit for building local-first apps.
eli
eli2mo ago
any ETA on this?
Alm
Alm2mo ago
I’m also curious about the ETA. I wish we could just get delta syncs with subscriptions first. It would be sooooo useful
jamwt
jamwt2mo ago
we have to knock out some Chef stuff, and hire up the team to take on all these new projects. I'm going to estimate we'll get back to it mid-summer
Alm
Alm2mo ago
Imo we don’t need a full local-first implementation, just subscriptions that do light caching for the UI to load instantly when re-viewing a page (without having to re-pull from scratch), and also delta updates in the subscription. This is all we need and it would be perfectly useful. And to be clear, I’m not talking about the helper for cached subscriptions, though that has some good use cases
jamwt
jamwt2mo ago
npm
convex-helpers
A collection of useful code to complement the official convex package. Latest version: 0.1.79, last published: 10 days ago. Start using convex-helpers in your project by running npm i convex-helpers. There are 12 other projects in the npm registry using convex-helpers.
jamwt
jamwt2mo ago
oh 😦 yeah, deltas for normal subscriptions aren't likely to show up soon -- the local sync engine, which has a robust model for deltas, would be the first version of that
Alm
Alm2mo ago
Cached subscriptions + delta updates would be enough to get by. Right now we have been building our own implementation and it’s not fun/ideal. The function subscribes by updated time, then we have to re-subscribe on every subscription response, then manage a soft-delete table for any deletions. If we could get a delta subscription without local-first, that would be really, really nice
Kristoff95
Kristoff952mo ago
Hi, is this an official package from convex?
jamwt
jamwt2mo ago
Yeah. @Ian on our team maintains it.
Kristoff95
Kristoff952mo ago
Wow!! This package makes convex even more amazing 🤯
Max
Max4w ago
Hi, before diff-only updates become part of Convex, how would you recommend handling use cases like a group chat? Should I split every 100 messages into a new table? That seems like it would work, but it wouldn't be a very elegant solution.
WeamonZ
WeamonZ4w ago
@Max 👾 I think you should make a custom chatMessages table with each row linked to a chat, as I mentioned here: https://github.com/get-convex/convex-backend/issues/95 It looks like the cache system invalidates queries even when the result didn't change.
GitHub
Concerns About Bandwidth Usage, Caching, and Scalability Limits in ...
Hi Convex team, I've been migrating my app to Convex for about a month now. While it's still not in production, I've already hit the 1 GB database bandwidth threshold, which is quite co...
Max
Max4w ago
Hey, thanks for the suggestion. I’m still relatively new to using Convex and I’m curious how that would solve the issue. Wouldn’t that make it so that if any chat changes, the entire table gets invalidated, and therefore every user has to refresh all chats and not just the opened conversation? I think the part I’m not understanding is what I should be querying.
WeamonZ
WeamonZ4w ago
@Max 👾 I asked GPT to provide a quick tutorial for you. The key thing to understand is that Convex tracks the query result — not the whole table. If you filter by chatId, Convex only triggers updates when that specific filtered result changes. Let’s walk through a real example:
--- 📦 schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // Assumed users table (referenced below via v.id("users"))
  users: defineTable({
    name: v.string(),
  }),

  chats: defineTable({
    name: v.string(),
  }),

  chatUsers: defineTable({
    chatId: v.id("chats"),
    userId: v.id("users"),
  }),

  chatMessages: defineTable({
    chatId: v.id("chats"),
    userId: v.id("users"),
    content: v.string(),
  }),
});
--- 📥 getMessages.ts – Query with logic
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getMessages = query({
  args: { chatId: v.id("chats") },
  handler: async (ctx, args) => {
    // Step 1: Get all messages for the current chat
    const messages = await ctx.db
      .query("chatMessages")
      .filter((q) => q.eq(q.field("chatId"), args.chatId))
      .order("asc") // sort by insertion order (or add timestamps later)
      .collect();

    // Step 2: Load additional user data if needed
    const userIds = [...new Set(messages.map((msg) => msg.userId))];
    const users = await Promise.all(
      userIds.map((userId) => ctx.db.get(userId))
    );

    const userMap = Object.fromEntries(
      users
        .filter(Boolean)
        .map((user) => [user!._id, user!.name ?? "Unknown"])
    );

    // Step 3: Transform messages before returning
    return messages.map((msg) => ({
      id: msg._id,
      content: msg.content,
      authorName: userMap[msg.userId] || "Unknown",
    }));
  },
});
✅ This is still a reactive query. It’ll only re-run when messages for that chatId change, or when one of those user documents changes.
--- ✏️ sendMessage.ts – Add a message
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const sendMessage = mutation({
  args: {
    chatId: v.id("chats"),
    userId: v.id("users"),
    content: v.string(),
  },
  handler: async (ctx, args) => {
    await ctx.db.insert("chatMessages", {
      chatId: args.chatId,
      userId: args.userId,
      content: args.content,
    });
  },
});
--- 🧠 Summary
So to your original question:
Wouldn’t that make it so if any chat changes, the entire table gets invalidated?
Nope! If you filter by chatId, only queries that care about that chatId will get updated. If someone else posts in a different chat, your client won’t react or re-fetch anything.

getMessages will retrieve ALL the messages each time the db query result changes, so:
- whenever a message is deleted,
- whenever a message is added,
- whenever a message is updated (the content, for example).

If your chat has THOUSANDS of messages, this is not optimal. You should use a paginated query. Also, using filter with Convex is not optimal either: use the chatId as an index and query the messages via the index
chatMessages: defineTable({
  chatId: v.id("chats"),
  userId: v.id("users"),
  content: v.string(),
}).index("by_chat_id", ["chatId"]), // ✅ indexed for efficient querying
import { paginationOptsValidator } from "convex/server";
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getPaginatedMessages = query({
  args: {
    chatId: v.id("chats"),
    paginationOpts: paginationOptsValidator,
  },
  handler: async (ctx, { chatId, paginationOpts }) => {
    const messagesQuery = ctx.db
      .query("chatMessages")
      .withIndex("by_chat_id", (q) => q.eq("chatId", chatId))
      .order("desc");

    const page = await messagesQuery.paginate(paginationOpts);

    return page;
  },
});
Something like that
ian
ian3w ago
One edit: it should use a withIndex instead of .filter((q) => q.eq(q.field("chatId"), args.chatId)) in the first example. A query using withIndex will only "track" the messages returned from the index lookup; a .filter will scan the whole table and compare the field chatId. It's a common mistake, so I wanted to call it out
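(For reference, a minimal sketch of that correction, assuming the by_chat_id index from the schema above; this would replace Step 1 inside getMessages:)

// Sketch only: same read as before, but via the index, so the reactive
// dependency is just this chat's index range rather than the whole table.
const messages = await ctx.db
  .query("chatMessages")
  .withIndex("by_chat_id", (q) => q.eq("chatId", args.chatId))
  .order("asc")
  .collect();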
Figloalds
Figloalds3w ago
The way that I achieved chat without constantly re-sending the entire conversation for each new message was to add a cursor parameter. Every time the query updates, I move the entire chat to a non-reactive list of messages and advance the cursor. This way, each new message triggers an update with the new message, a query unsubscription, a new query subscription, and an update from Convex with an empty array, which is less bad than re-sending the whole chat log to all users every time. It loses the ability to edit old messages though, so there's that too
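(A rough sketch of what that cursor query could look like, with hypothetical names; it relies on _creationTime being appended to every index automatically. Note Ian's caveat further down about racing mutations inserting slightly older timestamps.)

import { query } from "./_generated/server";
import { v } from "convex/values";

// Hypothetical sketch of the cursor approach described above: the client
// passes the _creationTime of the newest message it already holds in its
// non-reactive list, and the query only watches and returns newer ones.
export const getNewMessages = query({
  args: { chatId: v.id("chats"), cursor: v.number() },
  handler: async (ctx, { chatId, cursor }) => {
    return await ctx.db
      .query("chatMessages")
      .withIndex("by_chat_id", (q) =>
        // _creationTime is appended to every index automatically
        q.eq("chatId", chatId).gt("_creationTime", cursor)
      )
      .order("asc")
      .collect();
  },
});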
Alm
Alm3w ago
Can anyone from Convex explain the reasoning for not including a delta subscription in the basic query/subscription? By delta subscription, I mean a subscription that only sends down new updates (after the subscription was established), instead of sending down ALL results of the query every time a single update is made.

I'm at a complete loss trying to understand the reasoning here, since it seems like one of the most obvious features for a socket-based subscription system like Convex. I must be missing something fundamental about how the Convex team intended this to work and why the feature hasn't been added yet. And no, the current query/subscription or cached query helper does not enable delta subscriptions.

I pay for a team of 5 full-time devs on a Convex account and we have to do a lot of work to get around this limitation; it's a big pain. We don't need offline data, we just need delta subscription updates, and this should be the standard behavior in a Convex query/subscription. Instead of more Chef features, can we get this delta updates feature?
jamwt
jamwt3w ago
For update speed?
Figloalds
Figloalds3w ago
Convex doesn't include an updatedTime by default in its data model, and delta updates also depend on how granular they would be. Is it OK for Convex to resend an entire document from a list of documents when only one of its children has changed? (For me it's OK.) If so, then having an updatedTime and a cursor on the backend is enough to detect what changed in the query that is currently on the client, without having to store the entire result in the backend, only ids + updatedTime. That is one compromise.

Another compromise is if the granularity of that delta is finer: for example, if description changed in a record, preferably only { description: <value> } or so goes to the client. But that requires the backend to store the entire query result for each subscribed query, plus additional processing power to compare and generate those deltas.

Ultimately, it would be ideal if we could choose what type of sync strategy a given query utilizes, but I understand the technical challenges of adding this feature (even though I also think it's critical for Convex to have, since it's the absolute missing piece stopping it from being astoundingly awesome)
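(To make the two granularities concrete, here is an illustrative, purely hypothetical shape for each kind of delta payload; neither is a real Convex wire format:)

// Document-level deltas resend whole changed documents...
type DocumentDelta<T> =
  | { id: string; doc: T } // document added or changed: resend it whole
  | { id: string; deleted: true }; // document removed from the result

// ...while field-level deltas send per-field patches, which requires the
// backend to retain each subscriber's previous result to diff against.
type FieldDelta = {
  id: string;
  patch: Record<string, unknown>; // e.g. { description: "new value" }
};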
jamwt
jamwt3w ago
If you pull on this thread, it's hugely complex, especially accurately tracking the "delta basis" for any given client. The only way to provide that is as part of a more complete local sync vision that is layered onto the existing system. The basic protocol we have today won't change, and unfortunately this is not a basic feature. So we will provide this eventually, but it will almost definitely be part of a higher-level local sync protocol project. There's nothing wrong with the query function primitive we have today.
jamwt
jamwt3w ago
This article is still the best articulation we have on the architectural complexity here. https://stack.convex.dev/object-sync-engine
An Object Sync Engine for Local-first Apps
Object sync engines manage a rich object graph across multiple clients and a centralized server and are a great fit for building local-first apps.
WeamonZ
WeamonZ3w ago
I think having something like this would significantly reduce the number of updates performed on any large array of data.
1. getFields() / retrieve() — Include only specific fields (reactively observed)
const project = await ctx.db.query("projects")
  .withIndex("by_name", q => q.eq("name", args.name))
  .retrieve(["progress"]) // Only track and return 'progress'
  .first();
2. omitFields() / omit() — Exclude specific fields from tracking and retrieval
const projects = await ctx.db.query("projects")
  .omit(["progress", "data.*"]) // Everything except these fields
  .take(100);
https://github.com/get-convex/convex-backend/issues/97
GitHub
🧩 Feature Proposal: Selective Field Retrieval and Omission · Is...
Hi Convex team 👋 First off—huge fan of what you’re building. Convex makes real-time backend logic incredibly ergonomic, and I’ve been using it extensively in my project. ⚠️ The Problem Right now, C...
ian
ian3w ago
By the way, this query is using filter for _creationTime, so it'll read every message in the channel each time, even if it only returns the latest one. withIndex("byChannel", q => q.eq("channelId", args.channelId).gte("_creationTime", args.cursor || 0)) is probably what you want (_creationTime is appended to every index automatically).

Also: beware that races with mutations can insert documents with slightly older _creationTime, so you could miss one with this approach. One way around this is to add an incrementing "messageOrder" field to the messages document that you increment on insert (assuming not a ton of messages being added to the same channel at once, e.g. not >5 per second). Then your index could be on [channelId, messageOrder] and you could fetch the next messages that way. Transactions would guarantee no message would have a duplicate order or be inserted out of order.

But generally, using the pagination helper with a small page size is roughly equivalent and does all of this for you automatically.
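(A minimal sketch of that "messageOrder" idea; the table, index, and field names here are assumptions for illustration:)

import { mutation } from "./_generated/server";
import { v } from "convex/values";

// Assigns a dense, incrementing per-channel order on insert. Convex's
// serializable transactions guarantee two concurrent sends can't both
// read the same max and write a duplicate messageOrder.
export const sendMessage = mutation({
  args: { channelId: v.id("channels"), content: v.string() },
  handler: async (ctx, { channelId, content }) => {
    // Assumes an index: .index("by_channel_order", ["channelId", "messageOrder"])
    const latest = await ctx.db
      .query("messages")
      .withIndex("by_channel_order", (q) => q.eq("channelId", channelId))
      .order("desc")
      .first();
    await ctx.db.insert("messages", {
      channelId,
      content,
      messageOrder: (latest?.messageOrder ?? 0) + 1,
    });
  },
});

Clients could then page with q.eq("channelId", ...).gt("messageOrder", cursor) as a race-free cursor.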
djbalin
djbalin3w ago
I'm curious, Ashok: what problem are you experiencing from these non-partial updates at the moment? I've been thinking about this a bit as well: we are a video streaming platform, and I've been a bit worried that our app may be over-reacting when, for example, one of our creators uploads a new video or simply makes a minor edit to a video (and thus modifies the video table that our clients subscribe to). What @Ian mentioned above is interesting: that queries with an index are only reactive to the data within that index range. It would be very useful to have a clear outline of which changes to data trigger a refetch, and under what conditions!
Figloalds
Figloalds3w ago
I too want to see a document or a blog post about that. It feels to me like the "thing being mutated" is checked against all active subscribed query conditions, and if it matches, or if its id is contained in the "current state", it triggers a refetch for that query
jamwt
jamwt3w ago
doesn't "How Convex Works" cover this? it's predicated on index ranges. just to be crystal clear, b/c it's true everyone is owed a full description of this: not approximately but precisely, whatever index ranges you use are the cache key in terms of dependent database records
djbalin
djbalin2w ago
Stack posts are great reads and very valuable, and thanks a lot for making them! But I was thinking of something more along the lines of typical documentation or reference manuals. "How Convex Works" is great, but it's quite a long read for someone who just wants to find out, e.g., exactly when a cache is busted in Convex. That information could be written succinctly at https://docs.convex.dev/realtime#automatic-caching, for example. I can tell that you put great effort into keeping the docs lean, simple, actionable, and readable, and that's great, keep doing that imo, but it would also be awesome to have links like "More about caching" that point to a complete specification of the given topics! ☺️
Realtime | Convex Developer Hub
Building realtime apps with Convex
Alm
Alm2w ago
Right now we are able to work around the issue by building our own delta logic; it's not super complex and works okay. It just seems like the Convex team could release a helper for any part of this flow to make it far easier to implement. I think most everyone could use a delta subscription in a production project rather than getting the entire list of data on a small update.

My quick/rough summary of how we implement delta subscriptions (sketched below):
- We fetch the data and get back the full list
- Then store the result in local state and run the UI off of the local state data
- Then use the last updatedTime to query/subscribe to a function that queries with the last updatedTime and only responds with the filtered new updates
- We loop through the response and update the local state with the updates
- Then every time we receive an update, we close and re-subscribe the query subscription with the latest updatedTime (a helper to avoid this would be amazing)
- Then we have to add an isDeleted bool to the data so deletions are still reactive with the query subscription, and we use a cron job to clean them up later

I might have missed something but this is the rough idea
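(A sketch of the server half of that flow, using the updatedTime/isDeleted convention described above; the table name, index, and fields are assumptions of this workaround, not Convex built-ins:)

import { query } from "./_generated/server";
import { v } from "convex/values";

// Returns only documents touched after the client's last-seen updatedTime.
// Soft deletes (isDeleted: true) flow through the same subscription so the
// client can drop them from local state; a cron job purges them later.
export const getUpdatesSince = query({
  args: { since: v.number() },
  handler: async (ctx, { since }) => {
    // Assumes an index: .index("by_updated", ["updatedTime"])
    return await ctx.db
      .query("items")
      .withIndex("by_updated", (q) => q.gt("updatedTime", since))
      .collect();
  },
});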
Figloalds
Figloalds2w ago
I have done that same thing too and it's fairly simple to implement. I have wondered if it would be possible to create a Convex component with that logic, like "progressiveQuery". Also, Ian just above warned me that the timestamp cursor for progressive updates may fail because, in high-concurrency environments, items may be inserted with an older timestamp
ian
ian2w ago
yeah, I'm now implementing my second delta sync implementation in the Agent component (the first one was the collaborative text editor component). I'd love to expose a DeltaSync component or something that just wraps this up. I use an incrementing integer as the key (assuming not a ton of writes per second per stream / namespace)
Alm
Alm2w ago
I thought Convex had some kind of time-series component to it, where the order or time is trustworthy?
ian
ian2w ago
Creation time is monotonic within a mutation, but if two mutations start around the same time (A then B, say), then if B does an insert and finishes first, it'd be dependent on A finishing without an insert, meaning holding B or re-running it, which ends up serializing requests and breaks down as things get distributed and higher scale. This is due to being able to read the _creationTime within the mutation; if it were assigned after the transaction committed, it'd be easier. The missing piece here is surfacing the transaction ID, which is a pure cursor even when things are inserted out of order based on commit time vs. creation time (which is pinned at the start of the transaction). But a couple of years ago it was the case that it was all serialized, before some scaling efforts. So if you've been a user that long, you're right!
