smaccoun
smaccoun•2mo ago

convex schema + data migration

I think the issue if I understand it (i was having the same for a while) is that typically you have 3 or 4 environments (dev, canary, preview, prod,.....whatever) and you want them to be in sync. So typically at least what i'm used to with like postgres is you have CI automate the deploy between these envs so they are always in lock step. But convex doesnt have a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current state. I wrote my own my system that basically mimick dbmate (https://github.com/amacneil/dbmate) with my own migration tracking table that i am happy to share more about (it mostly works but because convex is so nicely typed does have some issues i'm working out) I really think convex should standardize on this and have maybe even have a componetn for (i'm aware of the current migrations copnent but this is not that). This is a pretty critical feature that any relational db. Its very necessary for production workflows migrationStatus: defineTable({ migrationId: v.string(), status: v.union( v.literal("pending"), v.literal("running"), v.literal("completed"), v.literal("failed"), v.literal("rolled_back") // Add rollback status ), startedAt: v.optional(v.number()), completedAt: v.optional(v.number()), error: v.optional(v.string()), // New optional fields for better tracking name: v.optional(v.string()), // Migration name description: v.optional(v.string()), // What it does recordsProcessed: v.optional(v.number()), recordsUpdated: v.optional(v.number()), recordsErrored: v.optional(v.number()), executionTimeMs: v.optional(v.number()), environment: v.optional(v.string()), // Track which env ran it }).index("migrationId", ["migrationId"]),
GitHub
GitHub - amacneil/dbmate: 🚀 A lightweight, framework-agnostic da...
🚀 A lightweight, framework-agnostic database migration tool. - amacneil/dbmate
9 Replies
Jonas
Jonas•2mo ago
On my part the issue is a bit simpler, but grounded in the same issue of lacking:
a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current state
Let's say on production, canary, and locally I have this deployed:
export const Users = Table('users', {
firstname: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
});
export const Users = Table('users', {
firstname: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
});
and I want to add a mandatory (non-optional) "lastname" to link to the user's settings. because none of my documents have this lastname yet I have to make it optional, this makes sense, so locally I now have:
export const Users = Table('users', {
name: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
configurationId: v.optional(v.id('configurations')),
});
export const Users = Table('users', {
name: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
configurationId: v.optional(v.id('configurations')),
});
I create a migration:
export const assignNames = migrations.define({
table: 'users',
migrateOne: async (ctx, doc) => {
if (!doc.lastname) {
await ctx.db.patch(doc._id, {
lastname: '',
});
}
},
});
export const assignNames = migrations.define({
table: 'users',
migrateOne: async (ctx, doc) => {
if (!doc.lastname) {
await ctx.db.patch(doc._id, {
lastname: '',
});
}
},
});
Now I get stuck, I can't make the row non-optional until I have deployed and run the migration on both canary and production too. So throughout my code the lastname will continue being potentially null. When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions. @jamwt cc - created thread not to clog main chat
smaccoun
smaccounOP•2mo ago
Thanks @Jonas yea I think the really simple way of expressing the issue is that convex doesnt have a versioned database migration tool, unlike mysql or postgres or whatever that have many (flywheel, alembic, dbmate, etc) I think this is really critical convex have this for production systems. Maybe the advantage convex could have over others though is having a single supported tool by the convex maintainers themselves that everyone could point to. (there's a lot of choice fatigue in the postgres world imo for a bunch of tools that mostly do the same thing) Personally i think components are the solution here along with functionality built into the cli (convex run migrations). The componetn would basically create a central table like the one I showed
migrationStatus: defineTable({
migrationId: v.string(), // friendly readable id?
status: v.union(
v.literal("pending"),
v.literal("running"),
v.literal("completed"),
v.literal("failed"),
v.literal("rolled_back") // Add rollback status
),
startedAt: v.optional(v.number()),
completedAt: v.optional(v.number()),
error: v.optional(v.string()),
name: v.optional(v.string()), // Migration name
description: v.optional(v.string()), // What it does
recordsProcessed: v.optional(v.number()),
recordsUpdated: v.optional(v.number()),
recordsErrored: v.optional(v.number()),
executionTimeMs: v.optional(v.number()),
environment: v.optional(v.string()), // Track which env ran it
}).index("migrationId", ["migrationId"])
migrationStatus: defineTable({
migrationId: v.string(), // friendly readable id?
status: v.union(
v.literal("pending"),
v.literal("running"),
v.literal("completed"),
v.literal("failed"),
v.literal("rolled_back") // Add rollback status
),
startedAt: v.optional(v.number()),
completedAt: v.optional(v.number()),
error: v.optional(v.string()),
name: v.optional(v.string()), // Migration name
description: v.optional(v.string()), // What it does
recordsProcessed: v.optional(v.number()),
recordsUpdated: v.optional(v.number()),
recordsErrored: v.optional(v.number()),
executionTimeMs: v.optional(v.number()),
environment: v.optional(v.string()), // Track which env ran it
}).index("migrationId", ["migrationId"])
One problem i will say that's a challenge with convex and standard versioned migrations is that its type safety - which is otherwise such an awesome selling point - does actually make this kind of thing harder. But i'm sure its solvable I just haven't figured it out yet Okay think aboutt his a bit more. I realize now the nature of how tricky this problem is. Basically what we are asking for is something that can run
Default state -> Series of transforms -> Ideal state
Default state -> Series of transforms -> Ideal state
This allows for easily automating migration scripts. But here's the problem. Unlike postgres, where the schema is defined mutably/imperatively as as a series of transformations, convex schemas are defined statically and declaratively!! In other words, there is no DROP COLUMN in convex because that's not how it works I think we will have to have a different mental model for how to do this in convex. Probably gonna have to involve deleting previous migration files. This is tricky though....glad we are having the discussion this has been bugging me more than anythign since i've started using convex and I'd love to get to a consensus approach!
Jonas
Jonas•2mo ago
When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions.
What I was hinting at here yeah. It's a hard nut to crack.
Michal Srb
Michal Srb•2mo ago
Passing by: Having field defaults helps the specific example @Jonas gave. One option is Ents: https://labs.convex.dev/convex-ents/schema#field-defaults For the rest of the discussion, I think you're on the right track, in that Convex migrations are not like schema migrations in Postgres, but more like the data migrations done at bigger companies. These migrations can take a long real-world time (say weeks), during which bugs are ironed out, and the data is migrated. Having the schema be backwards compatible is the only way to allow concurrent development with the migrations in progress.
Jonas
Jonas•2mo ago
Thanks, will check them out!
smaccoun
smaccounOP•2mo ago
Thanks @Michal Srb that ents looks really interseting I guess this is where I've mostly landed, that you really just have to be eternally backward compatible with all your migrations if you want to have some kind of automatic migration script you can run between environments I understand this is might be the norm to a large extent at big companies. But there is also often a time where you have a cutoff and the cost of supporting backeward compat with all the additional code branch logic to handle can get to be a bit too much. I'm not sure I like the idea of never being able to do a drop column from automated scripts Even if it is decided that the only solution is to only ever do backward compatible migrations, I still think it would be worth it to standardize a component for a migration version tracking table. Or at least, it would be nice if someone could provide a script they use to run in CI for how to handle automatic migrations between multiple envirionments that works consistently over time
jamwt
jamwt•2mo ago
yep, convex migrations are modeled how big systems do them, where there is no "instant" you can switch over, and so there is a time where you have "both kinds" of data running for some period (minutes, hours, days, etc) this does make them three steps instead of one this is "worth it" in production, but perhaps not in dev/ci -- which is why, normally, in these systems, we recommend you use seed data etc rather than prod data. also b/c eventually you can't use prod data b/c of PII/regulation/security etc in dev/CI, we recommend the way to migrate is: blow away your data 😄 if you want the codebase to always treat to the value as the new type, you can do that with a "lazy migration" ORM type thing we should ship an example of that to go along with the migration component if your migration is mostly adding a field, @Michal Srb is right that ents already gives you something useful via it's default
Jonas
Jonas•2mo ago
Thanks for the replies, really And yeah ents seem to be the way in terms of minimizing a lot of this manual behavior. also takes care of things like cascade deletion.. which would help me with how I regularly forget to implement it myself in delete endpoints and make up some funky migration to delete all the garbage 😅 😂 I want to ask though since I'm not just playing around but using convex seriously for long-term projects. What's the plan with ents? to me from the reading I did on it, it sounds like the way you should be building with convex from the start, but with it not being pushed hard to the point I haven't even heard of it it ticks some flags. Is it "the proper/future way" of doing convex (not that I care what proper is, mostly the future part, more that I need something that will be built on long term and become the standard way, switching ways of doing things is expensive and I have to justify these changes to my clients). Why isn't it being pushed as a replacement of the "old way" <-- excuse my ignorance haha
https://discord.com/channels/1019350475847499849/1280908737032360017/1333501733950984283
ah found this. if that's still the case I'll probably hold off
smaccoun
smaccounOP•2mo ago
Also thanks for ansswering @Jamie and @Michal Srb I guess i'm okay with this overall. In short 1. Try to always keep backward compatibility 2. While i'm used to doing (1), there are times usually after old backward compat code has gotten really old and messy when you have to make a decision to deprectae stuff because the code becomes just too cumbersome. In those cases I guess we should just not rely on automated scripts for our envs and just carefully run them ourselves (drop column is just an example but there are many more) 3. I still think we (and convex) team should standardize on at least automatated data migration patterns. That migrations table I suggested above I think is in the ballpark. But its a very standard pattern to deploy via CI your migrations so that when you deploy to staging or canary or whatever before prod you know that you are getting consistent changes

Did you find this page helpful?