smaccoun
smaccoun•7d ago

convex schema + data migration

I think the issue if I understand it (i was having the same for a while) is that typically you have 3 or 4 environments (dev, canary, preview, prod,.....whatever) and you want them to be in sync. So typically at least what i'm used to with like postgres is you have CI automate the deploy between these envs so they are always in lock step. But convex doesnt have a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current state. I wrote my own my system that basically mimick dbmate (https://github.com/amacneil/dbmate) with my own migration tracking table that i am happy to share more about (it mostly works but because convex is so nicely typed does have some issues i'm working out) I really think convex should standardize on this and have maybe even have a componetn for (i'm aware of the current migrations copnent but this is not that). This is a pretty critical feature that any relational db. Its very necessary for production workflows migrationStatus: defineTable({ migrationId: v.string(), status: v.union( v.literal("pending"), v.literal("running"), v.literal("completed"), v.literal("failed"), v.literal("rolled_back") // Add rollback status ), startedAt: v.optional(v.number()), completedAt: v.optional(v.number()), error: v.optional(v.string()), // New optional fields for better tracking name: v.optional(v.string()), // Migration name description: v.optional(v.string()), // What it does recordsProcessed: v.optional(v.number()), recordsUpdated: v.optional(v.number()), recordsErrored: v.optional(v.number()), executionTimeMs: v.optional(v.number()), environment: v.optional(v.string()), // Track which env ran it }).index("migrationId", ["migrationId"]),
GitHub
GitHub - amacneil/dbmate: 🚀 A lightweight, framework-agnostic da...
🚀 A lightweight, framework-agnostic database migration tool. - amacneil/dbmate
7 Replies
Jonas
Jonas•6d ago
On my part the issue is a bit simpler, but grounded in the same issue of lacking:
a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current state
Let's say on production, canary, and locally I have this deployed:
export const Users = Table('users', {
firstname: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
});
export const Users = Table('users', {
firstname: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
});
and I want to add a mandatory (non-optional) "lastname" to link to the user's settings. because none of my documents have this lastname yet I have to make it optional, this makes sense, so locally I now have:
export const Users = Table('users', {
name: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
configurationId: v.optional(v.id('configurations')),
});
export const Users = Table('users', {
name: v.string(),
email: v.string(),
avatar: v.string(),
externalId: v.string(),
configurationId: v.optional(v.id('configurations')),
});
I create a migration:
export const assignNames = migrations.define({
table: 'users',
migrateOne: async (ctx, doc) => {
if (!doc.lastname) {
await ctx.db.patch(doc._id, {
lastname: '',
});
}
},
});
export const assignNames = migrations.define({
table: 'users',
migrateOne: async (ctx, doc) => {
if (!doc.lastname) {
await ctx.db.patch(doc._id, {
lastname: '',
});
}
},
});
Now I get stuck, I can't make the row non-optional until I have deployed and run the migration on both canary and production too. So throughout my code the lastname will continue being potentially null. When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions. @jamwt cc - created thread not to clog main chat
smaccoun
smaccounOP•6d ago
Thanks @Jonas yea I think the really simple way of expressing the issue is that convex doesnt have a versioned database migration tool, unlike mysql or postgres or whatever that have many (flywheel, alembic, dbmate, etc) I think this is really critical convex have this for production systems. Maybe the advantage convex could have over others though is having a single supported tool by the convex maintainers themselves that everyone could point to. (there's a lot of choice fatigue in the postgres world imo for a bunch of tools that mostly do the same thing) Personally i think components are the solution here along with functionality built into the cli (convex run migrations). The componetn would basically create a central table like the one I showed
migrationStatus: defineTable({
migrationId: v.string(), // friendly readable id?
status: v.union(
v.literal("pending"),
v.literal("running"),
v.literal("completed"),
v.literal("failed"),
v.literal("rolled_back") // Add rollback status
),
startedAt: v.optional(v.number()),
completedAt: v.optional(v.number()),
error: v.optional(v.string()),
name: v.optional(v.string()), // Migration name
description: v.optional(v.string()), // What it does
recordsProcessed: v.optional(v.number()),
recordsUpdated: v.optional(v.number()),
recordsErrored: v.optional(v.number()),
executionTimeMs: v.optional(v.number()),
environment: v.optional(v.string()), // Track which env ran it
}).index("migrationId", ["migrationId"])
migrationStatus: defineTable({
migrationId: v.string(), // friendly readable id?
status: v.union(
v.literal("pending"),
v.literal("running"),
v.literal("completed"),
v.literal("failed"),
v.literal("rolled_back") // Add rollback status
),
startedAt: v.optional(v.number()),
completedAt: v.optional(v.number()),
error: v.optional(v.string()),
name: v.optional(v.string()), // Migration name
description: v.optional(v.string()), // What it does
recordsProcessed: v.optional(v.number()),
recordsUpdated: v.optional(v.number()),
recordsErrored: v.optional(v.number()),
executionTimeMs: v.optional(v.number()),
environment: v.optional(v.string()), // Track which env ran it
}).index("migrationId", ["migrationId"])
One problem i will say that's a challenge with convex and standard versioned migrations is that its type safety - which is otherwise such an awesome selling point - does actually make this kind of thing harder. But i'm sure its solvable I just haven't figured it out yet Okay think aboutt his a bit more. I realize now the nature of how tricky this problem is. Basically what we are asking for is something that can run
Default state -> Series of transforms -> Ideal state
Default state -> Series of transforms -> Ideal state
This allows for easily automating migration scripts. But here's the problem. Unlike postgres, where the schema is defined mutably/imperatively as as a series of transformations, convex schemas are defined statically and declaratively!! In other words, there is no DROP COLUMN in convex because that's not how it works I think we will have to have a different mental model for how to do this in convex. Probably gonna have to involve deleting previous migration files. This is tricky though....glad we are having the discussion this has been bugging me more than anythign since i've started using convex and I'd love to get to a consensus approach!
Jonas
Jonas•6d ago
When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions.
What I was hinting at here yeah. It's a hard nut to crack.
Michal Srb
Michal Srb•6d ago
Passing by: Having field defaults helps the specific example @Jonas gave. One option is Ents: https://labs.convex.dev/convex-ents/schema#field-defaults For the rest of the discussion, I think you're on the right track, in that Convex migrations are not like schema migrations in Postgres, but more like the data migrations done at bigger companies. These migrations can take a long real-world time (say weeks), during which bugs are ironed out, and the data is migrated. Having the schema be backwards compatible is the only way to allow concurrent development with the migrations in progress.
Jonas
Jonas•6d ago
Thanks, will check them out!
smaccoun
smaccounOP•6d ago
Thanks @Michal Srb that ents looks really interseting I guess this is where I've mostly landed, that you really just have to be eternally backward compatible with all your migrations if you want to have some kind of automatic migration script you can run between environments I understand this is might be the norm to a large extent at big companies. But there is also often a time where you have a cutoff and the cost of supporting backeward compat with all the additional code branch logic to handle can get to be a bit too much. I'm not sure I like the idea of never being able to do a drop column from automated scripts Even if it is decided that the only solution is to only ever do backward compatible migrations, I still think it would be worth it to standardize a component for a migration version tracking table. Or at least, it would be nice if someone could provide a script they use to run in CI for how to handle automatic migrations between multiple envirionments that works consistently over time
jamwt
jamwt•5d ago
yep, convex migrations are modeled how big systems do them, where there is no "instant" you can switch over, and so there is a time where you have "both kinds" of data running for some period (minutes, hours, days, etc) this does make them three steps instead of one this is "worth it" in production, but perhaps not in dev/ci -- which is why, normally, in these systems, we recommend you use seed data etc rather than prod data. also b/c eventually you can't use prod data b/c of PII/regulation/security etc in dev/CI, we recommend the way to migrate is: blow away your data 😄 if you want the codebase to always treat to the value as the new type, you can do that with a "lazy migration" ORM type thing we should ship an example of that to go along with the migration component if your migration is mostly adding a field, @Michal Srb is right that ents already gives you something useful via it's default

Did you find this page helpful?