convex schema + data migration
I think the issue if I understand it (i was having the same for a while) is that typically you have 3 or 4 environments (dev, canary, preview, prod,.....whatever) and you want them to be in sync. So typically at least what i'm used to with like postgres is you have CI automate the deploy between these envs so they are always in lock step. But convex doesnt have a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current state. I wrote my own my system that basically mimick dbmate (https://github.com/amacneil/dbmate) with my own migration tracking table that i am happy to share more about (it mostly works but because convex is so nicely typed does have some issues i'm working out)
I really think convex should standardize on this and have maybe even have a componetn for (i'm aware of the current migrations copnent but this is not that). This is a pretty critical feature that any relational db. Its very necessary for production workflows
migrationStatus: defineTable({
migrationId: v.string(),
status: v.union(
v.literal("pending"),
v.literal("running"),
v.literal("completed"),
v.literal("failed"),
v.literal("rolled_back") // Add rollback status
),
startedAt: v.optional(v.number()),
completedAt: v.optional(v.number()),
error: v.optional(v.string()),
// New optional fields for better tracking
name: v.optional(v.string()), // Migration name
description: v.optional(v.string()), // What it does
recordsProcessed: v.optional(v.number()),
recordsUpdated: v.optional(v.number()),
recordsErrored: v.optional(v.number()),
executionTimeMs: v.optional(v.number()),
environment: v.optional(v.string()), // Track which env ran it
}).index("migrationId", ["migrationId"]),
GitHub
GitHub - amacneil/dbmate: 🚀 A lightweight, framework-agnostic da...
🚀 A lightweight, framework-agnostic database migration tool. - amacneil/dbmate
9 Replies
On my part the issue is a bit simpler, but grounded in the same issue of lacking:
a standard database migration tool that effectively runs a set of migrations that you keep adding too that take you up from the default db state to the current stateLet's say on production, canary, and locally I have this deployed: and I want to add a mandatory (non-optional) "lastname" to link to the user's settings. because none of my documents have this lastname yet I have to make it optional, this makes sense, so locally I now have: I create a migration: Now I get stuck, I can't make the row non-optional until I have deployed and run the migration on both canary and production too. So throughout my code the lastname will continue being potentially null. When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions. @jamwt cc - created thread not to clog main chat
Thanks @Jonas yea I think the really simple way of expressing the issue is that convex doesnt have a versioned database migration tool, unlike mysql or postgres or whatever that have many (flywheel, alembic, dbmate, etc)
I think this is really critical convex have this for production systems. Maybe the advantage convex could have over others though is having a single supported tool by the convex maintainers themselves that everyone could point to. (there's a lot of choice fatigue in the postgres world imo for a bunch of tools that mostly do the same thing)
Personally i think components are the solution here along with functionality built into the cli (
convex run migrations). The componetn would basically create a central table like the one I showed
One problem i will say that's a challenge with convex and standard versioned migrations is that its type safety - which is otherwise such an awesome selling point - does actually make this kind of thing harder. But i'm sure its solvable I just haven't figured it out yet
Okay think aboutt his a bit more. I realize now the nature of how tricky this problem is. Basically what we are asking for is something that can run
This allows for easily automating migration scripts.
But here's the problem. Unlike postgres, where the schema is defined mutably/imperatively as as a series of transformations, convex schemas are defined statically and declaratively!!
In other words, there is no DROP COLUMN in convex because that's not how it works
I think we will have to have a different mental model for how to do this in convex. Probably gonna have to involve deleting previous migration files. This is tricky though....glad we are having the discussion this has been bugging me more than anythign since i've started using convex and I'd love to get to a consensus approach!When I was using postgres this wasn't a problem since at the time I used alembic which ran migrations of schema in lockstep with data update. and this migration kept living in the migrations folder. Having schema updates as part of migrations is of course not at all what convex is about though, which is why I was bringing this up asking if anyone had any clever solutions.What I was hinting at here yeah. It's a hard nut to crack.
Passing by:
Having field defaults helps the specific example @Jonas gave. One option is Ents: https://labs.convex.dev/convex-ents/schema#field-defaults
For the rest of the discussion, I think you're on the right track, in that Convex migrations are not like schema migrations in Postgres, but more like the data migrations done at bigger companies. These migrations can take a long real-world time (say weeks), during which bugs are ironed out, and the data is migrated. Having the schema be backwards compatible is the only way to allow concurrent development with the migrations in progress.
Thanks, will check them out!
Thanks @Michal Srb that ents looks really interseting
I guess this is where I've mostly landed, that you really just have to be eternally backward compatible with all your migrations if you want to have some kind of automatic migration script you can run between environments
I understand this is might be the norm to a large extent at big companies. But there is also often a time where you have a cutoff and the cost of supporting backeward compat with all the additional code branch logic to handle can get to be a bit too much. I'm not sure I like the idea of never being able to do a drop column from automated scripts
Even if it is decided that the only solution is to only ever do backward compatible migrations, I still think it would be worth it to standardize a component for a migration version tracking table. Or at least, it would be nice if someone could provide a script they use to run in CI for how to handle automatic migrations between multiple envirionments that works consistently over time
yep, convex migrations are modeled how big systems do them, where there is no "instant" you can switch over, and so there is a time where you have "both kinds" of data running for some period (minutes, hours, days, etc)
this does make them three steps instead of one
this is "worth it" in production, but perhaps not in dev/ci -- which is why, normally, in these systems, we recommend you use seed data etc rather than prod data. also b/c eventually you can't use prod data b/c of PII/regulation/security etc
in dev/CI, we recommend the way to migrate is: blow away your data 😄
if you want the codebase to always treat to the value as the new type, you can do that with a "lazy migration" ORM type thing
we should ship an example of that to go along with the migration component
if your migration is mostly adding a field, @Michal Srb is right that ents already gives you something useful via it's default
Thanks for the replies, really
And yeah ents seem to be the way in terms of minimizing a lot of this manual behavior. also takes care of things like cascade deletion.. which would help me with how I regularly forget to implement it myself in delete endpoints and make up some funky migration to delete all the garbage 😅 😂
I want to ask though since I'm not just playing around but using convex seriously for long-term projects. What's the plan with ents? to me from the reading I did on it, it sounds like the way you should be building with convex from the start, but with it not being pushed hard to the point I haven't even heard of it it ticks some flags.
Is it "the proper/future way" of doing convex (not that I care what proper is, mostly the future part, more that I need something that will be built on long term and become the standard way, switching ways of doing things is expensive and I have to justify these changes to my clients). Why isn't it being pushed as a replacement of the "old way" <-- excuse my ignorance haha
https://discord.com/channels/1019350475847499849/1280908737032360017/1333501733950984283ah found this. if that's still the case I'll probably hold off
Also thanks for ansswering @Jamie and @Michal Srb
I guess i'm okay with this overall. In short
1. Try to always keep backward compatibility
2. While i'm used to doing (1), there are times usually after old backward compat code has gotten really old and messy when you have to make a decision to deprectae stuff because the code becomes just too cumbersome. In those cases I guess we should just not rely on automated scripts for our envs and just carefully run them ourselves (drop column is just an example but there are many more)
3. I still think we (and convex) team should standardize on at least automatated data migration patterns. That migrations table I suggested above I think is in the ballpark. But its a very standard pattern to deploy via CI your migrations so that when you deploy to staging or canary or whatever before prod you know that you are getting consistent changes