Gorka Cesium · 3y ago

Migrating data to Convex

Is there a recommended pattern for migrating from FaunaDB to Convex? Maybe Airbyte could help?
james · 3y ago
Yes, the recommended way to migrate data into Convex is via Airbyte ingress. It looks like Fauna supports Airbyte egress, and we have a destination connector for Airbyte that will allow streaming data in, but it's currently in review. @Emma can give more context here, but it should be ready ~very soon.
james · 3y ago
In the meantime, for small amounts of data, we have CSV/JSON import from a file: https://docs.convex.dev/using/cli#import-a-file-into-convex. That's likely not sufficient for your needs if you have a large dataset, though.
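For a small table, a one-off import looks something like this (the table name and file path here are placeholders):

npx convex import employees ./fauna-export/employees.jsonl --format jsonLines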
Emma · 3y ago
Yep! The Airbyte destination connector PR is under review here: https://github.com/airbytehq/airbyte/pull/21287. Hopefully the Airbyte folks are able to get to it soon!
Gorka Cesium (OP) · 3y ago
Very cool. My Fauna DBs currently use GraphQL, so I have some nested objects and references, and they all use Fauna IDs. I wonder how to convert those IDs into Convex IDs. My largest collections are a few thousand records, so I can probably wing it with CSV.
james · 3y ago
The ID translation is a slightly tricky one, unfortunately. Convex will assign its own document IDs on insert, and your Fauna IDs will just be a field in each document, not the primary key. This will "work", especially if you add an index on the legacy Fauna ID in your schema, but it's far from ideal, especially once you start adding new rows that don't have a legacy Fauna ID field.

Eventually we could write some kind of Fauna-specific import that rewrites your foreign-key references to point at the Convex IDs. In the meantime, it would certainly be possible for you to write a migration job as a mutation that changes the references accordingly. We're adding support for calling functions directly from the dashboard soon, but until then you can have a button that calls a function to do the migration (a sketch follows this message). I believe @ian is planning on writing a stack.convex.dev article sometime about writing a mutation like this.

Apologies that you have some hoops to jump through in the meantime - it's definitely on our roadmap to streamline imports and schema migrations of just this type. Happy to talk through the individual steps of that migration: it would look something like writing a job that fetches a batch of rows that still have the old type of reference, updates them to write back the new schema with the respective Convex ID, and marks the rows as updated so you can run the job again to fetch the next batch and migrate them.

Just to acknowledge three features that would help a lot here:
1. The ability to define private mutations and call them from the CLI or dashboard (this is coming!)
2. Native support for schema migrations, or a template for doing so (also coming soon)
3. An explicit Fauna-to-Convex converter (not yet prioritized, but something for us to consider)
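For the dashboard-button idea, a minimal sketch (this uses the newer function-reference API from convex/react; api.migrations.run is a hypothetical mutation name):

// MigrateButton.tsx - a button that kicks off a migration mutation
import { useMutation } from "convex/react";
import { api } from "../convex/_generated/api";

export function MigrateButton() {
  // Point this at your actual migration mutation.
  const runMigration = useMutation(api.migrations.run);
  return <button onClick={() => runMigration({})}>Run migration</button>;
}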
Gorka Cesium (OP) · 3y ago
Thanks for the detailed advice. The Fauna-to-Convex converter would save me a lot of time; I have several databases to migrate.
ian · 3y ago
That sounds like a useful thing for us to have. Just to confirm, it'd be a one-time export from FaunaDB into Convex, right? Would you have some time to pair on this Monday afternoon / Tuesday?
Gorka Cesium (OP) · 3y ago
Yeah, it's a one-time export. I do have time to pair up.. do you still want to? I could do Tuesday afternoon (Mexico City time). This is one of my entry barriers; the other is all the work to rewrite the queries in my app, especially the user-defined functions written in FQL, and the mutations. Fetch queries should be easier. I'd start with my smallest app to see how it works, and then I'd have to migrate two more databases.
ian · 3y ago
So you can get a head start, my rough approach is going to be:
1. Export to JSON: https://docs.fauna.com/fauna/v4/integrations/fdm/examples#export
2. Import each table into Convex with npx convex import: https://docs.convex.dev/using/cli#import-a-file-into-convex
3. Look at the schema to find references from one document to another. Add indexes on the referenced columns.
4. Iterate over the tables that have references to other tables and, for each row, look up the Convex ID of the referenced document and replace the field with it, via db.patch(doc._id, {fieldName: actualRef._id}).
5. Do (4) in batches of 100 by adding this to the end of a run: scheduler.runAfter(0, 'myMigrationMutation', lastDoc._creationTime) - passing the creation time of the last document so the next run can start where the last batch finished.
6. Start the migration from the dashboard by popping open the developer tools console and running convexFunctions.myMigrationMutation(0).
Gorka Cesium (OP) · 3y ago
This is super helpful. Booked!
ian · 3y ago
After looking at the FaunaDB format, it makes sense to either write a mutation to import the new data, or pre-process the Fauna files. Here's a thought on a flow (a sketch of the first pass follows this message):
1. A first pass just turns the .json files from FaunaDB into regular records:
   a. "@ref" just turns into string IDs. Track which fields point to which tables and save that info for later (maybe write a references.json file?)
   b. Extract the "data" field.
   c. Add the Fauna id & ts as id and ts fields.
2. npx convex import the new .jsonl files.
3. Auto-generate a schema.
4. [optional] Change the schema to use s.union(s.string(), s.id("referencedTable")) for all the places we inserted a string ID (only to prevent TypeScript errors during migration).
5. Add indexes on id to the schema.
6. For each table with references, and for each reference in the table, look up the object by .id and replace it with the object's actual ._id - do this in batches, scheduling the next batch recursively.
7. Change the schema to use s.id("referencedTable") for all references instead of strings.

Looking back, we should have edited the JSON file, written a new .jsonl file, imported that, and used the dashboard to auto-generate the schema.ts rather than defining it manually. Similarly, just editing the other files, importing them, and cleaning up references later would have been easier.
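A minimal sketch of that first pass in Node, assuming the dump is an array of records shaped like { ref: { "@ref": { id } }, ts, data }, with nested { "@ref": { id, collection } } values for references - check this against your actual export; table names and paths are placeholders:

// preprocess.js - hypothetical first-pass reshaping of a Fauna dump
const fs = require('fs');

const references = {}; // { sourceTable: { fieldName: targetTable } }

function convertRefs(table, field, value) {
  if (Array.isArray(value)) return value.map((v) => convertRefs(table, field, v));
  if (value && typeof value === 'object') {
    if (value['@ref']) {
      // A reference becomes a plain string id; remember which table it points to.
      references[table] = { ...references[table], [field]: value['@ref'].collection };
      return value['@ref'].id;
    }
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, convertRefs(table, k, v)])
    );
  }
  return value;
}

fs.mkdirSync('./out', { recursive: true });
for (const table of ['employees', 'receipts']) { // your tables here
  const docs = JSON.parse(fs.readFileSync(`./fauna-export/${table}.json`, 'utf8'));
  const lines = docs.map((doc) =>
    // Keep the user data, plus the Fauna id & ts as plain fields.
    JSON.stringify({
      ...convertRefs(table, '', doc.data),
      id: doc.ref['@ref'].id,
      ts: doc.ts,
    })
  );
  fs.writeFileSync(`./out/${table}.jsonl`, lines.join('\n') + '\n');
}
fs.writeFileSync('./out/references.json', JSON.stringify(references, null, 2));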
Gorka Cesium (OP) · 3y ago
Not sure I understand the references.json.
ian · 3y ago
I was thinking of just documenting something like:
{
  receipts: { employee: 'employees' },
  hours: { extraReceipt: 'receipts' },
  // ...any other references between tables
}
So that later you can do something like:
// in node (assumes `convex` is an initialized Convex client for your deployment)
const references = require('./references.json');
const updateIds = convex.mutation('updateIds');
for (const [sourceTable, refs] of Object.entries(references)) {
  await updateIds(sourceTable, refs, 0);
}

// in convex/updateIds.ts
import { mutation } from './_generated/server';

export default mutation(async ({ db, scheduler }, sourceTable, refs, afterTs) => {
  // Walk the table in _creationTime order, 100 documents at a time.
  const batch = await db.query(sourceTable)
    .withIndex('by_creation_time', q => q.gt('_creationTime', afterTs))
    .take(100);
  for (const doc of batch) {
    for (const [field, targetTable] of Object.entries(refs)) {
      // Look up the referenced document by its legacy Fauna id...
      const reference = await db.query(targetTable)
        .withIndex('byId', q => q.eq('id', doc[field]))
        .unique();
      // ...and swap the string id for the real Convex document ID.
      if (reference) {
        await db.patch(doc._id, { [field]: reference._id });
      }
    }
  }
  if (batch.length > 0) {
    // Schedule the next batch, starting after the last document we processed.
    const lastTs = batch[batch.length - 1]._creationTime;
    scheduler.runAfter(0, 'updateIds', sourceTable, refs, lastTs);
  }
});
If you want to support updating a list of IDs, you'd need to store that info in the references.json file and iterate over the list, but the approach would be the same (a sketch of that variant follows). By the way, we just fixed the import error - the error will be more verbose in the next release.
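For instance, a rough sketch of that variant (the isList flag and this references.json shape are just one way to encode it):

// references.json entries could record whether a field holds a single id or a list:
// { "receipts": { "employee": { "table": "employees", "isList": false } },
//   "projects": { "members": { "table": "employees", "isList": true } } }

// inside the per-document loop of updateIds:
for (const [field, { table: targetTable, isList }] of Object.entries(refs)) {
  const lookup = async (faunaId) => {
    const ref = await db.query(targetTable)
      .withIndex('byId', q => q.eq('id', faunaId))
      .unique();
    return ref ? ref._id : faunaId; // leave unmatched ids untouched
  };
  const value = isList
    ? await Promise.all(doc[field].map(lookup))
    : await lookup(doc[field]);
  await db.patch(doc._id, { [field]: value });
}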
Gorka Cesium (OP) · 3y ago
I added the employees table. So far so good. It seems convenient to write scripts that reshape the Fauna dump into the .jsonl that's ready for upload with the convex import CLI tool. That way I can take my time working in development, and when I'm done with the app changes I can download the latest dump, reshape it with the scripts, and push it to production within a small timeframe.

I still need to do the receipts table, which will have references pointing to the employees table.

I tried to upload a .jsonl with convex import but it failed. The .jsonl has lines that look like this (formatted for easier readability):
{
"rentAid": 0,
"createdAt": "2022-02-23T18:23:57.693Z",
"loan": 0,
"week": "2022-W8",
"despensa": 200,
"extraOfficial": 0,
"employee": { "$id": "employees|haQKOKC0WIgoCSH30KlLyw" },
"commuteHourMonday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourTuesday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourWednesday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourThursday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourFriday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourSaturday": { "hours": 0, "paymentPerHour": 30 },
"commuteHourSunday": { "hours": 0, "paymentPerHour": 30 },
"extraHourMonday": { "hours": 1, "paymentPerHour": 60 },
"extraHourTuesday": { "hours": 1, "paymentPerHour": 60 },
"extraHourWednesday": { "hours": 1, "paymentPerHour": 60 },
"extraHourThursday": { "hours": 1, "paymentPerHour": 60 },
"extraHourFriday": { "hours": 1, "paymentPerHour": 60 },
"extraHourSaturday": { "hours": 0, "paymentPerHour": 60 },
"extraHourSunday": { "hours": 0, "paymentPerHour": 60 },
"otherPayments": { "concept": "", "payment": 0 }
}
Error message:
pnpx convex import receipts ./data-backup/receipts-v4-with-employee.jsonl --format jsonLines
.../Library/pnpm/store/v3/tmp/dlx-67438 | +12 +
.../Library/pnpm/store/v3/tmp/dlx-67438 | Progress: resolved 33, reused 12, downloaded 0, added 12, done
Error: Path ./data-backup/receipts-v4-with-employee.jsonl does not exist.
 ERROR  Command failed with exit code 1: convex import receipts ./data-backup/receipts-v4-with-employee.jsonl --format jsonLines

pnpm: Command failed with exit code 1: convex import receipts ./data-backup/receipts-v4-with-employee.jsonl --format jsonLines
at makeError (/Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:23113:17)
at handlePromise (/Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:23684:33)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async Object.handler [as dlx] (/Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:209846:7)
at async /Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:217519:21
at async main (/Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:217488:34)
at async runPnpm (/Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:217718:5)
at async /Users/jc/Library/pnpm/global/5/.pnpm/pnpm@7.27.0/node_modules/pnpm/dist/pnpm.cjs:217710:7
I wonder if it's because I'm linking the employee like this: "employee": { "$id": "employees|haQKOKC0WIgoCSH30KlLyw" }. @ian can you take a quick look?
ian · 3y ago
It's saying the file doesn't exist - are you running it from the right directory?
Gorka Cesium (OP) · 3y ago
Ah yeah, sorry about that. It worked! It linked the employees table.
ian · 3y ago
Woohoo!
Gorka Cesium (OP) · 3y ago
Thanks! Just finished migrating my first app!
ian · 2y ago
Hell yeah. Any tips / tricks for future ones?
Gorka Cesium (OP) · 2y ago
The first gotcha was to rename the Fauna JSON export from .json to .jsonl. Then I model the data blobs as much as possible with local scripts, and upload with the Convex CLI import. That way I can work on a branch while I rewrite the client-side queries and mutations, and when I'm ready for prod I export again from Fauna to get the latest data, run the local scripts, then import to prod with the Convex CLI. I will use this workflow for my other two apps.

I also saw that I could simplify the schema, because it's easier to handle data in Convex vs. FQL and fauna-graphql, so that reduced the number of tables. I'm happy to be able to ditch SWR, because it was tedious to mutate local state after each mutation. I also started without using TS in Convex, but noticed I can use it there without conflicting with my Next app, which is in ReScript.

Looking forward to migrating the rest of my apps; they are much bigger. The only bit of a bummer is not having data loaded in SSR like Remix. Also, I need to see whether Convex will work with Puppeteer for converting web pages into PDFs - since that's done in a headless browser, I don't know if it handles WebSockets.
ian · 2y ago
Very cool! For SSR, there are ways to get the data for the server-side render (e.g. using the ConvexHttpClient for server-side renders and the ConvexReactClient on the client side), but it means you're fetching the data twice. We are working on a better end-to-end solution, so stay tuned. I'm curious to hear how Puppeteer works for you. For static data, you could use the ConvexHttpClient, which doesn't use a WebSocket; it just doesn't have the React hooks & data reactivity.
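For reference, a minimal sketch of a one-shot server-side fetch (this uses the current API shape with the generated api object; api.receipts.list and the env var are placeholders):

// Server-side data fetch over HTTP - no websocket, no reactivity.
import { ConvexHttpClient } from "convex/browser";
import { api } from "../convex/_generated/api";

const client = new ConvexHttpClient(process.env.NEXT_PUBLIC_CONVEX_URL!);
const receipts = await client.query(api.receipts.list, {});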
Gorka Cesium (OP) · 2y ago
I’ll let you know
