erquhart
erquhart2y ago

Making migrations faster

Migrations are probably my biggest pain point right now. Any patch operation against a table has to be done on 200 records per page, and each page takes ~30s. This seems orders of magnitude slower than it should be, so I'm pretty sure I'm doing something wrong.

I was originally using the helper from your repo, but thought I'd try just writing a plain action that uses a paginated mutation so I don't have to manually page through and keep passing the cursor in myself (which the helper requires). I'm now trying to patch records concurrently, say in chunks of 20, but this seems to break Convex and the instance stops responding for a while before what I can only assume is a reboot of some sort.

So I guess the questions are:
1. Are concurrent write operations supported?
2. Are migrations supposed to take this long, or am I holding it wrong?
23 Replies
ballingt
ballingt2y ago
This sounds pretty rough, could you share the instance name in DM? I agree this is orders of magnitude slower than it should be.
ian
ian2y ago
In this paginated mutation, is it just fetching the first N or so each time, or how are you paginating? Concurrent writes are supported, though if you're reading & writing the same rows as other mutations, you'll be contending with write conflicts (which auto-retry to a degree). There are also some nuances to scanning the same region over & over, e.g.:
const toPatch = await db
  .query('mytable')
  .withIndex('lastUpdated')
  .take(100);
await asyncMap(toPatch, async d =>
  db.patch(d._id, { lastUpdated: Date.now() })
);
This will have different performance implications from using our cursors. It's a bit complicated, but I can explain if you're curious.
ballingt
ballingt2y ago
@erquhart re "holding it wrong," you might stop doing these writes in parallel. We're working on making these more efficient, but doing them in parallel is not actually helping right now and it's hitting a corner case with timeouts that don't get surfaced correctly.
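A minimal sketch of the difference between running writes one at a time and running them in parallel, in plain TypeScript with no Convex imports. The helper names `mapSerially` and `mapConcurrently` are hypothetical and are not part of Convex or convex-helpers; `fn` stands in for whatever per-document write is being performed.

```typescript
// Hypothetical helper: apply an async function to each item one at a time.
// Each call finishes before the next one starts.
async function mapSerially<T, R>(
  items: readonly T[],
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const item of items) {
    results.push(await fn(item));
  }
  return results;
}

// Hypothetical helper, shown only for contrast: start every call up front
// and wait for all of them, so the calls overlap in time.
async function mapConcurrently<T, R>(
  items: readonly T[],
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  return Promise.all(items.map((item) => fn(item)));
}
```

Both helpers return results in input order; the only difference is whether the calls overlap. Swapping a concurrent map for the serial one is one way to follow the advice above without otherwise restructuring the code.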
erquhart
erquhartOP2y ago
I'm doing them in other places too, and I'm betting that's what's causing the issues with my instance that surfaced last week
ballingt
ballingt2y ago
I think Ian's advice generally applies, my advice is specific to us looking at traffic on your instance right now
ian
ian2y ago
> I was originally using the helper from your repo, but thought I'd try just writing a plain action that uses a paginated mutation so I don't have to manually page through and keep passing the cursor in myself (which the helper requires).
You might be interested in the action that comes along with the helper in the repo: https://github.com/get-convex/convex-helpers/blob/main/convex/lib/migrations.ts#L65 That one you can call like:
npx convex run lib.migrations:runMigration '{ "name": "myMigrations.foo", "batchSize": 100 }'
erquhart
erquhartOP2y ago
Yeah that's the one I've been using
ian
ian2y ago
... and it will run the batches without you having to pass in a cursor
erquhart
erquhartOP2y ago
hmm wait a minute, I don't remember passing a name argument though...
ian
ian2y ago
Yeah, there's the migration wrapper, which makes something that can do just one batch, and a second dedicated action that does the looping for you. It's at the bottom of the file.
erquhart
erquhartOP2y ago
Ah, so there is a looping handler, that's what I was missing
ian
ian2y ago
Yeah I'm realizing I never added good comments to that file. Doing it now
erquhart
erquhartOP2y ago
Setting that aside for a sec, shouldn't this just work:
import { v } from 'convex/values'
import { internal } from '../_generated/api'
import { internalAction, internalMutation } from '../_generated/server'

export const setTransactionRecordTypesPage = internalMutation({
  args: { cursor: v.union(v.null(), v.string()) },
  handler: async ({ db }, { cursor }) => {
    const result = await db
      .query('transactions')
      .paginate({ numItems: 100, cursor })
    for (const transaction of result.page) {
      await db.patch(transaction._id, { recordType: 'current' })
    }
    return result
  },
})

export const setTransactionRecordTypes = internalAction({
  args: {},
  handler: async ({ runMutation }) => {
    const patchTransactionRecordTypes = async (cursor: string | null) => {
      const result = await runMutation(
        internal.migrations.current.setTransactionRecordTypesPage,
        { cursor }
      )
      if (!result.isDone) {
        await patchTransactionRecordTypes(result.continueCursor)
      }
    }
    await patchTransactionRecordTypes(null)
  },
})
I'm running the action from the dashboard. It's not doing anything concurrently. But it's taking a really long time to run. Actually it just failed after 70 seconds:
failure Connection lost while action was in flight
The table has ~12k documents, so 120 pages of 100.
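For reference, the recursive driver in the action above can also be written as a plain loop. This is only a sketch under stated assumptions: `runPage` is a hypothetical stand-in for the `runMutation(...)` call, `PageResult` mirrors just the two pagination fields the action reads, and `makeFakeTable` is an in-memory fake used to check the page count rather than anything from Convex. It changes only the shape of the driver and is not claimed to fix the slowness being discussed.

```typescript
// Only the fields of the pagination result that the driver needs.
type PageResult = { isDone: boolean; continueCursor: string };

// Call runPage repeatedly, feeding each continueCursor into the next call,
// until a page reports isDone. Returns the number of pages processed.
async function runAllPages(
  runPage: (cursor: string | null) => Promise<PageResult>,
): Promise<number> {
  let cursor: string | null = null;
  let pages = 0;
  while (true) {
    const result = await runPage(cursor);
    pages += 1;
    if (result.isDone) {
      return pages;
    }
    cursor = result.continueCursor;
  }
}

// Hypothetical in-memory stand-in for a paginated mutation: the cursor is
// simply the offset of the next unprocessed document, encoded as a string.
function makeFakeTable(total: number, pageSize: number) {
  return async (cursor: string | null): Promise<PageResult> => {
    const start = cursor === null ? 0 : Number(cursor);
    const end = Math.min(start + pageSize, total);
    return { isDone: end >= total, continueCursor: String(end) };
  };
}
```

With the fake, `runAllPages(makeFakeTable(12000, 100))` walks 120 pages, matching the estimate above. A real paginated query may need one extra call before it reports `isDone`.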
ian
ian2y ago
At a glance that does look right
erquhart
erquhartOP2y ago
It does manage to patch some records, but it eventually fails. After the failure message, the action is still running; there's just a lost connection or something, but the running action does still eventually fail.
ian
ian2y ago
and this is doing mutations serially, even though the writes are batched in the mutation
erquhart
erquhartOP2y ago
Yeah, it's completely serial. I dropped the concurrency.
ian
ian2y ago
The failing action seems tough. @ballingt are you looking at exceptions / logs already?
ballingt
ballingt2y ago
Yeah Lee is taking a look
erquhart
erquhartOP2y ago
Here's the last few logs from the mutations this action ran:
7/30/2023, 8:23:38 PM  7775ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:24:01 PM  7782ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:24:24 PM  8176ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:24:47 PM  8142ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:25:11 PM  7885ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:25:34 PM  7828ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
7/30/2023, 8:25:58 PM  7849ms  success  Mutation  migrations/current:setTransactionRecordTypesPage
Note how it says it took ~8s for each page, but the logs are ~25s apart
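The gap can be checked directly from the log entries above. The sketch below hard-codes the seven timestamps and durations from the logs and compares the spacing between consecutive entries with the reported mutation durations. The names (`LogEntry`, `toSeconds`, `gapsBetween`, `mean`) are hypothetical helpers written for this illustration, and the time parser assumes all entries fall on the same day.

```typescript
type LogEntry = { time: string; durationMs: number };

// Timestamps and durations copied from the dashboard logs above.
const entries: LogEntry[] = [
  { time: "8:23:38 PM", durationMs: 7775 },
  { time: "8:24:01 PM", durationMs: 7782 },
  { time: "8:24:24 PM", durationMs: 8176 },
  { time: "8:24:47 PM", durationMs: 8142 },
  { time: "8:25:11 PM", durationMs: 7885 },
  { time: "8:25:34 PM", durationMs: 7828 },
  { time: "8:25:58 PM", durationMs: 7849 },
];

// Convert "h:mm:ss AM/PM" into seconds since midnight.
function toSeconds(time: string): number {
  const match = /^(\d{1,2}):(\d{2}):(\d{2}) (AM|PM)$/.exec(time);
  if (match === null) {
    throw new Error(`Unrecognized time: ${time}`);
  }
  const hour12 = Number(match[1]) % 12;
  const hour24 = match[4] === "PM" ? hour12 + 12 : hour12;
  return hour24 * 3600 + Number(match[2]) * 60 + Number(match[3]);
}

// Differences between consecutive values.
function gapsBetween(values: number[]): number[] {
  const gaps: number[] = [];
  let previous: number | undefined;
  for (const value of values) {
    if (previous !== undefined) {
      gaps.push(value - previous);
    }
    previous = value;
  }
  return gaps;
}

function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

const gapSeconds = gapsBetween(entries.map((entry) => toSeconds(entry.time)));
const meanGapSeconds = mean(gapSeconds);
const meanDurationSeconds = mean(entries.map((entry) => entry.durationMs)) / 1000;

console.log({ gapSeconds, meanGapSeconds, meanDurationSeconds });
```

For these entries the gaps are 23 or 24 seconds, while the mean reported duration is about 7.9 seconds, which leaves roughly 15 seconds per page that the mutation duration does not account for.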
ballingt
ballingt2y ago
@erquhart just sent you a direct message. This is something we would expect to go much better, so we're investigating and will let you know when we know more.
erquhart
erquhartOP2y ago
Thank you!
