convex-elevenlabs

🖋️Component

A component for async speech-to-text transcription via ElevenLabs webhooks. Start a transcription from a URL or file upload, and handle the result in your webhook callback. Supports typed metadata so you can pass context through the request lifecycle:

const elevenlabs = new ElevenLabs<{ docId: string }>(components.elevenlabs);

// Start transcription (returns immediately)
await elevenlabs.startTranscription(ctx, {
  url: "https://...",
  modelId: "scribe_v1",
  options: { metadata: { docId: "123" } },
});

// Handle result in http.ts
elevenlabs.registerWebhook(http, {
  onComplete: async (ctx, result) => {
    // result.requestMetadata.docId is typed
  },
});
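
For anyone curious how the typed metadata round trip behaves, here is a minimal self-contained sketch. It does not use the component at all: `FakeTranscriber`, `start`, and `deliver` are hypothetical stand-ins I made up to model the pattern, and the URL is a placeholder. The only point is that the generic parameter given at construction flows through to the completion callback.

```typescript
// Hypothetical stand-in, NOT the convex-elevenlabs API: it only models how
// metadata supplied at start time comes back, fully typed, on completion.
type TranscriptionResult<M> = { text: string; requestMetadata: M };

class FakeTranscriber<M> {
  private pending = new Map<string, M>();
  private nextId = 0;

  // Stores the metadata under a fresh request id and returns immediately.
  start(args: { url: string; metadata: M }): string {
    const requestId = `req_${this.nextId++}`;
    this.pending.set(requestId, args.metadata);
    return requestId;
  }

  // Simulates the webhook delivery: looks up the stored metadata and
  // passes it to the callback together with the transcript text.
  deliver(
    requestId: string,
    text: string,
    onComplete: (result: TranscriptionResult<M>) => void,
  ): void {
    const metadata = this.pending.get(requestId);
    if (metadata === undefined) throw new Error(`unknown request ${requestId}`);
    this.pending.delete(requestId);
    onComplete({ text, requestMetadata: metadata });
  }
}

const transcriber = new FakeTranscriber<{ docId: string }>();
const requestId = transcriber.start({
  url: "https://example.com/audio.mp3", // placeholder URL
  metadata: { docId: "123" },
});

let completedDocId = "";
transcriber.deliver(requestId, "hello world", (result) => {
  // result.requestMetadata.docId is typed as string at compile time.
  completedDocId = result.requestMetadata.docId;
});
```

Accessing a field that is not in the metadata type (say `result.requestMetadata.userId`) would be a compile error in this sketch, which is the benefit of the typed parameter.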

Works well with @convex-dev/workflow: pass an eventId in metadata and use ctx.awaitEvent to pause a workflow until transcription completes. The component also exposes the full set of options for diarization, speaker detection, timestamps, entity detection, etc.
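
To make the pause-until-complete idea concrete, here is a rough sketch of the pattern using plain promises. This is not the @convex-dev/workflow API: `EventHub`, its `awaitEvent` and `sendEvent` methods, and the `workflow` and `onComplete` functions are all hypothetical names for illustration. It assumes only what is described above, namely that the eventId travels in the request metadata and the completion handler uses it to wake the waiting workflow.

```typescript
// Hypothetical in-memory event hub, NOT the @convex-dev/workflow API.
class EventHub<T> {
  private waiters = new Map<string, (value: T) => void>();
  private buffered = new Map<string, T>();

  // Resolves once an event with this id has been sent.
  awaitEvent(eventId: string): Promise<T> {
    const early = this.buffered.get(eventId);
    if (early !== undefined) {
      this.buffered.delete(eventId);
      return Promise.resolve(early);
    }
    return new Promise((resolve) => this.waiters.set(eventId, resolve));
  }

  // Wakes the waiter if there is one; otherwise buffers the event so a
  // completion that arrives before the wait is not lost.
  sendEvent(eventId: string, value: T): void {
    const resolve = this.waiters.get(eventId);
    if (resolve) {
      this.waiters.delete(eventId);
      resolve(value);
    } else {
      this.buffered.set(eventId, value);
    }
  }
}

const hub = new EventHub<string>();
const log: string[] = [];

// Stand-in for a workflow: start the transcription, then pause.
async function workflow(eventId: string): Promise<string> {
  log.push("started transcription");
  const transcript = await hub.awaitEvent(eventId);
  log.push("resumed");
  return transcript;
}

// Stand-in for the webhook completion handler: the eventId comes back
// in the request metadata and is used to resume the workflow.
function onComplete(result: {
  text: string;
  requestMetadata: { eventId: string };
}): void {
  hub.sendEvent(result.requestMetadata.eventId, result.text);
}

const done = workflow("evt_1");
onComplete({ text: "hello world", requestMetadata: { eventId: "evt_1" } });
```

In a real setup the waiting and waking would be handled durably by the workflow component rather than by in-memory promises, so the pause survives across the webhook round trip.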

Why use webhooks? In short, because of Convex's 600s action timeout. Transcribing long audio files (>2 hours) will often exceed that limit, so I ended up building this functionality in a few projects that use the service. I hope this is useful to anyone building with ElevenLabs. I plan to add more functionality, such as real-time streaming, soon.

https://github.com/wantpinow/convex-elevenlabs
