Alternatively, would love to get help on how to use transformer.js with Convex to generate embeddings. I kept getting this strange error on Convex when trying to use @xenova/transformers 🧵
what i have:
import { pipeline } from '@xenova/transformers';

export async function fetchEmbeddingBatchLocal(
  texts: string[],
): Promise<{ embeddings: number[][] }> {
  // Load the feature-extraction pipeline (fetches the ONNX model on first use)
  const generateEmbedding = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  // Generate a vector for each text using Transformers.js
  const outputs = await Promise.all(
    texts.map((text) =>
      generateEmbedding(text, {
        pooling: 'mean',
        normalize: true,
      }),
    ),
  );
  // Extract the embedding data from each output tensor
  const embeddings: number[][] = outputs.map((output) => Array.from(output.data));
  return {
    embeddings,
  };
}

export async function fetchEmbeddingLocal(text: string) {
  const { embeddings } = await fetchEmbeddingBatchLocal([text]);
  return { embedding: embeddings[0] };
}
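One note on the snippet above (an assumed restructuring, not something from this thread): the pipeline could be cached at module scope so the model only loads once per warm process rather than on every call:

import { pipeline } from '@xenova/transformers';

// Cache the pipeline promise so the model is only loaded once per process.
// (Assumes the process stays warm between invocations; cold starts still pay the load cost.)
let embedderPromise: Promise<any> | null = null;

function getEmbedder(): Promise<any> {
  embedderPromise ??= pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  return embedderPromise;
}

// Then inside fetchEmbeddingBatchLocal:
//   const generateEmbedding = await getEmbedder();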
Getting "Uncaught Error: no available backend found. ERR: [wasm] TypeError: (0 , i.cpus) is not a function"
Hey @yokoli, looking into this right now! Will get back to you in a bit
Unfortunately, it doesn't look like we can support @xenova/transformers at this time 😦. At runtime, it depends on local onnx files which our bundler is unable to bundle at push-time, leading to this runtime issue.
We have an upcoming feature to support packages like this with dynamic runtime dependencies in our Node runtime in Convex 1.4 (coming soon). There is still, however, a 250MB unzipped limit for packages enforced by AWS Lambda, which is likely to be exceeded by heavy packages like this which include local ML model files.
Supporting larger packages is an active area of work for us, so hopefully we can support this in the near future.
Thanks for looking into this @rkbh! fwiw, @xenova/transformers' unpacked size seems to be 45 MB? The onnx files are ~110 MB in total, so it's just shy of the 250 MB limit.
In this case, curious why using the node runtime wouldn't work out of the box?
@sujayakar would be amazing if we can get this approach working so we don't have to bother you to extend the vector db dimensions 🙏 🙏 🙏
Thanks so much team!!!!
@yokoli yep, the package itself is only 45 MB, but the dependencies it pulls in make your node_modules folder 254 MB (on OS X with @xenova/transformers=2.6.2), which just exceeds the Lambda limit.
The reason it doesn't work out of the box with node is that our Node runtime uses a similar bundling process to the Convex runtime, which can't pick up those onnx files by default.
ah i see. and there's no way to tell it to pick it up by updating the vite.config?
Yep, there is a way: mark it as an external package (https://docs.convex.dev/functions/bundling#specifying-external-packages) in your convex.json. You would place this there:
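A rough idea of what that convex.json entry could look like, based on the externalPackages format described in the bundling docs linked above (treat the docs as the source of truth):

{
  "node": {
    "externalPackages": ["@xenova/transformers"]
  }
}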
You'll have to update to the newest Convex version, 1.4.1, to use this feature btw.
But this will hit the 250 MB Lambda limit. I tried it locally earlier 😦
ahhhhhh got it 😢
thanks so much for looking into this though @rkbh
Thanks sm for mentioning this, seems like it would be really useful to support this package (and ones like it). Hopefully we'll be able to soon!!
and yeah if we can support transformer.js it'll unlock a plethora of AI use cases
ok another idea: what if convex hosts a small out-of-the-box embeddings model for people to call from their convex apps?
i imagine this could look like a convex function that only does:
generateEmbedding(text, {
  pooling: 'mean',
  normalize: true,
})
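A loose sketch of what that app-facing function might look like, assuming some hosted embeddings endpoint exists (EMBEDDINGS_URL and its request/response shape are hypothetical, not an actual Convex service):

// convex/embeddings.ts -- sketch only; the hosted endpoint is an assumption
import { action } from "./_generated/server";
import { v } from "convex/values";

export const embed = action({
  args: { text: v.string() },
  handler: async (_ctx, { text }) => {
    // Call a (hypothetical) hosted model that returns mean-pooled, normalized vectors
    const response = await fetch(process.env.EMBEDDINGS_URL!, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ texts: [text] }),
    });
    const { embeddings } = (await response.json()) as { embeddings: number[][] };
    return embeddings[0];
  },
});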
This is something @Indy and @sujayakar were thinking about recently, I believe! I'll defer to them on this
We've brainstormed about hosting some models directly. Though, we've seen most folks just happily use remote hosted models. But I do know there is some fascinating work happening in small embeddable models, and this could be a fascinating product to build out. Going to have to noodle on it some more.
my pitch: we can unlock tons of latent demand by telling people "ai town takes $0 to run". can start small with an embeddings model, and devs can use this embedding model for ai town or any other ai apps they build
one other idea @yokoli: could you use the setup you were using with llama2 but with a different model?
rather than using llama2's internal embedding, one idea would be to use an open embedding model like https://huggingface.co/BAAI/bge-large-en-v1.5 (1024 dimensions). I wonder if it'd perform better on the memory retrieval stuff we do in ai town too, since it's purpose-trained for that.
unfortunately the BERT models (including the ones you sent above) don't yet work on Ollama for local inference 😦 that's why I was looking into either a smaller LLM with embeddings OR transformer.js
ahh okay, makes sense.