Embeddings text-embedding-3-small vs text-embedding-ada-002
Hi, I just deleted all my embeddings and re-parsed everything using the cheaper text-embedding-3-small, but after the change my match rate went from 0.7-0.8 to 0.2 using vectorSearch with the same documents and same queries. The only thing I changed was the model.
I'm using the same model to get embeddings for the documents and the query!
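For reference, my setup is roughly the following (a simplified sketch, not my actual code; the table and index names are made up, and I'm on the default 1536 dimensions, which both models return unless you pass a dimensions parameter):

```ts
// convex/schema.ts - table with a vector index over the stored embeddings
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  documents: defineTable({
    text: v.string(),
    embedding: v.array(v.float64()),
  }).vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 1536, // ada-002 and 3-small both return 1536 dims by default
  }),
});
```

And the query side, embedding the query with the same model and calling vectorSearch from an action:

```ts
// convex/search.ts
import { v } from "convex/values";
import { action } from "./_generated/server";

export const similarDocuments = action({
  args: { queryEmbedding: v.array(v.float64()) },
  handler: async (ctx, args) => {
    // Each result carries a _score; that's the number I quoted above.
    return await ctx.vectorSearch("documents", "by_embedding", {
      vector: args.queryEmbedding,
      limit: 10,
    });
  },
});
```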
What am I missing?
Regards!
4 Replies
I'd play around with some sample data - fetch a few samples, maybe just using fetch in a node shell, and manually validate (roughly like the sketch below). The dot product works for similarity if they're normalized. It's possible the model doesn't do as good of a job for your semantic space. If it's something Convex-specific I'd be surprised, but let us know if that's what you find.
Gist with a simple fetch-based embedding API: https://gist.github.com/ianmacartney/53dafa51d37469534846105e39d99a25
Implementation of chat completions and embeddings for any OpenAI-compliant services, using browser fetch and no imports/dependencies - llm.ts
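Something like this in a node shell (Node 18+, so fetch is global) is usually enough to sanity-check things; untested sketch, assuming an OPENAI_API_KEY env var and the standard OpenAI /v1/embeddings endpoint:

```ts
// Untested sketch: embed two texts with the same model and compare them
// directly. Adjust the URL for other OpenAI-compliant services.
async function embed(text: string, model = "text-embedding-3-small") {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model, input: text }),
  });
  const json = await res.json();
  return json.data[0].embedding as number[];
}

// Dot product; for unit-length vectors this is the cosine similarity.
const dot = (a: number[], b: number[]) =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const doc = await embed("Convex has built-in vector search over documents.");
const query = await embed("Which backend has vector search built in?");
console.log("dims:", doc.length, "similarity:", dot(doc, query));
```

If a document and a clearly related query score reasonably here but your vectorSearch results don't, that points at the stored vectors; if the score is low here too, it's how the model handles your semantic space.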
And a post on embeddings in general I wrote a year ago: https://stack.convex.dev/the-magic-of-embeddings
Embeddings, why they're useful, and how we can store and use them in Convex.
Thanks Ian, I was playing around; so far I can't find anything different other than the actual model.
Actually, I replaced my code with an example coming from Convex, with the same results. I'll play around a bit more.