Embeddings text-embedding-3-small vs text-embedding-ada-002
Hi, I just deleted all my embeddings and re-parsed everything using the cheaper text-embedding-3-small, but after the change my match rate went from 0.7-0.8 to 0.2 using vectorSearch with the same documents and same queries. The only thing I changed was the model.
I'm using the same model to get embeddings for the documents and the query!
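For reference, my setup is roughly the following (a simplified sketch, not my actual code; the table and index names are made up, and I'm on the default 1536 dimensions, which both models return unless you pass a dimensions parameter):

```ts
// convex/schema.ts - table with a vector index over the stored embeddings
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  documents: defineTable({
    text: v.string(),
    embedding: v.array(v.float64()),
  }).vectorIndex("by_embedding", {
    vectorField: "embedding",
    dimensions: 1536, // ada-002 and 3-small both return 1536 dims by default
  }),
});
```

And the query side, embedding the query with the same model and calling vectorSearch from an action:

```ts
// convex/search.ts
import { v } from "convex/values";
import { action } from "./_generated/server";

export const similarDocuments = action({
  args: { queryEmbedding: v.array(v.float64()) },
  handler: async (ctx, args) => {
    // Each result carries a _score; that's the number I quoted above.
    return await ctx.vectorSearch("documents", "by_embedding", {
      vector: args.queryEmbedding,
      limit: 10,
    });
  },
});
```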
What am I missing?
Regards!
4 Replies
I'd play around with some sample data - fetch a few samples, maybe just using fetch in a node shell, and manually validate (roughly like the sketch below). The dot product works for similarity if they're normalized. It's possible the model doesn't do as good of a job for your semantic space. If it's something Convex-specific I'd be surprised, but let us know if that's what you find.
Gist with a simple fetch-based embedding API: https://gist.github.com/ianmacartney/53dafa51d37469534846105e39d99a25
Implementation of chat completions and embeddings for any OpenAI-compliant services, using browser fetch and no imports/dependencies - llm.ts
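Something like this in a node shell (Node 18+, so fetch is global) is usually enough to sanity-check things; untested sketch, assuming an OPENAI_API_KEY env var and the standard OpenAI /v1/embeddings endpoint:

```ts
// Untested sketch: embed two texts with the same model and compare them
// directly. Adjust the URL for other OpenAI-compliant services.
async function embed(text: string, model = "text-embedding-3-small") {
  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model, input: text }),
  });
  const json = await res.json();
  return json.data[0].embedding as number[];
}

// Dot product; for unit-length vectors this is the cosine similarity.
const dot = (a: number[], b: number[]) =>
  a.reduce((sum, x, i) => sum + x * b[i], 0);

const doc = await embed("Convex has built-in vector search over documents.");
const query = await embed("Which backend has vector search built in?");
console.log("dims:", doc.length, "similarity:", dot(doc, query));
```

If a document and a clearly related query score reasonably here but your vectorSearch results don't, that points at the stored vectors; if the score is low here too, it's how the model handles your semantic space.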
And a post on embeddings in general I wrote a year ago: https://stack.convex.dev/the-magic-of-embeddings
Embeddings, why they're useful, and how we can store and use them in Convex.
Thanks Ian, I was playing around; so far I can't find anything different other than the actual model.
Actually, I replaced my code with an example coming from Convex, with the same results. I'll play around a bit more.