Dhruv Kumar Jha•14mo ago

Getting Best Representation Vectors - Summarizing Large Documents

I am following one of the examples listed here: https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/5%20Levels%20Of%20Summarization%20-%20Novice%20To%20Expert.ipynb, "Level 4: Best Representation Vectors - Summarize an entire book". It's about summarizing large documents, books, etc. While there are a lot of methods, I want to test "Level 4".

I am stuck at steps 4 and 5:

Step 4 => Cluster the vectors to see which are similar to each other and likely talk about the same parts of the book
Step 5 => Pick embeddings that represent the cluster the most (method: closest to each cluster centroid)

Assume I have already loaded all the docs, split them into chunks, and created embeddings for them. From reading the docs, I know how to query the vector database, but I don't know how to cluster the vectors or how to pick the vectors closest to each cluster centroid.

Can anyone help me with this? Are there any docs related to this? Do we have any alternatives for Node.js/Convex?

1 Reply

Michal Srb•14mo ago

Try googling “kmeans js”? I see a few results.
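
For reference, here is a minimal sketch of steps 4 and 5 in plain TypeScript with no dependencies, in case a library is not an option. It assumes the embeddings are already in memory as arrays of numbers. The function names (`kMeans`, `closestToCentroids`) and the evenly spaced starting centroids are illustrative choices for this sketch, not something taken from the tutorial or from any particular package.

```typescript
type Vector = number[];

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid nearest to `v`.
function nearestCentroid(v: Vector, centroids: Vector[]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let c = 0; c < centroids.length; c++) {
    const d = squaredDistance(v, centroids[c]);
    if (d < bestDist) {
      bestDist = d;
      best = c;
    }
  }
  return best;
}

// Step 4: plain Lloyd's k-means.
// Assumption: centroids start at evenly spaced input vectors, which is
// deterministic but arbitrary; k-means++ is a common alternative.
function kMeans(
  vectors: Vector[],
  k: number,
  maxIterations = 100
): { centroids: Vector[]; assignments: number[] } {
  if (k < 1 || k > vectors.length) {
    throw new Error("k must be between 1 and vectors.length");
  }
  const dim = vectors[0].length;
  let centroids: Vector[] = Array.from({ length: k }, (_, c) => [
    ...vectors[Math.floor((c * vectors.length) / k)],
  ]);
  let assignments: number[] = new Array<number>(vectors.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assign every vector to its nearest centroid.
    const next = vectors.map((v) => nearestCentroid(v, centroids));
    const changed = next.some((a, i) => a !== assignments[i]);
    assignments = next;
    if (!changed) break; // converged: no vector switched cluster

    // Move each centroid to the mean of the vectors assigned to it.
    const sums = Array.from({ length: k }, () => new Array<number>(dim).fill(0));
    const counts = new Array<number>(k).fill(0);
    vectors.forEach((v, i) => {
      const c = assignments[i];
      counts[c]++;
      for (let j = 0; j < dim; j++) sums[c][j] += v[j];
    });
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((old, c) =>
      counts[c] === 0 ? old : sums[c].map((s) => s / counts[c])
    );
  }
  return { centroids, assignments };
}

// Step 5: for each centroid, the index of the input vector closest to it.
function closestToCentroids(vectors: Vector[], centroids: Vector[]): number[] {
  return centroids.map((centroid) => nearestCentroid(centroid, vectors));
}

// Tiny usage example with made-up 2-D "embeddings" forming two groups.
const sampleVectors: Vector[] = [
  [0, 0], [0, 1], [1, 0],
  [10, 10], [10, 11], [11, 10],
];
const sampleResult = kMeans(sampleVectors, 2);
const sampleRepresentatives = closestToCentroids(sampleVectors, sampleResult.centroids);
console.log(sampleResult.assignments, sampleRepresentatives);
```

The indices returned by `closestToCentroids` point back into your chunk array, so you can look up the matching chunks and summarize only those. Sorting the indices first keeps the selected chunks in document order. With real embeddings you would probably want a tested k-means package and a smarter initialization, but the flow stays the same.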
