Getting Best Representation Vectors - Summarizing Large Documents
I am following one of the examples listed here https://github.com/gkamradt/langchain-tutorials/blob/main/data_generation/5%20Levels%20Of%20Summarization%20-%20Novice%20To%20Expert.ipynb
"Level 4: Best Representation Vectors - Summarize an entire book"
It's about summarizing large documents, books, etc. While there are a lot of methods, I want to test "Level 4".
I am stuck at steps 4 & 5:
Step 4 => Cluster the vectors to see which are similar to each other and likely talk about the same parts of the book
Step 5 => Pick embeddings that represent the cluster the most (method: closest to each cluster centroid)
Assume I have already loaded all the docs, split them into chunks, and created embeddings for them.
From reading the docs, I know how to query the vector database, but I don't know how to cluster the vectors or how to pick the vectors closest to each cluster centroid.
Can anyone help me with this? Are there any docs related to this? Are there any alternatives for Node.js/Convex?
1 Reply
Try googling “kmeans js”? I see a few results.
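If you'd rather not pull in a package, here is a minimal sketch of steps 4 and 5 in plain TypeScript, which should run under Node.js. It is not the notebook's code: the function names (kMeans, closestToCentroids) are made up for illustration, the centroids are seeded from evenly spaced chunks instead of a smarter scheme like k-means++, and it uses Euclidean distance on the raw embeddings. Treat all of that as assumptions to adjust.

```typescript
type Vector = number[];

// Squared Euclidean distance between two vectors of equal length.
function squaredDistance(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the item in `candidates` that is nearest to `target`.
function nearestIndex(target: Vector, candidates: Vector[]): number {
  let best = 0;
  let bestDist = Infinity;
  candidates.forEach((c, i) => {
    const d = squaredDistance(target, c);
    if (d < bestDist) {
      bestDist = d;
      best = i;
    }
  });
  return best;
}

// Step 4: plain k-means (Lloyd's algorithm).
// Assumption: centroids start at evenly spaced input vectors (deterministic, but naive).
function kMeans(
  vectors: Vector[],
  k: number,
  maxIterations = 100
): { centroids: Vector[]; labels: number[] } {
  if (k < 1 || k > vectors.length) {
    throw new Error("k must be between 1 and vectors.length");
  }
  const dim = vectors[0].length;
  let centroids: Vector[] = Array.from({ length: k }, (_, j) => [
    ...vectors[Math.floor((j * vectors.length) / k)],
  ]);
  let labels: number[] = new Array<number>(vectors.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assign every vector to its nearest centroid.
    const newLabels = vectors.map((v) => nearestIndex(v, centroids));
    const changed = newLabels.some((l, i) => l !== labels[i]);
    labels = newLabels;
    if (!changed) break; // assignments are stable, so we are done

    // Move each centroid to the mean of the vectors assigned to it.
    const sums = centroids.map(() => new Array<number>(dim).fill(0));
    const counts = new Array<number>(k).fill(0);
    vectors.forEach((v, i) => {
      counts[labels[i]]++;
      for (let d = 0; d < dim; d++) sums[labels[i]][d] += v[d];
    });
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((c, j) =>
      counts[j] === 0 ? c : sums[j].map((s) => s / counts[j])
    );
  }
  return { centroids, labels };
}

// Step 5: for each centroid, the index of the closest input vector,
// de-duplicated and sorted so the chosen chunks stay in document order.
function closestToCentroids(vectors: Vector[], centroids: Vector[]): number[] {
  const picks = centroids.map((c) => nearestIndex(c, vectors));
  return Array.from(new Set(picks)).sort((a, b) => a - b);
}

// Toy data standing in for real chunk embeddings: two obvious groups.
const embeddings: Vector[] = [
  [0, 0], [0, 1], [1, 0],
  [10, 10], [10, 11], [11, 10],
];
const { centroids, labels } = kMeans(embeddings, 2);
const representativeIndices = closestToCentroids(embeddings, centroids);
console.log(labels, representativeIndices);
```

The indices in representativeIndices point back into your chunk array, so those are the chunks you would summarize. If your embeddings are compared by cosine similarity, normalizing each vector to unit length before clustering is a common adjustment.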