ian · 7mo ago

llm.ts wrapper

I have an llm.ts file that just thinly wraps the HTTP API with some nice types, retries, batching, etc.
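
Roughly the shape of such a wrapper, as a minimal sketch (the actual file is linked below; the names here, like chatCompletion and the env vars, are illustrative, and the retry policy is a simple exponential backoff):

```ts
// Minimal sketch of an llm.ts-style wrapper: typed chat completions over
// the raw HTTP API, with exponential-backoff retries. All names here are
// illustrative, not the actual file.
type Message = { role: "system" | "user" | "assistant"; content: string };

type CompletionResponse = { choices: { message: Message }[] };

const LLM_API_URL =
  process.env.LLM_API_URL ?? "https://api.openai.com/v1/chat/completions";

export async function chatCompletion(
  messages: Message[],
  { model = "gpt-4o-mini", retries = 3 } = {},
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(LLM_API_URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.LLM_API_KEY}`,
      },
      body: JSON.stringify({ model, messages }),
    });
    if (res.ok) {
      const json = (await res.json()) as CompletionResponse;
      return json.choices[0].message.content;
    }
    // Give up on permanent errors or when out of retries; otherwise back off.
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= retries) {
      throw new Error(`LLM request failed: ${res.status} ${await res.text()}`);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}
```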
David Alonso · 7mo ago
I’m intrigued, would you mind sharing this? Have you built somewhat complex use cases with this (e.g. agentic workflows)?
ian (OP) · 7mo ago
Here's one version of it in AI Town: https://github.com/a16z-infra/ai-town/blob/main/convex/util/llm.ts

Examples doing some RAG:
https://github.com/a16z-infra/ai-town/blob/08e3f419ba3f20ce46c63f8157a0ad223f0261d0/convex/agent/memory.ts#L325
https://github.com/a16z-infra/ai-town/blob/08e3f419ba3f20ce46c63f8157a0ad223f0261d0/convex/agent/conversation.ts#L13

AI Town characters are agents at the end of the day, but that whole control flow is done with regular code, not an AI-specific framework. I prefer to work with prompts and data streams directly when I can.
ian (OP) · 7mo ago
llama-farm-chat/shared/llm.ts at main · get-convex/llama-farm-chat
https://github.com/get-convex/llama-farm-chat/blob/main/shared/llm.ts
Use locally-hosted LLMs to power your cloud-hosted webapp.
ampp · 7mo ago
I'm curious about the streams. I'm such an efficiency geek, I want to redesign the streaming part to only send the update; right now the code sends the entire pagination page on each update. I'm curious if you had thought about this or how you would do it. It seems more straightforward from the db side, but I'm thinking it's more work on the rendering Next.js side...
ian (OP) · 6mo ago
AI Chat with HTTP Streaming
By leveraging HTTP actions with streaming, this chat app balances real-time responsiveness with efficient bandwidth usage. Users receive character-by-...
ampp · 6mo ago
Not quite, I was thinking of breaking the message up into message fragments, so each fragment is inserted into the table. Then when it completes, the message table is updated. When I watch Axiom with any of these message systems, you are usually sending the entire pagination set's worth of data on each re-render, so 10 messages × ~60 updates. That isn't nice on the bandwidth counter.
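
Something like this, as a rough Convex sketch of that fragment-table idea (the messageFragments table and both mutations are hypothetical, not from any codebase linked above):

```ts
// Rough sketch: each streamed delta becomes its own row, and the full
// message row is written once at the end.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const appendFragment = mutation({
  args: { messageId: v.id("messages"), index: v.number(), delta: v.string() },
  handler: async (ctx, args) => {
    // A client query can ask for fragments with index > the last one it
    // saw, so each update ships only the new text, not the whole message.
    await ctx.db.insert("messageFragments", args);
  },
});

export const finalizeMessage = mutation({
  args: { messageId: v.id("messages"), body: v.string() },
  handler: async (ctx, { messageId, body }) => {
    // One final write of the complete text; fragments can be cleaned up after.
    await ctx.db.patch(messageId, { body, complete: true });
  },
});
```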
ian (OP) · 6mo ago
With the linked article, you could stream to the client, then only write the message once to the database at the end, so one client would get streamed results and others would see it all at once (or chunked by sentence, etc.).
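
A minimal sketch of that pattern as a Convex HTTP action (streamCompletion and internal.messages.send are hypothetical stand-ins):

```ts
// Stream tokens to the requesting client over an HTTP action, and persist
// the finished message with a single mutation at the end.
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

// Stub token stream; a real version would call the LLM's streaming API.
async function* streamCompletion(prompt: string): AsyncIterable<string> {
  yield* ["You ", "said: ", prompt];
}

export const chat = httpAction(async (ctx, request) => {
  const { prompt } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      let full = "";
      for await (const token of streamCompletion(prompt)) {
        full += token;
        // Only the delta goes over the wire, and only to this client.
        controller.enqueue(encoder.encode(token));
      }
      // One database write at the end; other clients see the whole message
      // at once (or you could checkpoint it in chunks, e.g. per sentence).
      await ctx.runMutation(internal.messages.send, { body: full });
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
});
```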
ampp · 6mo ago
Ah.. yeah, I guess I need to look much closer. I was thinking of this in the context of work stealing with efficient streaming, so I probably have to be a bit more creative. Thanks for mentioning that.
