ian · 7mo ago

llm.ts wrapper

I have an llm.ts file that just thinly wraps the HTTP API with some nice types, retries, batching, etc.
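
Roughly the shape of such a wrapper, as a minimal sketch (the actual file is linked below; the names here, like chatCompletion and the env vars, are illustrative, and the retry policy is a simple exponential backoff):

```ts
// Minimal sketch of an llm.ts-style wrapper: typed chat completions over
// the raw HTTP API, with exponential-backoff retries. All names here are
// illustrative, not the actual file.
type Message = { role: "system" | "user" | "assistant"; content: string };

type CompletionResponse = { choices: { message: Message }[] };

const LLM_API_URL =
  process.env.LLM_API_URL ?? "https://api.openai.com/v1/chat/completions";

export async function chatCompletion(
  messages: Message[],
  { model = "gpt-4o-mini", retries = 3 } = {},
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(LLM_API_URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.LLM_API_KEY}`,
      },
      body: JSON.stringify({ model, messages }),
    });
    if (res.ok) {
      const json = (await res.json()) as CompletionResponse;
      return json.choices[0].message.content;
    }
    // Give up on permanent errors or when out of retries; otherwise back off.
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= retries) {
      throw new Error(`LLM request failed: ${res.status} ${await res.text()}`);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
}
```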
David Alonso · 7mo ago
I’m intrigued, would you mind sharing this? Have you built somewhat complex use cases with this (e.g. agentic workflows)?
ian (OP) · 7mo ago
Here's one version of it in AI Town: https://github.com/a16z-infra/ai-town/blob/main/convex/util/llm.ts

Examples doing some RAG:
https://github.com/a16z-infra/ai-town/blob/08e3f419ba3f20ce46c63f8157a0ad223f0261d0/convex/agent/memory.ts#L325
https://github.com/a16z-infra/ai-town/blob/08e3f419ba3f20ce46c63f8157a0ad223f0261d0/convex/agent/conversation.ts#L13

AI Town characters are agents at the end of the day, but that whole control flow is done with regular code, not an AI-specific framework. I prefer to work with prompts and data streams directly when I can.
ian (OP) · 7mo ago
llama-farm-chat/shared/llm.ts at main · get-convex/llama-farm-chat
https://github.com/get-convex/llama-farm-chat/blob/main/shared/llm.ts
Use locally-hosted LLMs to power your cloud-hosted webapp.
ampp · 7mo ago
I'm curious about the streams. I'm such an efficiency geek, I want to redesign the streaming part to only send the update; right now the code sends the entire pagination page on each update. I'm curious if you had thought about this or how you would do it. It seems more straightforward from the db side, but I'm thinking it's more work on the rendering Next.js side...
ian (OP) · 6mo ago
AI Chat with HTTP Streaming
By leveraging HTTP actions with streaming, this chat app balances real-time responsiveness with efficient bandwidth usage. Users receive character-by-...
ampp · 6mo ago
Not quite, I was thinking of breaking the message up into message fragments, so each fragment is inserted into the table. Then when it completes, the message table is updated. When I watch Axiom with any of these message systems, you are usually sending the entire pagination set's worth of data on each re-render, so 10 messages × ~60 updates. That isn't nice on the bandwidth counter.
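
Something like this, as a rough Convex sketch of that fragment-table idea (the messageFragments table and both mutations are hypothetical, not from any codebase linked above):

```ts
// Rough sketch: each streamed delta becomes its own row, and the full
// message row is written once at the end.
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const appendFragment = mutation({
  args: { messageId: v.id("messages"), index: v.number(), delta: v.string() },
  handler: async (ctx, args) => {
    // A client query can ask for fragments with index > the last one it
    // saw, so each update ships only the new text, not the whole message.
    await ctx.db.insert("messageFragments", args);
  },
});

export const finalizeMessage = mutation({
  args: { messageId: v.id("messages"), body: v.string() },
  handler: async (ctx, { messageId, body }) => {
    // One final write of the complete text; fragments can be cleaned up after.
    await ctx.db.patch(messageId, { body, complete: true });
  },
});
```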
ian (OP) · 6mo ago
With the linked article, you could stream to the client, then only write the message once to the database at the end, so one client would get streamed results and others would see it all at once (or chunked by sentence, etc.).
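
A minimal sketch of that pattern as a Convex HTTP action (streamCompletion and internal.messages.send are hypothetical stand-ins):

```ts
// Stream tokens to the requesting client over an HTTP action, and persist
// the finished message with a single mutation at the end.
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

// Stub token stream; a real version would call the LLM's streaming API.
async function* streamCompletion(prompt: string): AsyncIterable<string> {
  yield* ["You ", "said: ", prompt];
}

export const chat = httpAction(async (ctx, request) => {
  const { prompt } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      let full = "";
      for await (const token of streamCompletion(prompt)) {
        full += token;
        // Only the delta goes over the wire, and only to this client.
        controller.enqueue(encoder.encode(token));
      }
      // One database write at the end; other clients see the whole message
      // at once (or you could checkpoint it in chunks, e.g. per sentence).
      await ctx.runMutation(internal.messages.send, { body: full });
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
});
```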
ampp · 6mo ago
Ah.. yeah, I guess I need to look much closer. I was thinking of this in the context of work stealing with efficient streaming, so I probably have to be a bit more creative. Thanks for mentioning that.
