LLM / Streaming
We currently run a game mode where users interact with an LLM while simultaneously viewing the live-streamed responses of their opponent.
This system is currently managed via WebSockets, but I believe Convex could be a great fit. However, we have some concerns regarding concurrency and function limits, which I’ve outlined below:
Main concerns:
- Real-time performance: Due to the competitive nature of the game, we can’t afford to introduce noticeable buffering or delays in updates.
- Concurrency: We anticipate a high volume of tokens being streamed simultaneously. Each match involves two LLMs and two players, all potentially streaming tokens in real time—and we expect many matches to run concurrently.
Billing question:
If we use Convex to stream LLM outputs from the server to the client—sending each token individually—how would billing work? Specifically, would each token streamed represent a separate Function Call (i.e., one mutation/query per token)?
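In case it helps clarify the question, here is a rough sketch of what I had in mind, where an action streams from the LLM and flushes batched chunks to a document through an internal mutation rather than writing one mutation per token. Everything in it is a placeholder of mine, not something from the Convex docs: the `streams` table with a `body` field, the module path `convex/llm.ts`, the function names, the `fetchLLMTokens` stub, and the 250 ms flush window.

```ts
// convex/llm.ts — all names here are placeholders, not from the Convex docs.
import { internalAction, internalMutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Placeholder for whatever LLM SDK is used (an async iterator of tokens).
async function* fetchLLMTokens(prompt: string): AsyncGenerator<string> {
  yield* prompt.split(" ").map((word) => word + " ");
}

// Append a batch of tokens to an assumed `streams` document with a `body` field.
// One mutation per *batch*, not per token, so the number of function calls stays bounded.
export const appendChunk = internalMutation({
  args: { streamId: v.id("streams"), chunk: v.string() },
  handler: async (ctx, { streamId, chunk }) => {
    const doc = await ctx.db.get(streamId);
    if (!doc) throw new Error("Stream not found");
    await ctx.db.patch(streamId, { body: doc.body + chunk });
  },
});

// Stream from the LLM and flush the buffered tokens roughly every 250 ms.
export const streamCompletion = internalAction({
  args: { streamId: v.id("streams"), prompt: v.string() },
  handler: async (ctx, { streamId, prompt }) => {
    let buffer = "";
    let lastFlush = Date.now();
    for await (const token of fetchLLMTokens(prompt)) {
      buffer += token;
      if (Date.now() - lastFlush > 250) {
        await ctx.runMutation(internal.llm.appendChunk, { streamId, chunk: buffer });
        buffer = "";
        lastFlush = Date.now();
      }
    }
    // Flush whatever is left at the end of the stream.
    if (buffer) {
      await ctx.runMutation(internal.llm.appendChunk, { streamId, chunk: buffer });
    }
  },
});
```

I’m not sure whether this batching pattern is the recommended one, so the core question stands: with per-token writes, would every token be its own Function Call, and does batching like the above change the picture?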
Did you check the blog at all? There might be some relevant stuff there:
https://stack.convex.dev/ai-chat-with-http-streaming
https://stack.convex.dev/build-streaming-chat-app-with-persistent-text-streaming-component
https://stack.convex.dev/gpt-streaming-with-persistent-reactivity
https://www.convex.dev/components/persistent-text-streaming
Yeah, I saw the first two and the last one at least.
This seems like what we need, but I was a bit confused about how it relates to "concurrent connections".
Does each user count as a concurrent connection, or is this a case where an open WebSocket doesn't count? (See the client-side sketch below.)
Because if we have 3k–5k users, I want to make sure we won't run into issues.
Thank you in advance for any help, and apologies for any mistakes; I'm still learning 🙏
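For reference, this is the client side I’m picturing: each viewer holds a single reactive query subscription, and my understanding is that the Convex client multiplexes all of its subscriptions over one WebSocket connection regardless of how many token updates flow through it, which is the part I’m asking about. The `getStream` query and the `streams` table are the same placeholders from my sketch above.

```ts
// convex/streams.ts — hypothetical query over the placeholder `streams` table.
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getStream = query({
  args: { streamId: v.id("streams") },
  handler: async (ctx, { streamId }) => ctx.db.get(streamId),
});
```

```tsx
// OpponentStream.tsx — one reactive subscription per viewer; token updates arrive
// over the client's existing Convex WebSocket connection.
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";
import { Id } from "../convex/_generated/dataModel";

export function OpponentStream({ streamId }: { streamId: Id<"streams"> }) {
  const stream = useQuery(api.streams.getStream, { streamId });
  return <pre>{stream?.body ?? "…"}</pre>;
}
```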