daun 12mo ago

Better approach on getting streaming response

Hello Convex community, I have been following the approach from the official Convex tutorial for streaming LLM output, which calls an internal mutation every time the action receives a new chunk of the stream. While this is easy to implement and looks like streaming from the user's perspective, I believe it consumes a significant amount of database bandwidth and is somewhat slower than a direct streaming response. Are there any plans or guides for returning streaming responses from Convex actions?
3 Replies
Indy 12mo ago
We've been thinking about this pattern and its potential cost ramifications. You can see a recent discussion here: https://discord.com/channels/1019350475847499849/1019350478817079338/1187807588092481576 That said, the simple short-term answer is to debounce writing the response to the db, so you only write an update to the db every N milliseconds.
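
A minimal sketch of the "write at most every N milliseconds" idea, in case it helps. It only covers the timing logic and does not touch Convex at all: `ThrottledBuffer`, `push`, and `flush` are hypothetical names made up for this example, and the injectable `now` clock is only there to make it easy to test.

```typescript
// Hypothetical helper (not part of Convex): accumulates streamed chunks and
// tells the caller when enough time has passed to persist the text again.
class ThrottledBuffer {
  private text = "";
  private lastWrite = -Infinity; // so the very first chunk is written at once
  private dirty = false;

  constructor(
    private readonly intervalMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  // Add a chunk. Returns the full text to persist if at least `intervalMs`
  // has elapsed since the previous write, otherwise null (skip this write).
  push(chunk: string): string | null {
    this.text += chunk;
    this.dirty = true;
    if (this.now() - this.lastWrite >= this.intervalMs) {
      this.lastWrite = this.now();
      this.dirty = false;
      return this.text;
    }
    return null;
  }

  // Call once when the stream ends so the tail of the response is not lost.
  // Returns the full text if anything is still unwritten, otherwise null.
  flush(): string | null {
    if (!this.dirty) return null;
    this.dirty = false;
    return this.text;
  }
}
```

Inside the action you would call your existing internal mutation only when `push` returns a non-null string, and once more with the result of `flush` after the stream finishes. Strictly speaking this is a throttle rather than a debounce, since it writes on a fixed cadence instead of waiting for the stream to go quiet.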
daun (OP) 12mo ago
Debounce sounds like an awesome idea. Thanks Indy!
ian 12mo ago
Aside: only writing on newlines, periods, etc. could be a prettier debounce, or just on word boundaries. I'm personally not a fan of partial words streaming in. In general, streaming text is so hard to read that I'm not sure it's much better than a spinner, at least when it moves the earlier text around.
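
A rough sketch of the boundary idea, assuming you keep the full streamed text in memory and only persist the part that ends on a boundary. `stablePrefix` and its default boundary set are made up for illustration, not taken from any library.

```typescript
// Hypothetical helper: returns the longest prefix of `text` that ends on one
// of the boundary characters, or "" if no boundary has been seen yet.
function stablePrefix(text: string, boundaries: string = ".!?\n"): string {
  // Scan backwards so we find the last boundary without checking every char.
  for (let i = text.length - 1; i >= 0; i--) {
    if (boundaries.includes(text.charAt(i))) {
      return text.slice(0, i + 1);
    }
  }
  return "";
}
```

Passing `" \n"` as `boundaries` gives the word-boundary variant. You would write to the db only when the returned prefix is longer than what was last written, and write the full text once the stream ends.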
