Better approach to getting a streaming response
Hello Convex community,
I have been following the official Convex tutorial methods to update the streaming output of LLMs, which involves a number of internal mutations whenever an action retrieves new streaming results.
While this method is easy to implement and appears as streaming from the user's perspective, I believe it consumes a significant amount of database bandwidth and is somewhat slower than having a direct streaming response.
Are there any plans or guides for returning streaming responses in Convex actions?
3 Replies
We've been thinking about this pattern and its potential cost ramifications. You can see a recent discussion here: https://discord.com/channels/1019350475847499849/1019350478817079338/1187807588092481576
That said, the simple short-term answer is to debounce writing out the response to the db, so you only write an update every N milliseconds.
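A minimal sketch of that idea, in case it helps. Strictly speaking, "write at most once every N milliseconds" is a throttle rather than a debounce. Everything here is an assumption for illustration: `WriteThrottle`, `streamWithThrottle`, and the `save` callback are hypothetical names, and `save` just stands in for whatever persists the text (for example, a call to an internal mutation from your action).

```typescript
// Hypothetical helper: allows at most one write per `intervalMs`.
class WriteThrottle {
  private lastWriteMs: number = Number.NEGATIVE_INFINITY;
  private intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Returns true (and records the time) if a write is allowed at `nowMs`.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.lastWriteMs < this.intervalMs) return false;
    this.lastWriteMs = nowMs;
    return true;
  }
}

// Consumes streamed chunks and persists the accumulated text at most once
// per interval, plus one final write so the last chunks are never lost.
// `save` is an assumed callback, not a Convex API.
async function streamWithThrottle(
  chunks: AsyncIterable<string>,
  save: (text: string) => Promise<void>,
  intervalMs: number,
  now: () => number = Date.now,
): Promise<number> {
  const throttle = new WriteThrottle(intervalMs);
  let text = "";
  let saved = "";
  let writes = 0;
  for await (const chunk of chunks) {
    text += chunk;
    if (throttle.tryAcquire(now())) {
      await save(text);
      saved = text;
      writes++;
    }
  }
  // Final flush: persist anything that arrived after the last allowed write.
  if (saved !== text) {
    await save(text);
    writes++;
  }
  return writes;
}
```

With this shape, the number of db writes is bounded by roughly the stream duration divided by N, plus one, regardless of how many tokens the model emits.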
Debounce sounds like an awesome idea. Thanks Indy!
Aside: limiting writes to newlines, periods, etc. could be a prettier debounce. Or just word boundaries. I'm personally not a fan of partial words streaming in. In general, streaming text is so hard to read that I'm not sure it's much better than a spinner, if it moves the earlier text.
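A rough sketch of the boundary idea, assuming the stream arrives as string chunks. `splitAtLastBoundary` and `snapshots` are hypothetical helpers, not part of any library: the first holds back a trailing partial word, and the second shows which text would be written after each chunk.

```typescript
// Splits `buffer` at its last boundary character. `ready` ends on a boundary
// and is safe to show; `rest` is the trailing partial word to hold back.
// Pass a non-global RegExp (global regexes keep state between test() calls).
function splitAtLastBoundary(
  buffer: string,
  boundary: RegExp = /\s/,
): { ready: string; rest: string } {
  for (let i = buffer.length - 1; i >= 0; i--) {
    if (boundary.test(buffer.charAt(i))) {
      return { ready: buffer.slice(0, i + 1), rest: buffer.slice(i + 1) };
    }
  }
  return { ready: "", rest: buffer };
}

// Simulates a stream: returns the text that would be written after each
// chunk that completes at least one boundary, plus a final flush.
function snapshots(chunks: string[], boundary?: RegExp): string[] {
  const out: string[] = [];
  let committed = "";
  let pending = "";
  for (const chunk of chunks) {
    const { ready, rest } = splitAtLastBoundary(pending + chunk, boundary);
    pending = rest;
    if (ready !== "") {
      committed += ready;
      out.push(committed);
    }
  }
  // Flush the last partial word once the stream has ended.
  if (pending !== "") out.push(committed + pending);
  return out;
}

// Word boundaries: partial words are never shown mid-stream.
console.log(snapshots(["Hel", "lo wor", "ld, bye"]));
// Sentence or line boundaries instead: pass e.g. /[.!?\n]/
console.log(snapshots(["One. Tw", "o. Three"], /[.!?\n]/));
```

This can be combined with a time-based limit: only write when a boundary has been reached and enough time has passed since the last write.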