Pietro · 15mo ago

AI Response Streaming Pattern and Costs

When implementing the AI response streaming example, where each chunk gets patched to a message in a loop, the number of function calls, and consequently reads, basically explodes. Do you have any plans to treat these patches and the reads they trigger as a different billing category? I think the mental model of doing it like this (via the DB) is brilliant, but I'm concerned I'll end up doing bad optimizations just to be more cost conscious long term. How are you/should I be thinking about this? Should I architect this type of stuff with a websocket instead and only persist once the stream is done? Thanks in advance. p
17 Replies
Indy · 15mo ago
I acknowledge that this is potentially an expensive pattern w.r.t. billing with the naive implementation. A couple of things:
1. Under the hood it is actually using websockets.
2. To manage costs and keep the simple DB-based pattern, you can "debounce", i.e. not write every time the stream updates, but only every n seconds.
We are thinking about how we can reduce costs in situations like this, especially when there will be a lot of cache hits, for example where there are multiple users reading the same streaming output.
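A minimal sketch of that debounce idea on the client side, assuming a hypothetical `api.messages.update` mutation and whatever async chunk iterator your LLM SDK exposes; the only Convex-specific piece is `client.mutation`:

```ts
import { ConvexHttpClient } from "convex/browser";
// `api` is the generated Convex API object; `api.messages.update` is an
// assumed mutation that patches a message body. Illustrative names only.
import { api } from "./convex/_generated/api";

const client = new ConvexHttpClient(process.env.CONVEX_URL!);

async function streamWithDebounce(
  messageId: string,
  chunks: AsyncIterable<string>,
  flushEveryMs = 1000,
) {
  let buffer = "";
  let lastFlush = Date.now();

  for await (const chunk of chunks) {
    buffer += chunk;
    // Only patch the message every `flushEveryMs`, not on every chunk,
    // so one response costs a handful of mutations instead of hundreds.
    if (Date.now() - lastFlush >= flushEveryMs) {
      await client.mutation(api.messages.update, { messageId, body: buffer });
      lastFlush = Date.now();
    }
  }
  // Final flush so the stored message matches the complete stream.
  await client.mutation(api.messages.update, { messageId, body: buffer });
}
```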
Pietro (OP) · 15mo ago
Thanks, here's my candid feedback on this (as someone who has contracted 7-figure/yr deals on data and compute products). As an enterprise buyer and architect, I would push back heavily on charging per function call vs. per compute/I.O. time. Charging per invocation incentivises devs to be less modular, leading to worse architecture. I don't like pricing models that incentivise bad architecture.
I have negotiated away from it a few times. I also don't like pricing models that are too creative or hard to understand, though I get why you may like them. It's just not balanced for us buyers.

If you really want to keep some price per function, then I'd say separate an "externally called" function, or entry point, from "internal invocations": functions calling other functions, including writing/reading. That said, it has to be rational and balanced. For example, writing 5 chunks of text to disk should in principle cost the same as saving the sum of all chunks if the compute is not dramatically different, not 5x the price, but then you need to allow me to do incremental updates somehow.

So, in all honesty, I'd advise you to simplify by charging compute time x performance, plus a small sanity-check price for storage... like Snowflake. Then the more convenience, the more we will spend, I promise; I have seen it many times. You could argue the price per function is so low, and sure, but it's still annoying to think about, and I'm afraid you will increase that price in the future because I've been burned before; it makes me think of just going back to Prisma+PG... you see my point? Hope you appreciate the honesty 🙏✌️ I do it with love, the product is truly awesome.
Indy · 15mo ago
Excellent feedback! Thank you so much for giving it! We are still early in our journey of defining our plans, so this is a great time to hear this. We are definitely thinking about what our pricing incentivizes, and definitely agree that pricing shouldn't lead to bad architecture and that our incentives should align. As we learn and gather feedback we'll adjust our various plans accordingly. I'll cc in @Jamie, our CEO, who's been thinking deeply about this recently. He's on vacation right now so it'll take a bit of time before he responds. Thanks again!
jamwt · 15mo ago
hey! there will be a "business plan" that prices through compute + etc. to align performance optimization and cost savings more transparently. the pro plan we kept simple just b/c really detailed price lists are often a turn-off in early products, and no one has any scale to speak of when they're just starting. the pro plan is mostly about teams in development that aren't exceeding the built-in resources anyway.

for teams scaling up, they will want the business plan, which has more complex but more pass-through pricing. in my experience, the biggest issue we run into right now is people optimizing away from things on the price sheet before they actually have a problem. i agree this is an issue. it's possible the existence of this kind of plan, where they can see how scaling works, will reassure them. because we 100% want no one to think about "architecting away" from the price sheet. the obvious way to do things on convex is actually going to be the cheapest, but we need people to be comfortable doing it that way so we can take advantage of the optimization opportunities we've designed the platform to unlock
> I'm afraid you will increase that price in the future because I've been burned
yep, I think this is the core of it. it will take time to earn trust, but realistically our incentives are aligned: if we bait-and-switch via pricing, we'll lose all our accounts and credibility, and the game would be over very fast for us. the real goal is that it would actually be very, very difficult to architect your app to work as well and as cheaply as just using convex and letting us optimize the particular shape of the platform you're using, which is our core competency. one clarification that might be necessary here:
> between an "externally called" function or entry point and "internal invocations" - functions calling other functions, including writing/reading.
if "internal" means plain ol' JS/TS functions calling each other, we don't charge for this. we only charge for the edge calls of "convex functions", aka query / mutation / action
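To make that distinction concrete, here's a minimal sketch (table, field, and function names are illustrative): a single billed Convex query that calls an ordinary TS helper, which costs nothing extra per call:

```ts
import { query } from "./_generated/server";

// Plain TS helper: calling this is not a Convex function call,
// so it adds nothing to the per-invocation count.
function summarize(bodies: string[]): string {
  return bodies.join(" ").slice(0, 280);
}

export const threadSummary = query({
  handler: async (ctx) => {
    const messages = await ctx.db.query("messages").collect();
    // One billed query invocation, however many helpers it calls.
    return summarize(messages.map((m) => m.body));
  },
});
```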
Pietro (OP) · 15mo ago
Awesome, appreciate your feedback. I'll keep you posted as my thoughts about this evolve too. The main fear for now is that I'd love to do these AI streams AND Algolia-style typeaheads, but I'll burn through quotas way too fast. The archetype here is many very simple calls (simple-high-count), as opposed to few very heavy calls (complex-low-count). Seems like simple-high-count is the 10x pricey item, which is not aligned to value.

One related additional suggestion re: cost analytics: it will be important for me to be able to connect Convex spend to individual users, so it would be great to have a "first party" way to slice my consumption per (Clerk) userId, orgId, and potentially other tags such as feature. I know you're working on RLS, so this might be a nice fit. Could not find this in the call logs...
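One way to keep the simple-high-count typeahead pattern affordable today is client-side debouncing, so keystrokes collapse into fewer query calls. A sketch assuming React and a hypothetical `api.messages.search` query:

```tsx
import { useState, useEffect } from "react";
import { useQuery } from "convex/react";
import { api } from "../convex/_generated/api";

export function Typeahead() {
  const [input, setInput] = useState("");
  const [term, setTerm] = useState("");

  // Debounce: only update the query arg 200 ms after the user stops typing,
  // so a 10-character word triggers ~1 query instead of 10.
  useEffect(() => {
    const t = setTimeout(() => setTerm(input), 200);
    return () => clearTimeout(t);
  }, [input]);

  // "skip" avoids issuing the query at all while the term is empty.
  const results = useQuery(api.messages.search, term ? { term } : "skip");

  return (
    <div>
      <input value={input} onChange={(e) => setInput(e.target.value)} />
      <ul>{results?.map((r) => <li key={r._id}>{r.body}</li>)}</ul>
    </div>
  );
}
```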
jamwt · 15mo ago
Re: simple calls, yeah I get that. Agree. That's because we're sort of assuming an average right now of (requests * runtime/mem * cache hit rate). For some apps, our average is actually generous, and for some it's stingy... especially over time as the customer dev team masters convex and optimizes their project. Each of those knobs will be exposed in a plan soon to give that control back to sites at scale worrying about scaling costs.

One thing we've been blown away by is how much work it is to roll out high-quality plans, especially with usage-based pricing. So we're going to remedy this when we can, but it will take a few months. For some projects at scale with e.g. high cache hit rates, for now I chat with the customer and work out some discounting on the usage-based portion of their bill.

The whole Clerk user attribution thing: it's possible we'll be able to help solve this with log streams, where you can annotate the function calls. The log stream for a function invocation includes resource information for each call. Good suggestion.
Pietro (OP) · 15mo ago
👍 Makes sense. Yeah, agree it's complex. Your customers are also thinking about how to charge stuff down to their own clients, so it trickles down 😂
jamwt · 15mo ago
Yep
Pietro (OP) · 15mo ago
While I have you: I read somewhere you're adding a substring "where" operator. Is there an ETA for it? 🙏
jamwt · 15mo ago
We just rolled out prefix and fuzzy matching in our search indexing. Maybe that was it?
Pietro (OP) · 15mo ago
Aha, that's it, thanks! Will check.
jamwt · 15mo ago
Convex News
Announcing Convex 1.7
Happy Holidays folks! We wanted to get one last feature release out before the year ended. Highlights: * Full text search now does prefix and fuzzy search. There is no new API, just upgrade to 1.7 for the new behavior. * Improved database snapshot format. You can now backup and restore
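For reference, full-text search in Convex is defined via a search index on the table and queried with `withSearchIndex`; a minimal sketch with illustrative table and field names, which as of 1.7 also matches prefixes and fuzzy variants with no API change:

```ts
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  messages: defineTable({
    body: v.string(),
  }).searchIndex("search_body", { searchField: "body" }),
});
```

```ts
// convex/messages.ts
import { query } from "./_generated/server";
import { v } from "convex/values";

export const search = query({
  args: { term: v.string() },
  handler: async (ctx, args) => {
    // Since 1.7 the search term also matches prefixes and fuzzy
    // (typo-tolerant) variants; no new API, just upgraded behavior.
    return await ctx.db
      .query("messages")
      .withSearchIndex("search_body", (q) => q.search("body", args.term))
      .take(10);
  },
});
```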
Pietro (OP) · 15mo ago
Totally missed the release
ian · 10mo ago
@Pietro we just released http response streaming, so you can stream back LLM responses and only periodically update the DB, to help with your bandwidth concerns: https://news.convex.dev/announcing-convex-1-12/
Convex News
Announcing Convex 1.12
We’ve had a busy month, and we have a bunch of different improvements to share! Support for Svelte, Bun, and Vue! We have a few more logos under our quickstarts section – we've added guides for Svelte, Bun, and Vue including our first community-maintained client library! HTTP action response streaming
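A sketch of the pattern ian describes, using a Convex HTTP action with a streamed `Response` body and a single write at the end; the `llmChunks` generator and the `internal.messages.save` mutation are stand-ins for your LLM SDK and your own mutation:

```ts
// convex/http.ts
import { httpRouter } from "convex/server";
import { httpAction } from "./_generated/server";
import { internal } from "./_generated/api";

// Hypothetical LLM stream; replace with your provider's SDK.
async function* llmChunks(prompt: string): AsyncGenerator<string> {
  for (const word of ["Hello,", " ", "world!"]) yield word;
}

const http = httpRouter();

http.route({
  path: "/chat",
  method: "POST",
  handler: httpAction(async (ctx, request) => {
    const prompt = await request.text();
    const encoder = new TextEncoder();
    let full = "";
    const body = new ReadableStream<Uint8Array>({
      async start(controller) {
        for await (const chunk of llmChunks(prompt)) {
          full += chunk;
          // The client sees every token as it arrives...
          controller.enqueue(encoder.encode(chunk));
        }
        // ...but the database is written once, at the end (or you could
        // flush periodically, as in the debounce sketch above).
        await ctx.runMutation(internal.messages.save, { body: full });
        controller.close();
      },
    });
    return new Response(body, {
      headers: { "Content-Type": "text/plain; charset=utf-8" },
    });
  }),
});

export default http;
```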
Pietro (OP) · 10mo ago
Super, thanks
Matt Luo · 10mo ago
Hi @Pietro, did you keep your streaming pattern from your original post? One significant change on May 15 was that database bandwidth is no longer rounded up to the nearest 1 KB. I also prototyped the streaming pattern you described: I didn't change any code and did a side-by-side comparison of pre-May-15 bandwidth usage versus today. It was a big difference! I think this change speaks to what some of the Convex staff were saying about incentive alignment.
Pietro (OP) · 9mo ago
Spot on 👌
