Just built this. Using GPU cloud inference for the transcription + generation, so that's two more WebSockets. Convex is great for real-time sync of messages between client and server.
3 Replies
How did you achieve this? Haven't been able to upgrade the HTTP connection to a WebSocket with the client.
STT/TTS models deployed via Python/Rust to RunCloud/Modal instances
I don't think HTTP Actions can be upgraded to a WebSocket connection.
You can use the fly.io GPU cloud to deploy a Python server/Docker image and autoscale it based on network or compute usage.
You'll have to email them for GPU access, but they reply within a couple of hours.
They have a suspend state with < 300 ms resume, but for GPUs it's around 2-5 s on average.
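For anyone wondering what "upgrading" the HTTP connection actually involves: per RFC 6455, the server has to answer the client's `Sec-WebSocket-Key` with a `101 Switching Protocols` response and then keep the raw socket open, which is why this usually lives on a long-running server (like the Python server mentioned above) rather than in a request/response handler. Here is a rough sketch of just the handshake part, standard library only. The function names `websocket_accept` and `upgrade_response` are made up for illustration and are not from Convex, fly.io, or any library.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    # SHA-1 over the key concatenated with the GUID, then base64-encoded.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the HTTP response that switches the connection to WebSocket.

    Hypothetical helper: a real server would write this to the socket and
    then start reading and writing WebSocket frames on the same connection.
    """
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 handshake example.
    print(upgrade_response("dGhlIHNhbXBsZSBub25jZQ=="))
```

In practice you would let a WebSocket library handle this plus the framing; the sketch only shows why the server needs to own the connection after the handshake.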