So I have my app that uses Groq for multiple models, and sometimes I have to do back-and-forth queries serially in a chain. A good example is speech-to-text transcription: once I speak, my app hits Groq and returns the transcription; with that transcription, I hit the Groq LLM again to get a response to that particular question; then I send that response to Groq once more for TTS (text-to-speech) to get a speech file and read it out loud. So I'm doing this back and forth, sometimes even more than that. I'm thinking about whether we can reduce that latency. Currently it's about 15 milliseconds on average for the full chain, roughly 4–5 milliseconds per round trip across the three calls. It's still not much, but I was going for ultra-low latency, you know, hardcore. So I was wondering if I could run my app near your servers, i.e. on a server close to where you're hosted, so that I can shave off that extra network time and effectively pay only one round trip's worth of the ~15 milliseconds.
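The chain described above can be sketched as three serial calls, each paying one network round trip. This is a minimal illustrative sketch, not the actual Groq SDK: the stage functions below are stand-ins for the real STT, chat-completion, and TTS API calls.

```python
# Illustrative sketch of the serial STT -> LLM -> TTS chain.
# Each stage function stands in for one API call; in the real app,
# every call adds one network round trip, so latency is additive.

def run_voice_pipeline(audio_bytes, stt, llm, tts):
    """Compose the three stages serially; total latency is the sum of all three."""
    transcript = stt(audio_bytes)   # round trip 1: speech-to-text
    reply = llm(transcript)         # round trip 2: LLM response
    speech = tts(reply)             # round trip 3: text-to-speech
    return speech

# Stub stages standing in for the real API calls:
stt = lambda audio: "what is the capital of France?"
llm = lambda text: "The capital of France is Paris."
tts = lambda text: b"<audio bytes for: " + text.encode() + b">"

if __name__ == "__main__":
    print(run_voice_pipeline(b"...", stt, llm, tts))
```

Because each stage's input depends on the previous stage's output, the calls can't be parallelized; the only levers are fewer hops or shorter hops.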
We don’t offer co-location yet except for enterprise customers, sorry — but this is the kind of workflow we’re familiar with and even run ourselves, and we’re exploring ways to reduce network hops.
(Most of the slowness comes from network hops!)
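As back-of-envelope arithmetic: with three serial calls, the network overhead is the per-call round-trip time multiplied by the number of calls, so shaving RTT pays off three times over. The 5 ms and 1 ms figures below are assumed illustrative values (the 5 ms per hop is consistent with the ~15 ms total mentioned above).

```python
def chain_network_overhead(rtt_ms: float, calls: int = 3) -> float:
    """Serial API calls each pay a full round trip, so network overhead adds up."""
    return rtt_ms * calls

# ~5 ms RTT per call from a distant zone -> ~15 ms total network overhead
print(chain_network_overhead(5.0))  # 15.0

# Hosting closer might cut per-hop RTT to ~1 ms -> ~3 ms total (assumed figure)
print(chain_network_overhead(1.0))  # 3.0
```

The same math says a longer chain (more back-and-forth calls) amplifies the benefit of a shorter hop proportionally.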
I mean, you don’t have to host it yourselves — I don’t know if you guys have your own data center or not. If your data center is in a particular location, I can figure out an AWS zone that is near it. The ideal situation is getting a VPS in whichever data center you’re in, but if you don’t want to engage in that way, even just sharing your location would work: I can figure out a data center in that area and host mine there on a VPS. So the next best thing is, if you can tell me which location your data center is in, I can find an AWS zone (or some other data center) that is close to it and host there.
Ah got it — we DO have a way to specify “use this location / data center for inference,” but again, that’s on the Enterprise plan. Our router distributes all incoming calls depending on data-center load etc., so the closest data center isn’t guaranteed to be the fastest.
If you’re curious about Enterprise access, you could fill out this form: Enterprise Access | Groq is fast, low cost inference.