We were previously setting service_tier: “auto” to hit the flex tier capacity first and then fall back to the on-demand tier capacity. It looks like we’re not hitting the flex tier at all anymore, and I notice the docs around service tiers have changed and there’s a new performance tier. Is this an intentional change, or a regression on Groq’s end? Do we need to make a flex request first and then fall back to on-demand manually?
If you want to easily burst beyond your limits, you can pass service_tier=auto in order to use your performance tier limits if they are available, and burst into on_demand. This is a great way to balance performance and costs.
Yeah, it looks like “auto” uses performance mode first and falls back to on_demand, whereas before it would use flex mode first and fall back to on_demand, so this is kind of a breaking change to this param.
Auto mode has always used on_demand mode first and then flex as fallback; here’s a bit more detail, but agreed, we’ve never laid this out in our docs.
(reference: Service Tiers - GroqDocs)
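For the manual route asked about in the original post (try flex first, then fall back to on_demand), here is a minimal sketch of what that could look like on the client side. It is not Groq's implementation or SDK: `request_with_fallback`, `send`, and `TierUnavailable` are hypothetical names made up for illustration. `send` stands in for whatever function actually issues your request, and you would raise (or translate into) `TierUnavailable` from it whenever your client reports that the tier has no capacity. Only the `service_tier` field and the tier names come from this thread.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


class TierUnavailable(Exception):
    """Hypothetical error: raise this from `send` when a tier has no capacity."""


def request_with_fallback(
    send: Callable[[Dict[str, Any]], Any],
    payload: Dict[str, Any],
    tiers: Sequence[str] = ("flex", "on_demand"),
) -> Tuple[str, Any]:
    """Try each service tier in order and return (tier_used, response)."""
    if not tiers:
        raise ValueError("tiers must not be empty")
    last_error = None
    for tier in tiers:
        try:
            # Build a new dict so the caller's payload is never mutated.
            return tier, send({**payload, "service_tier": tier})
        except TierUnavailable as exc:
            # Remember the failure and move on to the next tier in the list.
            last_error = exc
    # Every tier was tried and none had capacity.
    raise last_error
```

Returning the tier that actually served the request makes it easy to log how often you land on flex versus on_demand, which is the thing that was hard to tell with "auto". Any error other than `TierUnavailable` is deliberately not caught, so genuine request failures still surface right away.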