Prompt Caching

Hi!

We try our best to maximize cache hits, but caching isn’t guaranteed on subsequent requests due to our internal routing (which minimizes latency). This is especially true for smaller models: they run on more instances, and since the cache isn’t shared between instances, subsequent requests are more likely to land on a different instance whose cache is still cold.
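To picture the effect, here’s a toy sketch (not our actual routing code, and the numbers are purely illustrative): each instance keeps its own local prompt cache, requests are routed per-request across instances, and we count how often a request in a short conversation lands on an instance that has already cached the shared prefix.

```python
import random

# Toy illustration only: per-instance caches plus per-request routing.
# As the number of instances grows, the chance that a follow-up request
# lands on an instance that already cached the prompt prefix drops.
def simulate_hit_rate(num_instances: int, requests_per_convo: int = 5,
                      num_convos: int = 20_000) -> float:
    hits = attempts = 0
    for _ in range(num_convos):
        warm = set()  # instances whose cache already holds our prefix
        for i in range(requests_per_convo):
            instance = random.randrange(num_instances)  # routing choice
            if i > 0:
                attempts += 1
                if instance in warm:
                    hits += 1  # landed on an instance with a warm cache
            warm.add(instance)
    return hits / attempts

for n in (1, 2, 8, 32):
    print(f"{n:>2} instances -> ~{simulate_hit_rate(n):.0%} hit rate")
```

With a single instance the toy hit rate is essentially 100%; as the instance count grows, consecutive requests are less and less likely to land on a warm cache, which is the behavior described above.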

We’re constantly working on improving the cache hit rate, and we appreciate your feedback!