Prompt Caching

Hi!

We try our best to maximize cache hits, but caching isn’t guaranteed on subsequent requests due to our internal routing (which minimizes latency). This is especially true for smaller models: they run on more instances, and since the cache isn’t shared between instances, subsequent requests are more likely to land on a different instance whose cache is still cold.
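To picture the effect, here’s a toy sketch (not our actual routing code, and the numbers are purely illustrative): each instance keeps its own local prompt cache, requests are routed per-request across instances, and we count how often a request in a short conversation lands on an instance that has already cached the shared prefix.

```python
import random

# Toy illustration only: per-instance caches plus per-request routing.
# As the number of instances grows, the chance that a follow-up request
# lands on an instance that already cached the prompt prefix drops.
def simulate_hit_rate(num_instances: int, requests_per_convo: int = 5,
                      num_convos: int = 20_000) -> float:
    hits = attempts = 0
    for _ in range(num_convos):
        warm = set()  # instances whose cache already holds our prefix
        for i in range(requests_per_convo):
            instance = random.randrange(num_instances)  # routing choice
            if i > 0:
                attempts += 1
                if instance in warm:
                    hits += 1  # landed on an instance with a warm cache
            warm.add(instance)
    return hits / attempts

for n in (1, 2, 8, 32):
    print(f"{n:>2} instances -> ~{simulate_hit_rate(n):.0%} hit rate")
```

With a single instance the toy hit rate is essentially 100%; as the instance count grows, consecutive requests are less and less likely to land on a warm cache, which is the behavior described above.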

We’re constantly working on improving the cache hit rate, and we appreciate your feedback!