Moonshotai/kimi-k2-instruct-0905 excessive (cache-) token usage

Hi,

I love being able to use kimi-k2 on Groq. The speed is fantastic!

I am using kimi-k2 via opencode, and from a functional perspective it works fine. The speed is also considerably better than direct calls to moonshot.ai.

But there is a huge problem I'm facing. When I started using it on Groq, I was surprised by the costs but hadn't had a chance to investigate. I assumed something was wrong with the caching and didn't give it a second look.

I recently found the "Activity" tab on the usage page of Groq's dashboard, and now I can see my main issue.

While making a rather small change to my project, the cached token count exploded (~960k non-cached tokens vs. ~10.3M cached tokens).
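To put those numbers in perspective, here is a rough cost sketch. The per-million-token rates below are purely hypothetical placeholders (not Groq's actual pricing, which you'd substitute from the pricing page); the point is just how the cached/non-cached split drives the bill:

```python
# Rough cost comparison for the token counts above.
# NOTE: the per-1M-token rates are hypothetical placeholders, NOT real
# Groq pricing -- substitute the actual rates from the pricing page.
INPUT_RATE = 1.00    # $ per 1M non-cached input tokens (assumed)
CACHED_RATE = 0.50   # $ per 1M cached input tokens (assumed 50% discount)

non_cached_tokens = 960_000
cached_tokens = 10_300_000

# Cost if cached tokens are billed at the discounted rate.
cost_with_cache_discount = (
    non_cached_tokens / 1e6 * INPUT_RATE
    + cached_tokens / 1e6 * CACHED_RATE
)

# Cost if every token were billed at the full input rate.
cost_if_nothing_cached = (non_cached_tokens + cached_tokens) / 1e6 * INPUT_RATE

print(f"with cache discount: ${cost_with_cache_discount:.2f}")
print(f"if nothing cached:   ${cost_if_nothing_cached:.2f}")
```

Even with a cache discount, 10.3M cached tokens dominate the total, which is why the overall spend still looks high compared to Moonshot.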

Compared with using kimi-k2 on moonshot.ai, I see completely different token usage. Working with Moonshot's API, I use far fewer tokens even when making much bigger changes.

This might be an opencode issue, but I wanted to ask if somebody has had the same problem and might have a tip for me. I use the same opencode config/settings for moonshot.ai and Groq, so my immediate thought was that the caching on Groq might be doing something wrong, tbh.

Hi, I’m glad that kimi on Groq is working well for opencode!

For cached tokens: that’s a lot of cached tokens! They should not count against the rate limit, and should lower your costs and latency as well.

I’m a bit confused - are you saying you’ve seen an increase in your Groq costs?