Moonshotai/kimi-k2-instruct-0905 excessive (cache-) token usage

Hi,

I love being able to use kimi-k2 on Groq. The speed is fantastic!

I am using kimi-k2 via opencode, and from a functional perspective it works fine. The speed is also considerably better than direct calls to moonshot.ai.

But there is a huge problem I'm facing. When I started using it on Groq, I was surprised by the costs but hadn't had a chance to investigate. I assumed something was wrong with the caching and didn't give it a second look.

I recently found the "Activity" tab on the usage page of Groq's dashboard, and now I can see my main issue.

While making a rather small change to my project, the cached token count exploded (~960k non-cached tokens vs. ~10.3M cached tokens).
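To put those numbers in perspective, here is a rough cost sketch. The per-million-token rates below are purely hypothetical placeholders (not Groq's actual pricing, which you'd substitute from the pricing page); the point is just how the cached/non-cached split drives the bill:

```python
# Rough cost comparison for the token counts above.
# NOTE: the per-1M-token rates are hypothetical placeholders, NOT real
# Groq pricing -- substitute the actual rates from the pricing page.
INPUT_RATE = 1.00    # $ per 1M non-cached input tokens (assumed)
CACHED_RATE = 0.50   # $ per 1M cached input tokens (assumed 50% discount)

non_cached_tokens = 960_000
cached_tokens = 10_300_000

# Cost if cached tokens are billed at the discounted rate.
cost_with_cache_discount = (
    non_cached_tokens / 1e6 * INPUT_RATE
    + cached_tokens / 1e6 * CACHED_RATE
)

# Cost if every token were billed at the full input rate.
cost_if_nothing_cached = (non_cached_tokens + cached_tokens) / 1e6 * INPUT_RATE

print(f"with cache discount: ${cost_with_cache_discount:.2f}")
print(f"if nothing cached:   ${cost_if_nothing_cached:.2f}")
```

Even with a cache discount, 10.3M cached tokens dominate the total, which is why the overall spend still looks high compared to Moonshot.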

Compared with using kimi-k2 on moonshot.ai, I see completely different token usage. Working with Moonshot's API, I use far fewer tokens even when making much bigger changes.

This might be an opencode issue, but I wanted to ask if somebody has had the same problem and might have a tip for me. I use the same opencode config/settings for moonshot.ai and Groq, so my immediate thought was that the caching on Groq might be doing something wrong, tbh.

Hi, I’m glad that kimi on Groq is working well for opencode!

For cached tokens: that’s a lot of cached tokens! They should not count against the rate limit, and should lower your costs and latency as well.

I’m a bit confused - are you saying you’ve seen an increase in your Groq costs?