Groq API Bug Report: Missing finish_reason in Streaming Responses

Groq API Bug Report: Missing finish_reason in Streaming Responses

API: https://api.groq.com/openai/v1/chat/completions
Date: November 19, 2025
Severity: High (breaks OpenAI compatibility)
Status: Confirmed by SDK testing

Summary

The Groq API does not send finish_reason in streaming responses for certain models, violating OpenAI’s streaming specification and breaking OpenAI-compatible clients.

Confirmed by SDK Investigation

The Groq Go SDK team has confirmed this is an API issue, not an SDK bug:

  • :white_check_mark: SDK correctly handles finish_reason when API sends it
  • :white_check_mark: Test added proving SDK works correctly
  • :white_check_mark: Python SDK shows same behavior (API doesn’t send finish_reason)
  • :white_check_mark: Diagnostic tools confirm API behavior

Reference: GitHub - ZaguanLabs/groq-go: Unofficial Groq SDK in Go (see FINISH_REASON_ANALYSIS.md)

Affected Models

Confirmed affected models:

  • :white_check_mark: moonshotai/kimi-k2-instruct-0905
  • :white_check_mark: qwen/qwen3-32b (Qwen-Code)

Likely affects other models as well.

OpenAI Specification Requirement

According to OpenAI’s streaming specification, the final chunk must contain:

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion.chunk",
  "created": 1234567890,
  "model": "model-name",
  "choices": [{
    "index": 0,
    "delta": {},
    "finish_reason": "stop"  // ← REQUIRED
  }]
}

Followed by:

data: [DONE]

Actual Groq API Behavior

The Groq API sends:

// Last content chunk
{
  "id": "chatcmpl-xxx",
  "model": "moonshotai/kimi-k2-instruct-0905",
  "choices": [{
    "index": 0,
    "delta": {"content": "final word"},
    "finish_reason": ""  // ← Empty
  }]
}

// Empty chunk (no finish_reason)
{
  "id": "chatcmpl-xxx",
  "model": "",
  "choices": []  // ← No choices, no finish_reason
}

data: [DONE]

Missing: A chunk with finish_reason: "stop" before [DONE]

Production Evidence

Raw Logs from Zaguan CoreX

{"time":"2025-11-19T16:53:36.135620192+01:00","level":"DEBUG","msg":"Groq SDK stream chunk received","model":"moonshotai/kimi-k2-instruct-0905","choices_count":1,"has_usage":false}
{"time":"2025-11-19T16:53:36.142579114+01:00","level":"DEBUG","msg":"Groq SDK stream chunk received","model":"moonshotai/kimi-k2-instruct-0905","choices_count":1,"has_usage":false}
{"time":"2025-11-19T16:53:36.149212213+01:00","level":"DEBUG","msg":"Groq SDK stream chunk received","model":"moonshotai/kimi-k2-instruct-0905","choices_count":1,"has_usage":false}
{"time":"2025-11-19T16:53:36.370208389+01:00","level":"DEBUG","msg":"Groq SDK stream chunk received","model":"","choices_count":0,"has_usage":false}
{"time":"2025-11-19T16:53:36.370226223+01:00","level":"DEBUG","msg":"Skipping empty chunk (no choices, no usage)"}
{"time":"2025-11-19T16:53:36.371322536+01:00","level":"DEBUG","msg":"Skipping nil chunk from Groq SDK stream"}
{"time":"2025-11-19T16:53:36.371336872+01:00","level":"DEBUG","msg":"Stream ended without finish reason, sending final chunk"}

Note: No chunk with finish_reason ever received from API.

Impact

Broken Clients

  • :cross_mark: OpenWebUI: “API Error: Model stream ended without a finish reason”
  • :cross_mark: Langchain: May fail validation
  • :cross_mark: LlamaIndex: May fail validation
  • :cross_mark: Any OpenAI-compatible client: Expects finish_reason

Workarounds Required

All OpenAI-compatible proxies must implement workarounds:

// Zaguan CoreX workaround (v0.37.0-beta7)
sawFinishReason := false
for {
    chunk, err := stream.Next(ctx)
    if err != nil {
        if errors.Is(err, io.EOF) {
            if !sawFinishReason {
                // Synthesize finish_reason since API didn't send it
                outputChan <- ChatResponseChunk{
                    Model: modelName,
                    Choices: []Choice{{
                        Index:        0,
                        FinishReason: "stop",
                        Delta:        ChatMessage{},
                    }},
                }
            }
            return
        }
    }
    
    for _, choice := range chunk.Choices {
        if choice.FinishReason != "" {
            sawFinishReason = true
        }
    }
}

Reproduction

cURL Test

curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/kimi-k2-instruct-0905",
    "messages": [{"role": "user", "content": "Say hello"}],
    "stream": true
  }' \
  --no-buffer

Expected: Final chunk with finish_reason: "stop" before [DONE]
Actual: Empty chunk, then [DONE], no finish_reason

Go SDK Test

cd groq-go/groq/examples/debug_finish_reason
go run main.go

Output: “Stream ended without finish_reason”

Comparison with OpenAI API

OpenAI Behavior (Correct)

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":""}]}

data: {"choices":[{"delta":{"content":"!"},"finish_reason":""}]}

data: {"choices":[{"delta":{},"finish_reason":"stop"}]}  ← finish_reason sent

data: [DONE]

Groq Behavior (Incorrect)

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":""}]}

data: {"choices":[{"delta":{"content":"!"},"finish_reason":""}]}

data: {"model":"","choices":[]}  ← No finish_reason

data: [DONE]

Expected Fix

The Groq API should send a final chunk before [DONE]:

{
  "id": "chatcmpl-xxx",
  "object": "chat.completion.chunk",
  "created": 1234567890,
  "model": "moonshotai/kimi-k2-instruct-0905",
  "choices": [{
    "index": 0,
    "delta": {},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

This matches OpenAI’s behavior and satisfies the specification.

Finish Reason Values

According to OpenAI spec, finish_reason can be:

  • "stop" - Natural completion
  • "length" - Max tokens reached
  • "content_filter" - Content filtered
  • "tool_calls" - Function/tool call made

The API should send the appropriate value based on how the generation ended.

Testing Checklist

To verify the fix:

  • Test with moonshotai/kimi-k2-instruct-0905
  • Test with qwen/qwen3-32b
  • Test with llama-3.3-70b-versatile
  • Test with function calling (finish_reason: “tool_calls”)
  • Test with max_tokens limit (finish_reason: “length”)
  • Verify finish_reason comes BEFORE [DONE]
  • Verify usage data is included in final chunk
  • Test with both Go and Python SDKs

Related Documentation

Priority

Critical: This breaks OpenAI compatibility and requires workarounds in all client implementations.

Proposed Timeline

  1. Immediate: Acknowledge issue
  2. Short-term: Fix API to send finish_reason for affected models
  3. Long-term: Ensure all models send finish_reason correctly

Contact

Reported by: Zaguan CoreX team
SDK Confirmation: ZaguanLabs/groq-go maintainers
Production Impact: Affecting all OpenAI-compatible clients


Appendix: SDK Team Response

From the Groq Go SDK team investigation (November 19, 2025):

Conclusion: This is an API Issue, Not an SDK Bug :white_check_mark:

After thorough investigation, we’ve determined:

  1. The SDK correctly handles finish_reason when the API sends it
  2. The Groq API is not sending finish_reason for certain models
  3. The Python SDK has the same behavior

Why the SDK Should NOT Synthesize finish_reason

  • Transparency: The SDK should return exactly what the API sends
  • Debugging: Users need to know the actual API behavior
  • Correctness: The SDK can’t know what finish_reason should be

We agree with this assessment. The fix must be at the API level.


Thank you for your attention to OpenAI compatibility! This fix will benefit all Groq API users.

Hi! Thank you for reporting this. I’m trying to repro your curl

curl --request POST \
    --url https://api.groq.com/openai/v1/chat/completions \
    --header 'authorization: Bearer MYKEY' \
    --header 'content-type: application/json' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "say hello"
        }
    ],
    "model": "moonshotai/kimi-k2-instruct-0905",
    "stream": true
}'

and I consistently get


data: {"id":"chatcmpl-50b2f9bb-1e48-4d78-8837-409f01270736","object":"chat.completion.chunk","created":1763578973,"model":"moonshotai/kimi-k2-instruct-0905","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-50b2f9bb-1e48-4d78-8837-409f01270736","object":"chat.completion.chunk","created":1763578973,"model":"moonshotai/kimi-k2-instruct-0905","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"x_groq":{"id":"req_01kaeqz36fekwa2fzrthrerz2x","usage":{"queue_time":0.02321992,"prompt_tokens":28,"prompt_time":0.009536857,"completion_tokens":3,"completion_time":0.006776633,"total_tokens":31,"total_time":0.01631349}},"usage":{"queue_time":0.02321992,"prompt_tokens":28,"prompt_time":0.009536857,"completion_tokens":3,"completion_time":0.006776633,"total_tokens":31,"total_time":0.01631349}}

where I DO seem to get "finish_reason":"stop"

could you please run your suite again and test? I’m not sure why you didn’t get finish_reason, but I’m having a hard time reproducing

I’m using Qwen-Code with Kimi K2 and when it tries to call a tool it just stops when connecting through Zaguan.

I’ve also tried to connect directly to api.groq.com and am receiving an error there as well.

are you running the same curl as I did, but you’re getting no finish_reason still? can you run my curl now and paste the output?
(I’m trying to cut out SDKs so I can get to the root cause)

The issue is with streams and with several tool calls in a row. A single call works great and I’ve now tested both your curl command and a separate culr command with a tool call.
The responses are equal in both instances.

This does not, however, tell us why Qwen-Code just stops mid-call.

% curl --request POST \                                                                                                                               25-11-19 - 20:33:36
    --url https://api.groq.com/openai/v1/chat/completions \
    --header "authorization: Bearer $GROQ_API_KEY" \
    --header 'content-type: application/json' \
    --data '{
    "messages": [
        {
            "role": "user",
            "content": "say hello"
        }
    ],
    "model": "moonshotai/kimi-k2-instruct-0905",
    "stream": true
}'
data: {"id":"chatcmpl-542678e8-1eee-4fea-8726-bd1b64d53555","object":"chat.completion.chunk","created":1763580827,"model":"moonshotai/kimi-k2-instruct-0905","system_fingerprint":"fp_3312304636","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}],"x_groq":{"id":"req_01kaesqnjqfpz8fec1mgf75etq","seed":24420746}}

data: {"id":"chatcmpl-542678e8-1eee-4fea-8726-bd1b64d53555","object":"chat.completion.chunk","created":1763580827,"model":"moonshotai/kimi-k2-instruct-0905","system_fingerprint":"fp_3312304636","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-542678e8-1eee-4fea-8726-bd1b64d53555","object":"chat.completion.chunk","created":1763580827,"model":"moonshotai/kimi-k2-instruct-0905","system_fingerprint":"fp_3312304636","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-542678e8-1eee-4fea-8726-bd1b64d53555","object":"chat.completion.chunk","created":1763580827,"model":"moonshotai/kimi-k2-instruct-0905","system_fingerprint":"fp_3312304636","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"x_groq":{"id":"req_01kaesqnjqfpz8fec1mgf75etq","usage":{"queue_time":0.162589862,"prompt_tokens":28,"prompt_time":0.009505263,"completion_tokens":3,"completion_time":2.7e-7,"total_tokens":31,"total_time":0.009505533}},"usage":{"queue_time":0.162589862,"prompt_tokens":28,"prompt_time":0.009505263,"completion_tokens":3,"completion_time":2.7e-7,"total_tokens":31,"total_time":0.009505533}}

data: [DONE]

Your example still displays finish_reason: stop though — are you able to give me a reproducible example I can use to find the root cause?