Let’s consider the following example use case:
- We want to ask a model that supports tool use the following question: “What is the weather like today in the largest city in Japan that doesn’t contain a y? Make sure to explain your reasoning in detail before calling any tools.”
- We have two tools (sketched below):
  - get_weather, which takes two parameters: location and date, and returns the corresponding weather forecast.
  - get_current_date, which returns today’s date.
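For reference, the tool declarations look roughly like this — a sketch assuming an OpenAI-style function-calling schema; the descriptions and parameter types are just illustrative:

```python
# Sketch of the two tool declarations (OpenAI-style function tools).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the weather forecast for a location on a given date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name, e.g. Osaka"},
                    "date": {"type": "string", "description": "Date in YYYY-MM-DD format"},
                },
                "required": ["location", "date"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_date",
            "description": "Return today's date in YYYY-MM-DD format.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
```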
I’d like the model to first determine the largest city in Japan without a ‘y’ - hence the chain-of-thought instruction in the prompt; without it, it often just goes with Tokyo. Then it should call the tool to get the current date. Finally, once it has the date, it can call the weather tool.
In general, I’d like a loop of text generation, then a tool call, then perhaps more text generation, then another tool call, and so on - something like the sketch below. And I’d like to be able to follow along and inspect what the model was thinking throughout.
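Here is a minimal sketch of that loop, assuming the OpenAI Python client and the Chat Completions API; `run_tool` is a hypothetical dispatcher to my own tool implementations, and the model name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()
messages = [{
    "role": "user",
    "content": "What is the weather like today in the largest city in Japan that "
               "doesn't contain a y? Make sure to explain your reasoning in detail "
               "before calling any tools.",
}]

while True:
    response = client.chat.completions.create(
        model="gpt-4o",        # placeholder model name
        messages=messages,
        tools=tools,           # the two tool declarations from above
    )
    msg = response.choices[0].message
    messages.append(msg)       # keep the assistant turn (text and/or tool calls) in context

    if not msg.tool_calls:     # no tool call requested -> treat this as the final answer
        print(msg.content)
        break

    for call in msg.tool_calls:
        # run_tool is a hypothetical dispatcher to my own implementations
        result = run_tool(call.function.name, call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
```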
The problem I’m having is that when I make the API call with the prompt and the two tools, the model generates some tokens (presumably doing the chain of thought) and then returns a tool call asking for the current date, as expected. But I don’t actually get the generated text back, just the tool call.
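Concretely, on the first iteration the message I get back looks like this (field names as in the Chat Completions response):

```python
msg = response.choices[0].message
print(msg.content)     # None (or empty) - the reasoning text is nowhere to be found
print(msg.tool_calls)  # [... Function(name='get_current_date', arguments='{}') ...]
```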
I can then execute the tool and add the result to the context as a “tool” message. But the chain of thought is no longer in the context, so the model redoes it before finally asking for the weather in Osaka, and then that text gets dropped again too.
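So after the first round trip the context looks roughly like this, with no trace of the earlier reasoning (the values are just examples):

```python
messages = [
    {"role": "user", "content": "What is the weather like today in the largest city in Japan ..."},
    # assistant turn: only the tool call survives, the chain-of-thought text is gone
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "get_current_date", "arguments": "{}"}}]},
    # tool result appended by my code (example value)
    {"role": "tool", "tool_call_id": "call_1", "content": "2025-06-01"},
]
```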
Am I doing something wrong here? Especially for more complex workflows that interleave tool calls with text generation, dropping all of the generated text after every tool call and then having to regenerate it is very inefficient.