Parallel Tool Use with Groq API
By Rick Lamers | View in the Groq Cookbook
What are Tools and Tool Use? 
To extend the capabilities of Large Language Models (LLMs) in AI-powered applications and systems, we can provide tools to allow them to interact with external resources (e.g. APIs, databases, web) by:
- Providing tools (or predefined functions) to our LLM
- Defining how those tools should be used (e.g. their input and output formats) so our LLM knows how to use them effectively
- Letting the LLM autonomously decide whether or not the provided tools are needed for a user query by evaluating the user query, determining whether the tools can enhance its response, and utilizing the tools accordingly
By providing our LLMs with tools, we can enable them with the option to gather dynamic data that they wouldn't otherwise have access to in their pre-trained, or static, state.
What is Parallel Tool Use? 🧰
Let's take tool use a step further. Imagine a workflow where multiple tools can be used simultaneously, enabling more efficient and effective responses. This concept, known as parallel tool use, is key for building agentic workflows that can deal with complex queries and tasks.
By leveraging parallelism, you can build applications that call multiple tools concurrently and then combine their outputs, allowing your application to provide an accurate response or complete a task for a complex query.
Parallel Tool Use and Function Calling with Groq API 
Tool use, or function calling, support is available for all text models and parallel tool use support is enabled for all Llama 3 and Llama 3.1 models. The Llama 3.1 models support the native tool use format that was used in post-training, which results in much better quality, especially in multi-turn conversations and parallel tool calling.
For your applications that require tool use, we highly recommend trying the following models:
meta-llama/llama-4-scout-17b-16e-instruct
meta-llama/llama-4-maverick-17b-128e-instruct
llama-3.3-70b-versatile
llama-3.1-70b-versatile
llama-3.1-8b-instant
In this tutorial, we will review the basic structure of parallel tool use with Groq API by fetching product pricing for multiple bakery items.
Tutorial Setup
Make sure you have ipykernel and pip installed before running the following pip command to install the required dependencies:
%pip install -r requirements.txt
Next, create a .env file in the root directory of this project and populate it with your GROQ_API_KEY. If you don't have one, you can create an account on GroqCloud and generate one for free at https://console.groq.com. Your .env file should have a line that looks like the following:
GROQ_API_KEY=gsk_...
We will use python-dotenv to read our Groq API key from the .env file. It is best practice to keep your API key stored in a .env file rather than in your main application file, so that making your code public doesn't leak your API key and expose it to abuse!
import os
import json
from groq import Groq
from dotenv import load_dotenv
load_dotenv()
"Groq API key configured: " + os.environn"GROQ_API_KEY"].:10] + "..."
For this tutorial, we will use the llama-3.3-70b-versatile model:
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
model = "llama-3.3-70b-versatile"
Define Tools
Let's define a tool, or function, that the LLM can invoke to fetch pricing for bakery items:
def get_bakery_prices(bakery_item: str):
    if bakery_item == "croissant":
        return 4.25
    elif bakery_item == "brownie":
        return 2.50
    elif bakery_item == "cappuccino":
        return 4.75
    else:
        return "We're currently sold out!"
Now let's define our system messages and tools before running the chat completion:
messages = [
    {"role": "system", "content": """You are a helpful assistant."""},
    {
        "role": "user",
        "content": "What is the price for a cappuccino and croissant?",
    },
]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_bakery_prices",
            "description": "Returns the prices for a given bakery product.",
            "parameters": {
                "type": "object",
                "properties": {
                    "bakery_item": {
                        "type": "string",
                        "description": "The name of the bakery item",
                    }
                },
                "required": ["bakery_item"],
            },
        },
    }
]
response = client.chat.completions.create(
    model=model, messages=messages, tools=tools, tool_choice="auto", max_tokens=4096
)
response_message = response.choices[0].message
We've set the tool_choice parameter to auto to allow our model to choose between generating a text response or using the given tools, or functions, to provide a response. This is the default when tools are available.
We could also set tool_choice to none so our model does not invoke any tools (the default when no tools are provided), or to required, which would force our model to use the provided tools for its responses.
Tip: For tasks that require information from a database, complex calculations, domain-specific knowledge, and real-time information (among others), it's good practice to require that the model uses given tools for accurate responses.
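As a quick illustration, forcing tool use only requires changing the tool_choice value; the sketch below reuses the client, model, messages, and tools we defined above, and the forced_response variable name is just illustrative:
# Sketch: force the model to call one of the provided tools instead of
# answering in plain text. Same request as before, with tool_choice="required".
forced_response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
    tool_choice="required",
    max_tokens=4096,
)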
Processing the Tool Calls
Now that we've defined our tools, we can process the assistant message and construct the required messages to continue the conversation by invoking each tool call against our get_bakery_prices tool:
tool_calls = response_message.tool_calls
messages.append(
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": tool_call.id,
                "function": {
                    "name": tool_call.function.name,
                    "arguments": tool_call.function.arguments,
                },
                "type": tool_call.type,
            }
            for tool_call in tool_calls
        ],
    }
)
available_functions = {
    "get_bakery_prices": get_bakery_prices,
}
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_to_call = available_functions[function_name]
    function_args = json.loads(tool_call.function.arguments)
    function_response = function_to_call(**function_args)

    # Note how we create a separate tool call message for each tool call
    # The model is able to discern the tool call result through `tool_call_id`
    messages.append(
        {
            "role": "tool",
            "content": json.dumps(function_response),
            "tool_call_id": tool_call.id,
        }
    )
print(json.dumps(messages, indent=2))
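Note that the code above assumes the model decided to call our tool. With tool_choice set to auto, the model may instead answer in plain text, in which case response_message.tool_calls is empty, so in your own applications it's worth guarding for that before looping over the tool calls. A minimal sketch of such a guard:
# Minimal guard: with tool_choice="auto", the model may answer directly,
# leaving response_message.tool_calls empty (None).
if not response_message.tool_calls:
    print(response_message.content)  # the model answered without using any tools
else:
    ...  # proceed with the tool-call processing shown above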
Finally, we run our final chat completion with the results of the multiple, parallel tool calls included in the messages array.
Note: It's best practice to pass the tool definitions again to help the model understand the assistant message with the tool call and to interpret the tool results.
response = client.chat.completions.create(
model=model, messages=messages, tools=tools, tool_choice="auto", max_tokens=4096
)
print(response.choices[0].message.content)
By harnessing the power of parallel tool use, you can build more sophisticated and responsive applications that can handle a wide range of complex queries and tasks, providing users with more accurate and timely information.
In general, tool use opens up countless possibilities for efficient, highly accurate automation. In fact, the innovation in this space is bringing us to a point where we don't even need to build out our own tools. There are startups that provide a marketplace of pre-built tools to use, which you can see in action in our Groq <> Toolhouse AI tutorial and Phidata Mixture of Agents tutorial.