Hi Groq team!
I’m exploring whether fine-tuning (or lightweight adaptation) is supported for models that are intended to run on Groq LPU.
My primary use case is high-throughput, low-latency inference, and I’d like to understand:
- Whether Groq currently supports full fine-tuning, PEFT methods (LoRA / adapters), or any training workflows that can directly leverage the LPU
- If fine-tuning must be done off-platform (on GPUs) and the result then compiled / deployed for Groq inference, what constraints or best practices apply
- How model architecture, quantization, and weight formats affect deployability on Groq after fine-tuning
- Any known limitations or roadmap around training or adaptation support on Groq hardware
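For context on the off-platform path I have in mind: the usual pattern is to train a LoRA adapter on GPUs and then merge it into the base weights before export, so the deployed checkpoint is ordinary dense weights and the inference stack needs no adapter support at runtime. A minimal NumPy sketch of the merge step (illustrative only; the shapes and the `alpha / r` scaling follow the standard LoRA formulation, and nothing here is a Groq-specific API):

```python
import numpy as np

# LoRA learns a low-rank update to a frozen weight: W_eff = W + (alpha / r) * B @ A.
# Merging that update into W before export produces a plain dense matrix.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d_in)).astype(np.float32)      # LoRA "down" projection
B = np.zeros((d_out, r), dtype=np.float32)                 # LoRA "up" projection (zero-init)
B[:, 0] = 1.0                                              # stand-in for trained values

scale = alpha / r
W_merged = W + scale * (B @ A)

# The merged layer is mathematically identical to base-plus-adapter:
x = rng.standard_normal((d_in,)).astype(np.float32)
y_adapter = W @ x + scale * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged, atol=1e-4)
```

My assumption is that a merged checkpoint like this (in a standard weight format) is what Groq's deployment pipeline would consume, but that's exactly the kind of detail I'm hoping to have confirmed.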
The goal is to adapt a model’s behavior (domain/language/style) while still fully benefiting from Groq’s deterministic performance and throughput at inference time.
I'd appreciate any guidance, documentation pointers, or examples from the community or the Groq team.
Thanks!