I want to maintain a series of rules and instructions between LLM sessions (about 150 lines).
Each interaction with the LLM will be disconnected from the previous one and spaced out in time, so keeping the context indefinitely will be inefficient and costly.
Sending the 150 lines of instructions in every request (new session) will be equally inefficient and costly, with the added problem of huge latency while it processes the instructions.
Is there any solution to this? Or what would be the best approach?
For example, maintaining the same session, ordering it to ignore previous responses, monitoring the used context, and resetting the session before it reaches the limit?
Hi Cornete, use Prompt Caching. Put your ~150-line policy as a static system message at the very start of every request. On cache hits, this reduces latency and provides an input token discount on the cached portion. Do not put any variable data before the rules (for example timestamps or user IDs), since any changing prefix can break cache hits. Keeping a long session and telling the model to “ignore” earlier turns will not reduce token or context usage.
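A minimal sketch of what a cache-friendly request could look like (Python; the model name and policy text are placeholders, and the payload follows the common OpenAI/Groq-style chat format):

```python
# Static policy text -- must be byte-identical on every request so the
# provider's prompt cache can match the prefix. Placeholder content here.
POLICY = "You are a strict assistant.\n(...~150 lines of rules...)"

def build_request(user_question: str) -> dict:
    """Assemble a stateless chat request with a cache-friendly static prefix."""
    return {
        "model": "llama-3.1-8b-instant",  # placeholder model name
        "messages": [
            # Static system message FIRST: any variable prefix (timestamp,
            # user ID, session counter) before this point would break
            # prompt-cache hits.
            {"role": "system", "content": POLICY},
            # Variable content goes AFTER the cached prefix.
            {"role": "user", "content": user_question},
        ],
    }
```

The key property is ordering: everything up to and including the system message is identical across calls, and only the final user message varies.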
The reason I'd tell the model to "ignore" earlier messages is that I don't want old messages to contaminate or influence the LLM. That is why I would prefer to start a new session for each message.
What's the point of keeping the policy message in every request? If I keep the session, it will remember the first message anyway, isn't that so?
Now I wonder what the best approach is in terms of speed and tokens wasted over the long term:
a) Create a new session for every question, sending the policies every time.
b) Keep the session, but waste time and tokens on older message context that is useless.
I think there’s a small mismatch here, correct me if I’m misunderstanding.
In the API, the model doesn't remember anything between requests. It only "remembers the first message" if you keep resending it in `messages`. So if you want zero influence from old turns, don't include old turns. If you want the policy enforced, you do need to include it in each request, and Prompt Caching helps when that policy prefix stays identical.
TL;DR: The cheapest, fastest way to keep a 150‑line policy across independent LLM calls is to store the policy as a static system message and prepend it to every request; Groq's prompt caching will recognize the identical policy prefix after the first request, so subsequent calls get a steep input-token discount and much lower latency for that portion. Because the model has no memory between calls, you must resend the policy each time, but as long as the text never changes (no timestamps, IDs, or extra whitespace), the cache stays valid. This beats keeping a long session (which adds token waste and latency) and avoids any "contamination" from prior turns.
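Rough back-of-the-envelope arithmetic on the two options (all numbers are assumptions: ~10 tokens per policy line, ~200 tokens per question-and-answer turn, ignoring cache discounts, which only make option a look better):

```python
POLICY_TOKENS = 1500  # assumption: ~150 lines * ~10 tokens/line
TURN_TOKENS = 200     # assumption: one question + answer pair

def stateless_total(n: int) -> int:
    # a) New request per question: policy + one question each time, no history.
    return n * (POLICY_TOKENS + TURN_TOKENS)

def session_total(n: int) -> int:
    # b) One long session: request i also resends all i earlier turns,
    #    so input grows quadratically with the number of questions.
    return sum(POLICY_TOKENS + i * TURN_TOKENS + TURN_TOKENS for i in range(n))
```

Under these assumptions, 20 questions cost 34,000 input tokens stateless versus 72,000 in one session, and the gap widens with every additional turn.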