I want to maintain a series of rules and instructions between LLM sessions (about 150 lines).
Each interaction with the LLM will be disconnected from the previous one and spaced out in time, so keeping the context indefinitely will be inefficient and costly.
Sending the 150 lines of instructions in every request (new session) will be equally inefficient and costly, with the added problem of huge latency while it processes the instructions.
Is there any solution to this? Or what would be the best approach?
For example, maintaining the same session, ordering it to ignore previous responses, monitoring the used context, and resetting the session before it reaches the limit?
Hi Cornete, use Prompt Caching. Put your ~150-line policy as a static system message at the very start of every request. On cache hits, this reduces latency and provides an input token discount on the cached portion. Do not put any variable data before the rules (for example timestamps or user IDs), since any changing prefix can break cache hits. Keeping a long session and telling the model to “ignore” earlier turns will not reduce token or context usage.
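A minimal sketch of what a cache-friendly request could look like (Python; the model name and policy text are placeholders, and the payload follows the common OpenAI/Groq-style chat format):

```python
# Static policy text -- must be byte-identical on every request so the
# provider's prompt cache can match the prefix. Placeholder content here.
POLICY = "You are a strict assistant.\n(...~150 lines of rules...)"

def build_request(user_question: str) -> dict:
    """Assemble a stateless chat request with a cache-friendly static prefix."""
    return {
        "model": "llama-3.1-8b-instant",  # placeholder model name
        "messages": [
            # Static system message FIRST: any variable prefix (timestamp,
            # user ID, session counter) before this point would break
            # prompt-cache hits.
            {"role": "system", "content": POLICY},
            # Variable content goes AFTER the cached prefix.
            {"role": "user", "content": user_question},
        ],
    }
```

The key property is ordering: everything up to and including the system message is identical across calls, and only the final user message varies.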
The reason I'd tell the model to "ignore" earlier messages is that I don't want old messages to contaminate or influence the LLM. That is why I would prefer to start a new session for each message.
What's the point of keeping the policy message in every request? If I keep the session, it will remember the first message anyway, isn't that so?
Now I wonder what the best approach is in terms of speed and tokens wasted over the long term:
a) Create a new session for every question, sending the policies every time.
b) Keep the session, but waste time and tokens on older message context that is useless.
I think there’s a small mismatch here, correct me if I’m misunderstanding.
In the API, the model doesn't remember anything between requests. It only "remembers the first message" if you keep resending it in `messages`. So if you want zero influence from old turns, don't include old turns. If you want the policy enforced, you do need to include it in each request, and Prompt Caching helps when that policy prefix stays identical.
TL;DR: The cheapest, fastest way to keep a 150‑line policy across independent LLM calls is to store the policy as a static system message and prepend it to every request; Groq's prompt caching will recognize the identical policy prefix after the first request, so subsequent calls get a steep input-token discount and much lower latency for that portion. Because the model has no memory between calls, you must resend the policy each time, but as long as the text never changes (no timestamps, IDs, or extra whitespace), the cache stays valid. This beats keeping a long session (which adds token waste and latency) and avoids any "contamination" from prior turns.
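Rough back-of-the-envelope arithmetic on the two options (all numbers are assumptions: ~10 tokens per policy line, ~200 tokens per question-and-answer turn, ignoring cache discounts, which only make option a look better):

```python
POLICY_TOKENS = 1500  # assumption: ~150 lines * ~10 tokens/line
TURN_TOKENS = 200     # assumption: one question + answer pair

def stateless_total(n: int) -> int:
    # a) New request per question: policy + one question each time, no history.
    return n * (POLICY_TOKENS + TURN_TOKENS)

def session_total(n: int) -> int:
    # b) One long session: request i also resends all i earlier turns,
    #    so input grows quadratically with the number of questions.
    return sum(POLICY_TOKENS + i * TURN_TOKENS + TURN_TOKENS for i in range(n))
```

Under these assumptions, 20 questions cost 34,000 input tokens stateless versus 72,000 in one session, and the gap widens with every additional turn.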