Filtering markdown ruling files according to a prompt

Hi friends, I have about 180k judgments (rulings). I would like someone to write a Python script for me that searches all folders in the main source folder F:\cho\data\ocr\done\markdown-1970-2025 and writes the output files to F:\cho\data\ocr\done\markdown-1970-2025\output. These are all legal court judgments, and my prompt is given below. I tried some code, but when I run it my API starts giving errors after only a few files have been searched. I don't know why my API gives errors so fast:

F:\cho\data\ocr\grok>python filterofrullings.py
🔍 Checking A. Qutubuddin Khan (d_b_a “QMR Expert Consultants”) and others vs CHEC-Millwala Dredging Co. (Pvt.) Ltd_.md
❌ Not favorable → skipped: F:\cho\data\ocr\done\markdown-1970-2025\2025\A. Qutubuddin Khan (d_b_a “QMR Expert Consultants”) and others vs CHEC-Millwala Dredging Co. (Pvt.) Ltd_.md
🔍 Checking A.M. Construction Company (Private) Limited vs Taisei Corporation, etc.md
🚫 Key gsk_pd0oVWDH hit 429. Removing permanently.
❌ Not favorable → skipped: F:\cho\data\ocr\done\markdown-1970-2025\2025\A.M. Construction Company (Private) Limited vs Taisei Corporation, etc.md

=== Prompt for Filtering ===

BASE_PROMPT = """
You are an expert in Pakistani case law.
Evaluate the ruling below ONLY to decide if it clearly supports the case of Pir Salman Zahid
against Khair Muhammad in a property mutation dispute.

Mark "YES" only if the ruling supports one or more of these legal grounds in favor of Pir Salman Zahid:

  1. Limitation Act / Time-Bar
    • The appeal, revision, or administrative review was dismissed as time-barred.
    • Delay was not condoned because the party had no sufficient cause.
    • Court held that limitation is mandatory and jurisdiction cannot be exercised after limitation expires.

  2. Mutation & Power of Attorney Fraud
    • Mutation entries alone do not confer ownership when based on fraudulent, forged, or unpaid transactions.
    • Fraud or misuse of Power of Attorney vitiates the transaction and the mutation can be cancelled.

Decision:

  • Reply ONLY with "YES" if the ruling strongly supports any of the above in Pir Salman Zahid’s favour.
  • Reply ONLY with "NO" if it does not.
"""

You’re getting 429 (Too Many Requests) errors because your code is sending too many requests at once, so our server is rate limiting them.
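If your script is a plain loop that sends one request per file, the simplest fix is probably to space the requests out. Here is a minimal stdlib-only sketch of that idea. The `Pacer` name, the 0.5 second interval, and `call_api` are my own placeholders, not anything from your script or our API, so check the interval against the rate limits of your account.

```python
import time


class Pacer:
    """Blocks so that consecutive calls to wait() start at least min_interval_s apart."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock      # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._next_at = None     # earliest time the next request may start

    def wait(self):
        now = self._clock()
        if self._next_at is not None and now < self._next_at:
            # Too early: sleep for the remaining part of the interval.
            self._sleep(self._next_at - now)
            now = self._next_at
        self._next_at = now + self._min_interval_s


# Usage in a plain loop (call_api is a placeholder for your own request code):
# pacer = Pacer(0.5)            # roughly 2 requests per second at most
# for path in files:
#     pacer.wait()
#     reply = call_api(path)
```

This only slows the loop down. It does not retry failed requests, so it works best combined with some form of retry on 429.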

I’m not really a Python dev, but I looked this up. You could install the libraries the code below imports:
pip install aiohttp aiolimiter tenacity

and then you can do something like:

import asyncio
import aiohttp
from aiolimiter import AsyncLimiter
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type

class TransientHttpError(Exception):
  pass

class SafeApiClient:
  def __init__(
    self,
    base_url: str,
    *,
    max_concurrency: int = 8,
    rps: int = 4,
    timeout_s: float = 30.0,
    default_headers: dict | None = None
  ):
    self.base_url = base_url.rstrip("/")
    self.sem = asyncio.Semaphore(max_concurrency)
    self.limiter = AsyncLimiter(max_rate=rps, time_period=1)
    self.timeout = aiohttp.ClientTimeout(total=timeout_s)
    self.default_headers = default_headers or {}
    self._session: aiohttp.ClientSession | None = None

  async def __aenter__(self):
    self._session = aiohttp.ClientSession(timeout=self.timeout)
    return self

  async def __aexit__(self, *exc):
    if self._session:
      await self._session.close()

  async def request(self, method: str, path: str, **kwargs):
    """
    method: "GET"/"POST"/...
    path: "/v1/things" or "v1/things"
    kwargs: passed to aiohttp (json=, params=, data=, headers=, etc.)
    """
    if not self._session:
      raise RuntimeError("Use 'async with SafeApiClient(...) as client:'")

    url = f"{self.base_url}/{path.lstrip('/')}"
    headers = {**self.default_headers, **kwargs.pop("headers", {})}

    @retry(
      retry=retry_if_exception_type(TransientHttpError),
      wait=wait_random_exponential(multiplier=0.2, max=10.0),
      stop=stop_after_attempt(6),
      reraise=True,
    )
    async def _do():
      async with self.limiter:
        async with self.sem:
          async with self._session.request(method, url, headers=headers, **kwargs) as resp:
            # Respect Retry-After on 429/503
            if resp.status in (429, 503):
              ra = resp.headers.get("Retry-After")
              if ra:
                try:
                  delay = float(ra)
                except ValueError:
                  delay = 0
                if delay > 0:
                  await asyncio.sleep(delay)
              raise TransientHttpError(f"{resp.status} {await _safe_snippet(resp)}")

            # Retry on typical transient 5xx
            if 500 <= resp.status < 600:
              raise TransientHttpError(f"{resp.status} {await _safe_snippet(resp)}")

            # Raise for other >= 400
            if 400 <= resp.status:
              text = await resp.text()
              raise aiohttp.ClientResponseError(
                request_info=resp.request_info,
                history=resp.history,
                status=resp.status,
                message=text,
                headers=resp.headers,
              )

            # Auto-decode JSON when possible
            ctype = resp.headers.get("Content-Type", "")
            if "application/json" in ctype:
              return await resp.json()
            return await resp.text()

    return await _do()

async def _safe_snippet(resp: aiohttp.ClientResponse, max_len: int = 200) -> str:
  try:
    text = await resp.text()
    return text[:max_len]
  except Exception:
    return "<no-body>"

# ---- example usage ----
# async def main():
#   async with SafeApiClient("https://api.example.com", default_headers={"Authorization": "Bearer TOKEN"}) as client:
#     # up to 8 in-flight, 4 requests/sec total
#     tasks = [
#       asyncio.create_task(client.request("GET", f"/v1/items/{i}"))
#       for i in range(20)
#     ]
#     results = await asyncio.gather(*tasks, return_exceptions=True)
#     print(results)
#
# asyncio.run(main())

Again, I’m not really a Python dev, but this (from ChatGPT) is the Python equivalent of what I usually do in JS: adding an async semaphore and a rate limiter. It runs up to 8 concurrent calls at up to 4 requests per second, with retries.
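One gap in the client above: `Retry-After` may be either a number of seconds or an HTTP date, and the code only understands the number, falling back to a delay of 0 otherwise. A small stdlib-only helper could cover both forms. This is a sketch, and `retry_after_seconds` is a name I made up, not part of aiohttp or tenacity.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns 0.0 when the header is missing or cannot be parsed.
    """
    if not value:
        return 0.0
    value = value.strip()
    try:
        return max(0.0, float(value))          # seconds form, e.g. "120"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)    # date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    except (TypeError, ValueError):
        return 0.0                             # neither form: no extra delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

In the client, `delay = retry_after_seconds(resp.headers.get("Retry-After"))` could then stand in for the `float(ra)` block.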

Try it out!
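For the folder part of the question, here is a rough sketch of how the walk and the YES/NO handling might look. Your log shows a file skipped as "Not favorable" right after a 429, which suggests that failed requests are being counted as NO, so this version keeps errors and unclear replies in a separate list for a later retry. `filter_rulings`, `parse_verdict`, and the `classify` callable are hypothetical names; `classify` stands for whatever async function sends the prompt plus the ruling text to the API and returns the reply string.

```python
import asyncio
import shutil
from pathlib import Path


def parse_verdict(reply):
    """Map a model reply to True (YES), False (NO), or None (anything else)."""
    word = reply.strip().strip(".\"'").upper()
    if word == "YES":
        return True
    if word == "NO":
        return False
    return None


async def filter_rulings(src, out, classify, max_concurrency=8):
    """Copy every .md file under src that classifies as YES into out.

    Returns (kept, failed). Files in failed hit an error or an unclear reply
    and should be retried later rather than treated as NO.
    """
    src, out = Path(src), Path(out)
    out.mkdir(parents=True, exist_ok=True)
    sem = asyncio.Semaphore(max_concurrency)
    kept, failed = [], []

    async def handle(path):
        async with sem:  # limit how many files are in flight at once
            try:
                text = path.read_text(encoding="utf-8", errors="replace")
                verdict = parse_verdict(await classify(text))
            except Exception:
                verdict = None  # API or read error: not a NO
        if verdict is None:
            failed.append(path)
        elif verdict:
            # Keep the year subfolder so equal file names do not collide.
            dest = out / path.relative_to(src)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            kept.append(path)

    # The output folder lives inside src, so skip anything already in it.
    files = [p for p in src.rglob("*.md") if out not in p.parents]
    await asyncio.gather(*(handle(p) for p in files))
    return kept, failed
```

With 180k files it is probably worth running this one year folder at a time and saving the `failed` list to disk, so that a crash or an exhausted quota does not force a restart from the beginning.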