---
title: Rate limits
description: Per-token sliding-window budgets, headers Niyra returns, and how to back off cleanly.
url: /docs/api-rate-limits
lastUpdated: 2026-06-11
---

# Rate limits

Niyra rate-limits per token using a sliding-window counter. Each token (OAuth access token or PAT) has its own budget — multiple tokens for the same user don't share a window.

## Default limits

| Endpoint family | Limit | Window |
| --------------- | ----- | ------ |
| `niyra_ask` | 60 requests | 1 minute |
| `niyra_execute` | 20 requests | 1 minute |
| `niyra_memories` / `niyra_remember` | 120 requests | 1 minute |
| `niyra_get_task` polling | 600 requests | 1 minute |

Alpha-plan users get 5× these limits. Pro users get 3×. Standard users get 1×.
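As a quick worked example of the multipliers, the sketch below computes the effective per-minute limit for a plan. The dictionary keys and the `effective_limit` helper are illustrative names, not part of the Niyra API.

```python
# Base per-minute limits copied from the table above.
# The table lists niyra_memories and niyra_remember as one family.
BASE_LIMITS = {
    "niyra_ask": 60,
    "niyra_execute": 20,
    "niyra_memories": 120,
    "niyra_remember": 120,
    "niyra_get_task": 600,
}

# Multipliers from the paragraph above; the lowercase plan keys are hypothetical labels.
PLAN_MULTIPLIER = {"standard": 1, "pro": 3, "alpha": 5}


def effective_limit(endpoint: str, plan: str) -> int:
    """Return requests per minute for an endpoint family on a given plan."""
    return BASE_LIMITS[endpoint] * PLAN_MULTIPLIER[plan]


print(effective_limit("niyra_execute", "pro"))  # 60
print(effective_limit("niyra_ask", "alpha"))    # 300
```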

## Response headers

On every response, Niyra returns:

```http
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1717100123
```

`X-RateLimit-Reset` is a Unix timestamp in seconds: the moment the current window rolls over.
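One way a client might read these headers is sketched below. `RateLimitState` and `parse_rate_limit` are hypothetical helper names, and the sketch assumes all three headers are present and hold integers, as in the sample above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp in seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left until the window rolls over, never negative."""
        now = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    """Build a RateLimitState from the three X-RateLimit-* headers."""
    return RateLimitState(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
    )


state = parse_rate_limit({
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "47",
    "X-RateLimit-Reset": "1717100123",
})
print(state.remaining, state.seconds_until_reset(now=1717100100))  # 47 23.0
```

With the `requests` library, `response.headers` is case-insensitive and can be passed in directly; a plain `dict` needs the exact header casing shown here.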

On a 429:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "error_description": "you've hit the per-token rate limit"
}
```

`Retry-After` is in seconds. Honor it: Niyra treats repeated immediate retries as an abuse signal.

## Backoff pattern

```python
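import random
import time

import requests


def call_with_backoff(url, headers, json, max_tries=5):
    """POST to url, sleeping for Retry-After plus jitter after each 429."""
    for attempt in range(max_tries):
        # A timeout keeps a stalled connection from blocking the worker forever.
        r = requests.post(url, headers=headers, json=json, timeout=30)
        if r.status_code != 429:
            return r
        if attempt == max_tries - 1:
            break  # no point sleeping after the final attempt
        # Retry-After is in seconds; fall back to 5 if the header is missing.
        wait = int(r.headers.get("Retry-After", "5"))
        # Add jitter so a fleet of workers doesn't synchronize.
        time.sleep(wait + random.uniform(0, 1))
    raise RuntimeError("rate limit retries exhausted")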
```

## Polling etiquette

For `niyra_get_task`:

- Minimum interval: 3 seconds. Polling faster risks a 429 and does not make the result arrive any sooner.
- Backoff: once the task has been running for 60 seconds or more, slow to one poll every 10 seconds. Most long-running tasks take 1–5 minutes.
- Cap: poll for at most 10 minutes. Beyond that, surface the task ID to the user so they can check the dashboard.
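The three rules above can be sketched as a small polling loop. This is a minimal sketch, not an official client: `fetch_status` is a hypothetical callable that wraps a `niyra_get_task` request and returns `None` while the task is still running, and the `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Optional

FAST_INTERVAL = 3.0    # seconds; the minimum interval
SLOW_INTERVAL = 10.0   # seconds; used once the task has run for 60+ seconds
SLOW_AFTER = 60.0      # seconds of elapsed time before slowing down
GIVE_UP_AFTER = 600.0  # seconds; the 10-minute cap


def poll_interval(elapsed: float) -> Optional[float]:
    """Return seconds to wait before the next poll, or None to stop polling."""
    if elapsed >= GIVE_UP_AFTER:
        return None
    return SLOW_INTERVAL if elapsed >= SLOW_AFTER else FAST_INTERVAL


def wait_for_task(
    fetch_status: Callable[[], Optional[dict]],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until fetch_status returns a result or the cap is reached."""
    start = clock()
    while True:
        result = fetch_status()
        if result is not None:
            return result
        delay = poll_interval(clock() - start)
        if delay is None:
            return None  # caller should surface the task ID to the user
        sleep(delay)
```

A `None` return from `wait_for_task` means the cap was hit, which is the point at which to hand the task ID back to the user.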

## Burst behavior

The sliding window is not a token bucket, so there is no burst credit. At the default limit, sending all 60 `niyra_ask` requests in the first second of a minute exhausts your budget for the rest of the window. Spread requests across the window instead.
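One simple way to spread requests is to space them evenly at `window / limit` seconds apart. The `Pacer` class below is a hypothetical client-side helper, not something Niyra ships, and it assumes a single worker per token. Several workers sharing one token would need a shared pacer or a lower per-worker limit.

```python
import time
from typing import Callable


class Pacer:
    """Space calls evenly: at most `limit` calls per `window` seconds."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.interval = window / limit  # e.g. 60 s / 60 requests = 1 s
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def wait(self) -> None:
        """Block until the next evenly spaced slot, then reserve the one after."""
        now = self._clock()
        if now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        self._next_slot = now + self.interval
```

Call `pacer.wait()` immediately before each request. Setting `limit` slightly below the real budget leaves headroom for clock skew and retries.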

## Related

- [niyra_ask](/docs/api-tool-niyra-ask)
- [niyra_execute](/docs/api-tool-niyra-execute)
- [niyra_get_task](/docs/api-tool-niyra-get-task)