
Understanding API Rate Limits: A Developer's Guide

Every API has limits. Understanding how rate limiting works -- and how to handle it gracefully in your code -- is the difference between a reliable integration and one that breaks under load. This guide covers everything developers need to know.

What Is Rate Limiting?

Rate limiting is a mechanism that controls the number of requests a client can make to an API within a given time window. It protects the server from abuse, ensures fair usage across all clients, and maintains service quality.

When you exceed the limit, the API returns a 429 Too Many Requests status code. The response typically includes headers that tell you when you can retry.

Common Rate Limiting Strategies

1. Fixed Window

The simplest approach. You get N requests per time window (e.g., 100 requests per minute). The counter resets at the start of each window. Simple but can allow burst traffic at window boundaries.
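A fixed-window counter can be sketched in a few lines of Python (the class name and interface here are illustrative, not from any particular library):

```python
import time

class FixedWindowLimiter:
    """Allow up to `limit` requests per `window` seconds; the counter resets each window."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.window_start = 0.0
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Start a fresh window if the current one has expired
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note the boundary problem: a client can spend its full budget at the end of one window and again at the start of the next, doubling the short-term rate.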

2. Sliding Window

A smoother approach that tracks requests over a rolling time period. Prevents the burst problem of fixed windows. More complex to implement but provides better traffic shaping.
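One common way to implement this is a sliding-window log, which keeps the timestamp of every recent request. A minimal sketch (again, illustrative names):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Track request timestamps over a rolling window; reject once `limit` is reached."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the window
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The memory cost grows with the limit, which is why production systems often approximate this with a sliding-window counter instead of a full log.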

3. Token Bucket

Tokens are added to a bucket at a fixed rate. Each request consumes a token. If the bucket is empty, the request is rejected. Allows controlled bursting up to the bucket size while maintaining a steady average rate.
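The token bucket is compact to implement because you only need to track the token count and the time of the last refill. A minimal sketch:

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Add tokens for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A full bucket lets a client burst `capacity` requests at once, but the long-run average can never exceed `rate` requests per second.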

4. Leaky Bucket

Requests are processed at a fixed rate regardless of how fast they arrive. Excess requests queue up (or get rejected if the queue is full). Produces the smoothest output rate.
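A leaky bucket can be modeled as a bounded queue that drains at a fixed rate. A rough sketch (the drain here is computed lazily on each arrival rather than by a background worker):

```python
import time
from collections import deque

class LeakyBucket:
    """Queue incoming requests, drain `rate` of them per second; reject when full."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.queue = deque()
        self.last_leak = 0.0

    def offer(self, item, now=None):
        now = time.monotonic() if now is None else now
        # Remove items that would have been processed since the last leak
        leaked = int((now - self.last_leak) * self.rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now
        if len(self.queue) < self.capacity:
            self.queue.append(item)
            return True
        return False
```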

Rate Limit Headers

Most well-designed APIs include rate limit information in response headers. Here is what ScreenshotAPI returns:

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1711036800
# Limit: max requests per window
# Remaining: how many you have left
# Reset: Unix timestamp when the window resets

Handling Rate Limits in Code

Node.js Example with Exponential Backoff

async function screenshotWithRetry(url, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(
      `https://screenshotapi-api-production.up.railway.app/v1/screenshot?url=${encodeURIComponent(url)}`,
      { headers: { Authorization: 'Bearer YOUR_API_KEY' } }
    );

    if (response.status === 429) {
      const resetAt = response.headers.get('X-RateLimit-Reset');
      const waitMs = resetAt
        ? (parseInt(resetAt) * 1000) - Date.now()
        : Math.pow(2, attempt) * 1000;

      console.log(`Rate limited. Waiting ${waitMs}ms...`);
      await new Promise(r => setTimeout(r, Math.max(waitMs, 1000)));
      continue;
    }

    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    // response.buffer() is node-fetch v2 only; arrayBuffer() works with standard fetch
    return Buffer.from(await response.arrayBuffer());
  }
  throw new Error('Max retries exceeded');
}

Python Example

import requests
import time

def screenshot_with_retry(url, max_retries=3):
    for attempt in range(max_retries + 1):
        response = requests.get(
            "https://screenshotapi-api-production.up.railway.app/v1/screenshot",
            params={"url": url},
            headers={"Authorization": "Bearer YOUR_API_KEY"},
        )

        if response.status_code == 429:
            reset_at = response.headers.get("X-RateLimit-Reset")
            if reset_at:
                wait = max(int(reset_at) - time.time(), 1)
            else:
                wait = 2 ** attempt
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
            continue

        response.raise_for_status()
        return response.content

    raise Exception("Max retries exceeded")

Best Practices for Production

1. Always Check Rate Limit Headers

Read X-RateLimit-Remaining before making the next request. If you are running low, slow down proactively instead of waiting for a 429.
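One way to sketch this proactive throttling in Python: a helper that turns the headers into a delay, spreading the remaining budget over the time left in the window (the function name and threshold are illustrative):

```python
import time

def proactive_delay(headers, min_remaining=3):
    """Return seconds to wait before the next request, based on rate limit headers.

    `headers` is any mapping containing X-RateLimit-Remaining / X-RateLimit-Reset;
    `min_remaining` is the safety threshold below which we start slowing down.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", min_remaining))
    if remaining >= min_remaining:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    # Spread the remaining request budget over the time left in the window
    seconds_left = max(reset_at - time.time(), 0.0)
    return seconds_left / max(remaining, 1)
```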

2. Use Exponential Backoff

When retrying after a 429, wait exponentially longer each time: 1s, 2s, 4s, 8s. Add random jitter to avoid thundering herd problems.
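The backoff-plus-jitter calculation fits in one line. This sketch uses "full jitter" (sleep a random amount up to the exponential ceiling), one of several common variants:

```python
import random

def backoff_with_jitter(attempt, base=1.0, cap=30.0):
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Capping the delay matters: without `cap`, attempt 10 would mean waiting over 17 minutes.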

3. Queue Your Requests

For batch operations, use a job queue (Bull, Celery, SQS) that respects rate limits and processes screenshots sequentially or with controlled concurrency.

4. Use Async/Webhook for Large Batches

Instead of synchronous requests, use ScreenshotAPI's async endpoint. Submit screenshots and receive results via webhook -- no polling, no rate limit pressure.

5. Cache Results

If you screenshot the same URLs repeatedly, cache the results. A simple Redis or file-system cache can reduce your API calls by 80% or more.
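A minimal in-memory TTL cache along these lines (a real deployment would more likely use Redis; the class is a sketch, not a library API):

```python
import time

class TTLCache:
    """In-memory cache: store screenshot bytes keyed by URL, expire after `ttl` seconds."""

    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.store = {}

    def get(self, url, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(url)
        if entry and now - entry[1] < self.ttl:
            return entry[0]
        return None  # missing or expired

    def put(self, url, data, now=None):
        now = time.monotonic() if now is None else now
        self.store[url] = (data, now)
```

Check the cache before calling the API and store the bytes on every successful capture; only misses and expirations cost you a request.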

6. Monitor Your Usage

Use the GET /v1/usage endpoint to track your monthly consumption. Set up alerts before you hit your plan limit so you can upgrade or optimize.

ScreenshotAPI Rate Limits by Plan

Plan       Monthly Limit    Rate Limit   Burst
Free       100/month        10/min       5 concurrent
Pro        10,000/month     30/min       10 concurrent
Business   100,000/month    120/min      30 concurrent

Common Mistakes to Avoid

  • Ignoring 429 responses -- Retrying immediately without backoff just makes the problem worse and can get your key temporarily banned.
  • Fire-and-forget requests -- Always check the response status. A screenshot that returned 429 did not actually get captured.
  • Not reading headers -- The rate limit headers tell you exactly when to retry. Use them instead of guessing.
  • Parallelizing without limits -- Spawning 100 concurrent requests will immediately hit the rate limit. Use a concurrency limiter like p-limit.
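A p-limit-style concurrency limiter can be sketched in Python with asyncio.Semaphore (the helper name is illustrative):

```python
import asyncio

async def run_limited(factories, concurrency=5):
    """Run coroutine factories with at most `concurrency` of them in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(factory):
        async with sem:  # blocks until a slot frees up
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))
```

Passing factories (functions that create the coroutine) rather than live coroutines means work does not start until a semaphore slot is available.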

Ready to Start Capturing Screenshots?

Get 100 free screenshots per month. Generous rate limits even on the free plan.

Try the Playground
