The first time you hit a rate limit, it feels like a betrayal. Your code was working perfectly. You ran it again to test something. Suddenly every request returns a 429 error and your application is broken. You wait a few minutes, run it again, and everything works. What just happened?
Welcome to rate limits. Every public API has them, and understanding how they work is one of the most important skills for any developer who builds on third-party services.
What a rate limit actually is
A rate limit is a rule the API provider enforces to protect their infrastructure from being overwhelmed by any single client. The simplest version is "you can make X requests per Y time period." For example, the GitHub API allows authenticated users 5,000 requests per hour. The Twitter API has separate limits for different endpoints, ranging from 15 requests per 15 minutes to 300 requests per 15 minutes. The CoinGecko free tier allows around 30 calls per minute.
When you exceed the limit, the API stops accepting your requests until the time window resets. The standard response is HTTP status 429 Too Many Requests. Most APIs include headers in their responses telling you how many requests you have remaining and when the limit resets. Reading these headers is the difference between hitting walls blindly and gracefully managing your usage.
Why rate limits exist
Rate limits exist for three reasons, and understanding them helps you respect them.
First, infrastructure cost. Every API call consumes server resources: CPU, memory, database queries, network bandwidth. A single bad actor sending millions of requests can cost the provider real money and degrade service for everyone else. Rate limits cap each user's resource consumption to a sustainable level.
Second, abuse prevention. Without rate limits, scraping becomes trivial. Anyone could clone an entire dataset in minutes, bypassing whatever business model the provider built around their data. Rate limits make scraping possible but slow, which preserves the value of the underlying service.
Third, fairness. If one user could send 10 million requests per hour, they would crowd out smaller users who only need to send a few hundred. Rate limits ensure that everyone gets a fair share of available capacity.
These reasons aren't arbitrary. They're the same reasons any shared resource has rules. Rate limits aren't a punishment. They're how the API stays available for everyone.
The patterns that let you build reliably
Once you understand why rate limits exist, the patterns for working with them become obvious.
Caching. The most important pattern. If your application needs Bitcoin price data, you don't need to call the price API every time a user loads your page. Call it once every 30 seconds, store the result, and serve the cached value to all users. Suddenly an API that allows 30 requests per minute can serve thousands of users per minute. This is exactly how TerminalFeed works under the hood. Every external API call is cached in a Cloudflare Worker, and visitors get the cached response instantly without ever touching the upstream API.
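The caching pattern above can be sketched in a few lines. This is a minimal illustration, not TerminalFeed's actual Worker code; the `fetch` callable stands in for whatever function calls the upstream API.

```python
import time

class TTLCache:
    """Cache a single upstream value for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=30):
        self.fetch = fetch          # zero-argument function that calls the upstream API
        self.ttl = ttl_seconds
        self.value = None
        self.fetched_at = None      # monotonic timestamp of the last real fetch

    def get(self):
        now = time.monotonic()
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.value = self.fetch()   # only this path touches the API
            self.fetched_at = now
        return self.value
```

A thousand calls to `get()` inside one 30-second window produce exactly one upstream request; every other caller gets the stored value.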
Exponential backoff. When you hit a 429 error, don't immediately retry. Wait, then retry. If you hit it again, wait longer. The standard formula is to double your wait time after each failure (1 second, 2 seconds, 4 seconds, 8 seconds) up to some maximum. This gives the API time to recover and prevents you from hammering it during an outage.
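A sketch of that retry loop, with a small random jitter added so that many clients recovering at once don't all retry in lockstep. The `request` callable and its `.status_code` attribute are assumptions standing in for your HTTP client of choice.

```python
import time
import random

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield wait times that double after each failure, up to a cap."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))   # 1, 2, 4, 8, ... seconds

def call_with_backoff(request, max_retries=5):
    """Retry `request` on a 429, doubling the wait after each failure.

    `request` is any zero-argument callable returning a response object
    with a `.status_code` attribute (a hypothetical interface).
    """
    for delay in backoff_delays(max_retries):
        response = request()
        if response.status_code != 429:
            return response
        # Jitter spreads retries out so clients don't hammer in sync.
        time.sleep(delay + random.uniform(0, delay * 0.1))
    return request()  # one final attempt after the last wait
```

The cap matters: without it, a long outage would leave your client sleeping for minutes at a time.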
Request batching. Many APIs let you fetch multiple items in a single request. Instead of calling the user lookup endpoint 100 times for 100 users, call the bulk lookup endpoint once with 100 IDs. One request instead of 100. Most rate limits count requests, not items returned, so batching is essentially free quota.
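The arithmetic of batching is easy to see in code. Here `bulk_lookup` is a placeholder for whatever bulk endpoint your API exposes; the batch size of 100 is an assumption you should replace with the API's documented maximum.

```python
def chunked(ids, batch_size=100):
    """Split a list of IDs into batches of at most `batch_size`."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

def fetch_users(ids, bulk_lookup, batch_size=100):
    """Fetch many records with the fewest possible requests.

    `bulk_lookup` stands in for the API's bulk endpoint (hypothetical):
    it takes a list of IDs and returns a list of records.
    """
    users = []
    for batch in chunked(ids, batch_size):
        users.extend(bulk_lookup(batch))  # one request covers up to 100 users
    return users
```

Fetching 250 users this way costs 3 requests instead of 250, a more than 80x reduction in consumed quota.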
Queue management. If you have a burst of work to do (say, syncing 10,000 records), don't try to do it all at once. Put the work in a queue and process it at a sustainable rate. Most queue libraries and services (BullMQ, SQS, etc.) have built-in rate limiting that respects your API's limits automatically.
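In production you would reach for one of those libraries, but the core idea fits in a few lines: space requests out by a fixed interval derived from the limit. This is a single-threaded sketch under the assumption of one API call per job.

```python
import time
from collections import deque

def drain_queue(jobs, handle, per_minute=30):
    """Process queued jobs at a sustainable rate instead of all at once.

    With per_minute=30 (a typical free-tier limit), jobs are spaced
    two seconds apart, so the queue never trips the rate limit.
    """
    interval = 60.0 / per_minute      # seconds between requests
    queue = deque(jobs)
    while queue:
        handle(queue.popleft())       # one API call per job
        if queue:
            time.sleep(interval)      # pace the next call
```

Syncing 10,000 records at 30 per minute takes about five and a half hours, which sounds slow until you compare it with the alternative: a blocked API key and a sync that never finishes.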
The headers that tell you everything
Most modern APIs include rate limit information in their response headers. The convention is:
X-RateLimit-Limit: how many requests you can make in the current window
X-RateLimit-Remaining: how many you have left
X-RateLimit-Reset: when the limit resets, usually as a Unix timestamp
If you're not reading these headers, you're flying blind. The remaining count tells you exactly how close you are to hitting the wall. The reset timestamp tells you exactly when you can start making requests again. Build your code around these values and you'll never be surprised.
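Parsing those headers is a few lines of code. This sketch assumes the conventional `X-RateLimit-*` names; as noted below, the exact names vary by provider, and the `reserve` threshold is an arbitrary safety margin you can tune.

```python
import time

def quota_state(headers):
    """Read remaining quota and seconds until reset from response headers.

    Assumes the conventional X-RateLimit-* names; check your provider's
    docs for the exact spelling. Returns (remaining, seconds_until_reset).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))  # Unix timestamp
    return remaining, max(0, reset_at - int(time.time()))

def should_pause(headers, reserve=5):
    """True when fewer than `reserve` requests remain in the window."""
    remaining, _ = quota_state(headers)
    return remaining < reserve
```

Checking `should_pause` after every response lets you slow down before the API forces you to, instead of discovering the wall via a 429.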
Some APIs use slightly different header names. GitHub uses X-RateLimit-Limit. Twitter uses x-rate-limit-limit (lowercase). Discord uses X-RateLimit-Bucket and includes additional metadata about which endpoint the limit applies to. Read the documentation for whatever API you're using and respect what they tell you.
When rate limits become a real problem
Sometimes a free tier really isn't enough. You're building a product, your traffic is growing, and the rate limits become a bottleneck. At this point you have three options.
Option one: pay for a higher tier. Most APIs have paid plans with significantly higher limits. If your business model depends on the API, paying for it is usually the right move.
Option two: aggressive caching. Cache as much as possible, refresh less frequently, and serve stale data when fresh data isn't available. The TerminalFeed Worker uses this pattern extensively. Every endpoint has a cache TTL and falls back to stale data if the upstream API fails.
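The stale-fallback behavior can be sketched as a small extension of a TTL cache: refresh when the data is old, but if the refresh fails, keep serving the last good value. This is an illustration of the pattern, not TerminalFeed's actual implementation.

```python
import time

class StaleOkCache:
    """Serve fresh data when possible, stale data when the upstream fails."""

    def __init__(self, fetch, ttl_seconds=30):
        self.fetch = fetch          # zero-argument function calling the upstream API
        self.ttl = ttl_seconds
        self.value = None
        self.fetched_at = None

    def get(self):
        now = time.monotonic()
        fresh = self.fetched_at is not None and now - self.fetched_at < self.ttl
        if not fresh:
            try:
                self.value = self.fetch()
                self.fetched_at = now
            except Exception:
                pass  # upstream failed: fall through and serve the stale value
        return self.value  # may be stale, but the page still renders
```

The trade-off is explicit: users might see data that's a few minutes old during an upstream outage, but they never see an error page.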
Option three: switch providers or self-host. Some data is available from multiple sources. Bitcoin price is published by Binance, Coinbase, Kraken, CoinGecko, CoinCap, and dozens of other exchanges and aggregators. If one provider's limits are too restrictive, try another. For some data types (RSS feeds, public datasets, blockchain data), self-hosting is feasible and removes the rate limit entirely.
See how TerminalFeed handles 20+ rate-limited APIs with smart caching. Full API docs and code examples.
Rate limits aren't a problem to fight. They're a constraint to design around. The developers who build the most reliable systems on top of public APIs are the ones who internalize the limits, respect them, and architect their code to work within them. Once you stop treating rate limits as an obstacle and start treating them as a design parameter, they become invisible.