API rate limiting is a technique that restricts how many requests a client can send to an API within a specific time period. It protects servers from being overwhelmed and ensures fair access across all users.
When you call an API, the server tracks your request count, usually by API key or IP address. If you exceed the allowed limit (for example, 100 requests per minute), the server responds with a 429 Too Many Requests status code, often alongside a Retry-After header, and rejects further requests until the window resets.
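The server-side bookkeeping described above can be sketched as a per-client counter that resets each minute. Everything here (the `checkLimit` name, the Map-based store) is illustrative, not any particular server's implementation:

```typescript
// Minimal sketch of server-side request counting, keyed by API key or IP.
const WINDOW_MS = 60_000; // one-minute window
const LIMIT = 100;        // e.g. 100 requests per minute

type Window = { start: number; count: number };
const windows = new Map<string, Window>();

// Returns 200 if the request is allowed, 429 if the client is over the limit.
function checkLimit(clientKey: string, now: number): number {
  const w = windows.get(clientKey);
  if (!w || now - w.start >= WINDOW_MS) {
    // No window yet, or the old one expired: start a fresh count.
    windows.set(clientKey, { start: now, count: 1 });
    return 200;
  }
  w.count += 1;
  return w.count <= LIMIT ? 200 : 429;
}
```

Once the window expires, the counter starts over and the client may send requests again.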
Most APIs communicate rate limit details through response headers. Common headers include X-RateLimit-Limit (your total allowed requests), X-RateLimit-Remaining (how many you have left), and X-RateLimit-Reset (when the counter resets), though exact header names vary by provider. Well-behaved clients read these headers and throttle themselves accordingly.
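A client can turn those headers into a pacing delay, spreading its remaining budget over the time left in the window. This is a sketch; the header names and the even-pacing heuristic are assumptions, so check your provider's documentation for the names it actually uses:

```typescript
// Anything with a .get() works here: a fetch Headers object or a plain Map.
type HeaderLike = { get(name: string): string | null | undefined };

// Milliseconds to wait before the next request, based on rate limit headers.
function pacingDelayMs(headers: HeaderLike, nowSec: number): number {
  const remaining = Number(headers.get("x-ratelimit-remaining") ?? NaN);
  const resetAt = Number(headers.get("x-ratelimit-reset") ?? NaN); // Unix seconds
  if (!Number.isFinite(remaining) || !Number.isFinite(resetAt)) return 0; // headers absent
  if (remaining <= 0) {
    // Budget exhausted: wait until the window resets.
    return Math.max(0, (resetAt - nowSec) * 1000);
  }
  // Spread the remaining requests evenly across the time left in the window.
  return Math.max(0, ((resetAt - nowSec) * 1000) / remaining);
}
```

With 10 requests left and 10 seconds until reset, this paces the client at one request per second instead of burning the budget immediately.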
Rate limiting can be implemented with several algorithms. Token bucket allows short bursts up to a capacity, then throttles to a steady refill rate. Sliding window counts requests over a rolling interval, smoothing traffic more evenly. Fixed window is the simplest, resetting the counter at regular intervals, though it can let through a double burst at window boundaries. The right approach depends on the API's use case and the traffic patterns it expects.
If you are building an application that relies on external APIs, hitting rate limits will break your user experience. Understanding rate limits helps you design caching strategies, implement retry logic with exponential backoff, and choose the right polling intervals. For API providers, rate limiting is essential to maintain performance and prevent abuse.
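Exponential backoff is straightforward to sketch: double the wait after each failed attempt, up to a cap. The function and parameter names below are illustrative, and production code would usually add random jitter (omitted here to keep the example deterministic):

```typescript
// Backoff schedule: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Sketch of a retry loop around an arbitrary request function. The sleep
// function is injected so callers (and tests) can control timing.
async function withRetries<T>(
  request: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of retries
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

With the defaults, retries wait 1s, 2s, 4s, 8s, then give up, so a briefly rate-limited upstream gets room to recover without the client hammering it.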
The TerminalFeed API Worker uses in-memory caching with per-endpoint TTLs to stay within upstream rate limits. Our API Rate Limits Explained article covers practical strategies for handling limits in your own projects. You can explore our endpoints at /developers.
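As a rough illustration of that caching pattern (a generic sketch, not TerminalFeed's actual Worker code), an in-memory cache with per-endpoint TTLs can be as small as this:

```typescript
type CacheEntry = { value: unknown; expiresAt: number };

// In-memory cache where each endpoint gets its own time-to-live, so
// fast-changing endpoints expire sooner than stable ones.
class TtlCache {
  private ttlMsByEndpoint: Record<string, number>;
  private defaultTtlMs: number;
  private entries = new Map<string, CacheEntry>();

  constructor(ttlMsByEndpoint: Record<string, number>, defaultTtlMs: number) {
    this.ttlMsByEndpoint = ttlMsByEndpoint;
    this.defaultTtlMs = defaultTtlMs;
  }

  // Returns the cached value, or undefined on a miss or expired entry.
  get(endpoint: string, now: number): unknown {
    const entry = this.entries.get(endpoint);
    if (!entry || now >= entry.expiresAt) return undefined;
    return entry.value;
  }

  set(endpoint: string, value: unknown, now: number): void {
    const ttl = this.ttlMsByEndpoint[endpoint] ?? this.defaultTtlMs;
    this.entries.set(endpoint, { value, expiresAt: now + ttl });
  }
}
```

Serving repeat requests from such a cache means only cache misses reach the upstream API, which keeps the request rate comfortably under its limits.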