Please see the code samples that accompany this article on GitHub.

What is rate-limiting?

Web services, especially large, public ones, often use a rate-limiting strategy to protect themselves from attacks or to ensure fair use of their resources. This means that you can only make a certain number of requests in a given time frame.

Typically, the service will return a 429 Too Many Requests response when you exceed the limit. Some services also return information in the response headers to indicate how many requests you have left, and when you can make the next request.
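
A rate-limited response might look something like this. The Retry-After header is part of the HTTP standard; the X-RateLimit-* headers are a common convention, but the exact names and semantics vary per service:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0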

Rate-limiting comes in a few different flavors. Some services use a simple fixed-window limit (e.g. 100 requests per minute), while others use a more elaborate sliding-window or token-bucket strategy. For an overview of various rate-limiting strategies, see this article.

How to deal with rate-limiting?

The best way to handle rate-limiting depends on the specifics of the service you are consuming. But it also depends on the type of application that you are building. That’s why it is very hard to recommend a one-size-fits-all solution. Instead, let’s look at a few strategies that you can use. This will help you decide which one is best for your situation.

Strategy: do nothing

If you’re reading this article, this is probably not the strategy that you’re looking for. But maybe you can get away with not handling the rate limit at all. Perhaps the number of requests that your application will send out is so low that you will never hit the limit. Or maybe the call isn’t critical, and you can just discard the occasional 429 response. In that case, doing nothing might be your best option: it doesn’t require any extra code, doesn’t add complexity, and works just fine.

Properties

  • Simplest strategy, no extra code or complexity.
  • Not robust: you’re not actually handling rate-limit responses.
  • Only applicable in very specific scenarios.

Strategy: standard resilience handling

If you don’t specifically expect heavy traffic or bursts of calls to the service, but you’re worried that you might occasionally run into rate limits, then you can consider using the standard resilience features of .NET.

In .NET 8, Microsoft introduced the AddStandardResilienceHandler() extension method for HttpClient, as part of the Microsoft.Extensions.Http.Resilience package. This package is based on Polly, the well-known library for building resilient applications. It adds a series of standard ‘best-practice’ resilience policies to your HttpClient, such as automatic retries and timeouts. You can read all about it on Microsoft’s documentation page.

To use it, simply install the Microsoft.Extensions.Http.Resilience package:

dotnet add package Microsoft.Extensions.Http.Resilience

And then add the resilience handler, where you configure your HttpClient:

builder.Services
    .AddHttpClient("MyNamedClient")
    .AddStandardResilienceHandler();

By default, this handler will make sure that any transient errors, including rate-limiting responses, are retried up to 3 times using an exponential backoff strategy with jitter.

This approach is a very powerful ‘general purpose’ strategy. It gives you a set of best-practice resilience policies out of the box, not just for rate limiting. The downside is that you have only limited control over the policies. You can tweak the settings somewhat, but you can’t fully customize the resilience pipeline.
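
If the defaults don’t fit your situation, the handler accepts an options callback. As a minimal sketch, tweaking the built-in retry strategy looks like this (see Microsoft’s documentation page for the full set of options):

builder.Services
    .AddHttpClient("MyNamedClient")
    .AddStandardResilienceHandler(options =>
    {
        // Tweak the built-in retry strategy; the other strategies in the
        // pipeline (timeouts, circuit breaker, rate limiter) keep their defaults.
        options.Retry.MaxRetryAttempts = 5;
        options.Retry.Delay = TimeSpan.FromSeconds(2);
    });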

It’s called a resilience pipeline because a number of resilience strategies are applied in sequence, such as a timeout policy, a retry policy, and a circuit breaker. For advanced scenarios, you will probably want to build your own resilience pipeline by chaining the policies that you need in the exact order that you need them; we’ll see an example of that in the next strategy.

Properties

  • Very easy to implement.
  • You get a set of best-practice resilience policies out of the box, not just for rate limiting.
  • Limited control over the built-in policies: you can customize them somewhat, but only to a limited extent.

Strategy: proactive rate-limiting of outgoing calls

If you’re building an application where you know you are going to hit the rate limit of the external service, you might want to take a more proactive approach. In this case, you can impose a rate limit on your own outgoing calls to the external service. If you match your outgoing rate to the rate limit of the service, you should avoid hitting the limit altogether.

There are many ways to implement this.

Build an outgoing rate-limiter policy

One way could be to use a resilience pipeline similar to the one we’ve seen before, but with a policy that rate-limits outgoing calls. You could configure such a pipeline like this, on the HttpClient that makes the calls to the external service:

builder.Services
    .AddHttpClient("default", client =>
    {
        client.BaseAddress = new Uri("https://localhost:7298");
    })
    .AddResilienceHandler("default", config =>
    {
        // Retry calls that were rejected by the rate limiter below.
        config.AddRetry(new RetryStrategyOptions<HttpResponseMessage>
        {
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Exception is RateLimiterRejectedException
            ),
            Delay = TimeSpan.FromSeconds(2),
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });
        // Allow at most 1 outgoing call per second; excess calls are rejected
        // with a RateLimiterRejectedException, which triggers the retry above.
        config.AddRateLimiter(new FixedWindowRateLimiter(
            new FixedWindowRateLimiterOptions
            {
                Window = TimeSpan.FromSeconds(1),
                PermitLimit = 1
            }
        ));
    });

You can try this out yourself with the code in the accompanying repository. You’ll find that this solution does require some tweaking. If your application keeps sending out requests at a rate that is much higher than the rate limit allows, this pipeline will eventually reach the maximum number of retries and start throwing exceptions. You won’t overload the external service in this case, but you still need to handle the errors. To prevent this, you need to carefully tweak the settings on the retry and rate limiter strategies.
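
One setting worth knowing about here (not shown in the example above) is the rate limiter’s QueueLimit. With a non-zero queue limit, excess calls wait in line for a permit instead of being rejected outright, so fewer calls ever reach the retry strategy:

config.AddRateLimiter(new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        Window = TimeSpan.FromSeconds(1),
        PermitLimit = 1,
        // Let up to 10 calls queue for a permit; only calls beyond the
        // queue are rejected and fall through to the retry strategy.
        QueueLimit = 10,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst
    }
));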

Properties

  • Easy to set up using the extension methods from the Microsoft.Extensions.Http.Resilience package.
  • You have a lot of control over the policies that you add to the pipeline.
  • Requires some tweaking to get right. If calls eventually fail, you still need to handle the errors.

Build in simple delays in your application logic

If you have a little more control over your application code, you might go for an even simpler rate-limit strategy. Just build in a naive delay between outgoing calls. Something like this:

for (var i = 0; i < 10; i++)
{
    var response = await http.PostAsJsonAsync("/echo", $"Call {i + 1}");
    
    // Effectively rate-limits the outgoing calls to 1 per second
    await Task.Delay(1000);
}

A slightly more elaborate version of this might use a service that keeps track of the timestamps of all outgoing calls over a certain period (your rate-limit window). Before making an outgoing call, the service determines, based on the recent call history, whether an outgoing call is allowed. If not, you delay making the call.

public class RateLimiter
{
    private readonly List<DateTime> _calls = [];

    public void AddCall() => _calls.Add(DateTime.Now);

    public bool IsAllowed()
    {
        // Rate limit to 1 call per second, and allow for a small margin of error.
        var count = _calls.Count(x => x > DateTime.Now.AddSeconds(-1.005d));
        return count < 1;
    }
}

The calling code then checks the rate limiter before every outgoing call, and records each call it makes:

for (var i = 0; i < 10; i++)
{
    while (!rateLimiter.IsAllowed())
    {
        await Task.Delay(1000);
    }

    var response = await http.PostAsJsonAsync("/echo", $"Call {i + 1}");
    rateLimiter.AddCall();
}

Instead of always waiting a full second, you can come up with a smarter delay strategy, based on the specifics of the external service’s rate limit.
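
For example, you could give the RateLimiter class above a method that calculates exactly how long to wait until the next call is allowed. A sketch (the 1.005-second value matches the margin used in IsAllowed):

public TimeSpan GetDelay()
{
    var windowStart = DateTime.Now.AddSeconds(-1.005d);
    var recentCalls = _calls.Where(x => x > windowStart).ToList();

    // No calls within the window: the next call is allowed immediately.
    if (recentCalls.Count == 0) return TimeSpan.Zero;

    // Otherwise, wait until the oldest call in the window falls out of it.
    return recentCalls.Min().AddSeconds(1.005d) - DateTime.Now;
}

In the calling loop, you would then replace the fixed Task.Delay(1000) with this calculated delay, guarding against negative values (Task.Delay throws on a negative TimeSpan).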

Properties

  • No dependencies on external libraries such as Polly, or having to learn the concepts behind them.
  • You are not limited to settings on the HttpClient, you are free to implement any strategy that suits your needs.
  • You must build everything yourself; depending on the use case, this might be a pro or a con.

Use a message queue

If the outgoing calls are mission-critical and you cannot accept any missed or out-of-order calls, a message queue might offer a more robust solution. Your application prepares a task and puts it on the queue. A separate process picks up the task from the queue and sends it to the external service. The task is only removed from the queue once the call to the external service has succeeded.

Rate-limiting using a message queue

Most message queue systems support a retry mechanism. If you hit the rate limit of the external service, the task will simply remain in the queue and be retried, according to the policy you configured. This way, you are effectively building a rate-limiter into your application, with the added benefit that tasks will never be lost or processed out of order: they will simply remain in the queue until they are successfully processed.
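
You don’t need a full message broker to experiment with this pattern. Here is a minimal in-process sketch using System.Threading.Channels; a production setup would use a durable queue (RabbitMQ, Azure Service Bus, and so on). The HttpClient and the /echo endpoint are the same assumptions as in the earlier examples:

using System.Net;
using System.Net.Http.Json;
using System.Threading.Channels;

var http = new HttpClient { BaseAddress = new Uri("https://localhost:7298") };
var queue = Channel.CreateUnbounded<string>();

// Producer: the application enqueues work instead of calling the service directly.
for (var i = 0; i < 10; i++)
{
    await queue.Writer.WriteAsync($"Call {i + 1}");
}
queue.Writer.Complete();

// Consumer: drain the queue, only moving on once a call has succeeded,
// so tasks are never lost or processed out of order.
await foreach (var message in queue.Reader.ReadAllAsync())
{
    while (true)
    {
        var response = await http.PostAsJsonAsync("/echo", message);
        if (response.StatusCode != HttpStatusCode.TooManyRequests) break;

        // Rate limited: keep the message at the head of the 'queue' by
        // retrying it before moving on to the next one.
        await Task.Delay(TimeSpan.FromSeconds(1));
    }
}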

Properties

  • Very robust solution, tasks are never lost or processed out of order.
  • Requires a message queue system, which might be overkill for simple applications.

Other strategies

These strategies are just a few examples. There are many more ways to handle rate limiting in your application.

For example: some services include detailed rate limiting information in their response headers, such as the number of requests you have left, or the timestamp when you can make the next request. You could use this information to determine how long to back off, or when to make the next call.
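
As a sketch, honoring the standard Retry-After header could look like this (service-specific headers such as X-RateLimit-Remaining are not standardized and would need manual parsing; the HttpClient and /echo endpoint are again assumptions carried over from the earlier examples):

var response = await http.PostAsJsonAsync("/echo", "payload");
if (response.StatusCode == HttpStatusCode.TooManyRequests)
{
    // Retry-After is either a delay in seconds (Delta) or an absolute date (Date).
    var wait = response.Headers.RetryAfter?.Delta
        ?? (response.Headers.RetryAfter?.Date - DateTimeOffset.Now)
        ?? TimeSpan.FromSeconds(1); // fall back to a default backoff

    if (wait > TimeSpan.Zero) await Task.Delay(wait);
    // ...and retry the call here.
}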

Also: different services implement different rate-limiting policies. Some might allow for a short burst of calls, others might not. Some use a large time window for their rate limit, others use a small one. You need to take all these factors into account to build a strategy that works for your specific situation.

Conclusion

Hopefully, these strategies give you some food for thought on how to handle rate limiting in your application.

Have I missed a major strategy? Do you have examples of how you’ve handled rate limiting in your own applications? Let me know in the comments!

❤️ Shout out to Peter Czala, whose answers to my StackOverflow question on this topic helped me a lot in getting to grips with the intricacies of rate limiting, and how to handle it using Polly.