Rate Limiting

Volga ships with a high-performance rate limiting system for HTTP APIs and microservices. It offers four algorithms out of the box, flexible partition keys, pluggable storage backends, and can be scoped globally, per route group, or per individual route.

This guide walks through the basics using the Token Bucket algorithm — a good default choice that allows controlled bursts while enforcing a steady average rate.

Enabling Rate Limiting

Rate limiting is an optional feature. Enable it in Cargo.toml:

[dependencies]
volga = { version = "...", features = ["rate-limiting"] }

Or enable all features at once:

[dependencies]
volga = { version = "...", features = ["full"] }

Core Concepts

Policy

A policy describes the rate limiting behavior:

Algorithm (e.g. token bucket, fixed window)
Limit parameters (capacity, refill rate, window size, etc.)
Optional eviction period for cleaning up inactive client state
Optional name — useful when different routes need different limits

Policies are configured once at the application level.

Partition Key

A partition key determines how requests are grouped for counting purposes.

Common examples:

Client IP address
HTTP header (e.g. x-api-key)
Authenticated user identity
Query or path parameter

Volga provides ready-made helpers under volga::rate_limiting::by.

Application Scope

Rate limiting middleware can be attached at three levels:

Global — all incoming requests
Route group — a set of routes sharing a prefix
Individual route — a single endpoint

Defining a Token Bucket Policy

A token bucket starts full (up to capacity tokens) and refills at a constant rate. Each request consumes one token. When the bucket is empty, requests are rejected until tokens are replenished.

use volga::rate_limiting::TokenBucket;

let bucket = TokenBucket::new(10, 5.0);

This creates a bucket with a capacity of 10 tokens and a refill rate of 5 tokens per second — allowing short bursts of up to 10 requests while sustaining an average of 5 req/s.

Named Policies

When different parts of your API need different limits, give each policy a name:

let standard = TokenBucket::new(10, 5.0)
    .with_name("standard");

let premium = TokenBucket::new(100, 50.0)
    .with_name("premium");

Eviction

By default, inactive client state is cleaned up after 60 seconds. You can adjust this:

use std::time::Duration;

let bucket = TokenBucket::new(10, 5.0)
    .with_eviction(Duration::from_secs(300));

Registering Policies

use volga::App;

let mut app = App::new()
    .with_token_bucket(standard)
    .with_token_bucket(premium);

At this point, the policies exist but are not yet active — you still need to apply them to routes.

Applying Rate Limiting

Global

Apply rate limiting to all incoming requests:

use volga::rate_limiting::by;

app.use_token_bucket(by::ip());

This limits every client by their IP address using the default (unnamed) token bucket policy.

Using a Named Policy

app.use_token_bucket(by::ip().using("standard"));

Per Route

app.map_get("/upload", upload_handler)
    .token_bucket(by::ip());

Or with a named policy:

app.map_get("/upload", upload_handler)
    .token_bucket(by::ip().using("premium"));

Per Route Group

app.group("/api", |api| {
    api.token_bucket(by::header("x-api-key").using("standard"));

    api.map_get("/status", status_handler);
    api.map_post("/upload", upload_handler);
});

Partition Key Helpers

Volga provides built-in extractors for common partition keys:

by::ip()                // Client IP address
by::header("x-api-key") // Value of an HTTP header
by::query("tenant_id")  // Query string parameter
by::path("user_id")     // Path parameter

When authentication is enabled:

by::user(|claims| claims.sub.as_str())

You can also layer multiple rate limiters with different keys by stacking middleware.

Other Algorithms

In addition to Token Bucket, Volga supports three other algorithms:

Algorithm	Best for	Method pattern
Fixed Window	Simple request counting per time window	`.with_fixed_window()` / `.fixed_window()`
Sliding Window	Smoother distribution without boundary spikes	`.with_sliding_window()` / `.sliding_window()`
GCRA	Precise pacing with configurable burst tolerance	`.with_gcra()` / `.gcra()`

Each algorithm follows the same registration and application pattern. For example, with Fixed Window:

use std::time::Duration;
use volga::rate_limiting::FixedWindow;

let mut app = App::new()
    .with_fixed_window(FixedWindow::new(100, Duration::from_secs(30)));

app.use_fixed_window(by::ip());

Pluggable Storage Backends

By default, all algorithms use an in-memory store backed by a concurrent hash map (DashMap). This works great for single-instance deployments.

For distributed scenarios (e.g. multiple instances behind a load balancer), you can implement a custom store — such as one backed by Redis — by implementing the corresponding store trait:

Algorithm	Store trait
Token Bucket	`TokenBucketStore`
Fixed Window	`FixedWindowStore`
Sliding Window	`SlidingWindowStore`
GCRA	`GcraStore`

Store traits are defined in the volga_rate_limiter crate and each requires a single atomic operation:

use volga::rate_limiting::TokenBucketStore;
use volga::rate_limiting::TokenBucketParams;

struct MyRedisStore { /* ... */ }

impl TokenBucketStore for MyRedisStore {
    fn try_consume(&self, params: TokenBucketParams) -> bool {
        let key = params.key;
        let capacity = params.capacity_scaled;
        // ... your Redis logic here
        true
    }
}

Then create the rate limiter with your custom store:

use volga::rate_limiting::TokenBucketRateLimiter;

let limiter = TokenBucketRateLimiter::with_store(10, 5.0, MyRedisStore::new());

Tips

Backends with built-in TTL support (like Redis) can skip the manual eviction step — the algorithm's eviction grace parameters are there for in-memory stores that do lazy cleanup.

Full Example

A complete working example with all four algorithms can be found here.