Quality of Service

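Ocelot supports Quality of Service (QoS) features that protect downstream services from overload and control request flow on a per-route basis. Two implementations are available. They are mutually exclusive, so only one can be active at a time:

  • Built-in QoS, part of the Ocelot core package and activated by AddQualityOfService()

  • Polly-based QoS, provided by the Ocelot.QualityOfService.Polly package and activated by AddPolly()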

The last registration wins: calling AddQualityOfService() after AddPolly() replaces the Polly handler, and vice versa.

Note

Polly v7 syntax is no longer supported as of version 23.2, when the Ocelot team upgraded Polly from v7 to v8.

Implementations Overview

The table below summarises the key differences between the two QoS implementations.

| Capability | Built-in | Polly |
|---|---|---|
| Extra NuGet package | None (Ocelot core) | Ocelot.QualityOfService.Polly |
| Activation | .AddQualityOfService() | .AddPolly() |
| Circuit Breaker | ✔ Custom (count mode & ratio mode) | ✔ Polly CircuitBreakerResilienceStrategy |
| Per-request Timeout | CancellationToken-based | ✔ Polly TimeoutResilienceStrategy |
| Full Polly resilience pipeline | | |
| Extensibility API | | ✔ (custom providers, handlers, error maps) |
| Invalid-value handling | Silent default substitution | Logged warning + default substitution |
| MinimumThroughput = 0 * | Disables circuit breaking, but not timing out | Disables circuit breaking, but not timeout strategy |
| Timeout = 0 * | Disables timing out, but not circuit breaking | Disables timing out, but not circuit breaker strategy |

Note

* Use with caution: if at least one other strategy option is defined, any QoSOptions values left null are substituted with defaults at runtime.

Built-in QoS

Ocelot’s built-in Quality of Service implementation is part of the core Ocelot package. It wraps every outgoing downstream request in a CircuitBreakerDelegatingHandler, providing circuit-breaker protection and an optional per-request timeout with no external dependencies.

To activate it, call AddQualityOfService() on the OcelotBuilder [1]:

builder.Services
    .AddOcelot(builder.Configuration)
    .AddQualityOfService();

The circuit breaker state is maintained per route — each route has its own independent circuit breaker instance. There are two operating modes, selected automatically based on the options you configure.

Count mode (default)

Activated when MinimumThroughput is set without FailureRatio and SamplingDuration. The circuit opens after MinimumThroughput consecutive failures.

"QoSOptions": {
  "MinimumThroughput": 3,
  "BreakDuration": 1000
}

With this configuration, the circuit opens after 3 consecutive failures and remains open for 1 second.

Ratio mode

Activated when FailureRatio and SamplingDuration are set alongside MinimumThroughput. The circuit opens when the ratio of failed requests within a rolling SamplingDuration window equals or exceeds FailureRatio, provided at least MinimumThroughput requests have been made in that window.

"QoSOptions": {
  "MinimumThroughput": 10,
  "FailureRatio": 0.5,
  "SamplingDuration": 10000,
  "BreakDuration": 5000
}

With this configuration, once 10 or more requests have been recorded in a 10-second rolling window, the circuit opens if 50 % or more of them are failures. The circuit then stays open for 5 seconds before transitioning to HalfOpen.
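The ratio-mode rule described above can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions, not Ocelot's actual implementation: the RollingFailureWindow class and its members are hypothetical names, and timestamps are passed in explicitly so that the behaviour is easy to follow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: 10 requests spread over 5 seconds, every second one failing.
var window = new RollingFailureWindow(
    samplingDuration: TimeSpan.FromMilliseconds(10000),
    minimumThroughput: 10,
    failureRatio: 0.5);
var t0 = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
for (var i = 0; i < 10; i++)
{
    window.Record(t0.AddSeconds(i * 0.5), failed: i % 2 == 0);
}
Console.WriteLine(window.ShouldOpen(t0.AddSeconds(5)));   // True: 10 samples, 5 of them failed
Console.WriteLine(window.ShouldOpen(t0.AddSeconds(12)));  // False: only 5 samples are still in the window

// Hypothetical helper, not an Ocelot type.
public sealed class RollingFailureWindow
{
    private readonly Queue<(DateTime At, bool Failed)> _samples = new();
    private readonly TimeSpan _samplingDuration;
    private readonly int _minimumThroughput;
    private readonly double _failureRatio;

    public RollingFailureWindow(TimeSpan samplingDuration, int minimumThroughput, double failureRatio)
    {
        _samplingDuration = samplingDuration;
        _minimumThroughput = minimumThroughput;
        _failureRatio = failureRatio;
    }

    // Records the outcome of one request at the given time.
    public void Record(DateTime at, bool failed) => _samples.Enqueue((at, failed));

    // True when enough requests were sampled and the failure ratio is reached.
    public bool ShouldOpen(DateTime now)
    {
        // Drop samples that have fallen out of the rolling window.
        while (_samples.Count > 0 && _samples.Peek().At <= now - _samplingDuration)
        {
            _samples.Dequeue();
        }
        if (_samples.Count < _minimumThroughput)
        {
            return false; // not enough traffic for the statistics to be significant
        }
        var failures = _samples.Count(s => s.Failed);
        return (double)failures / _samples.Count >= _failureRatio;
    }
}
```

The second check returns False because the MinimumThroughput guard is applied before the ratio is evaluated, which is the behaviour the paragraph above describes.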

Circuit Breaker state machine

The built-in circuit breaker implements the standard three-state machine:

| State | Behaviour |
|---|---|
| Closed | Normal operation: requests pass through and failures are counted. |
| Open | Circuit is open: requests are immediately rejected with 503 Service Unavailable and no downstream call is made. After BreakDuration has elapsed, the circuit transitions to HalfOpen. |
| HalfOpen | Exactly one probe request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit reopens and the BreakDuration timer restarts. All other concurrent requests while the probe is in flight are rejected with 503. |
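The three states can be sketched as a small count-mode state machine. The code below is an illustration under assumptions, not the source of CircuitBreakerDelegatingHandler: CountModeBreaker, TryAcquire, and Record are hypothetical names, and the clock is injected so the transitions can be followed step by step.

```csharp
using System;

// Usage: MinimumThroughput = 3, BreakDuration = 1000 ms, with a controllable clock.
var now = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var breaker = new CountModeBreaker(3, TimeSpan.FromMilliseconds(1000), () => now);

for (var i = 0; i < 3; i++)
{
    breaker.TryAcquire();
    breaker.Record(success: false);
}
Console.WriteLine(breaker.State);         // Open
Console.WriteLine(breaker.TryAcquire());  // False: rejected while the circuit is open
now += TimeSpan.FromSeconds(1);
Console.WriteLine(breaker.TryAcquire());  // True: the single HalfOpen probe
Console.WriteLine(breaker.TryAcquire());  // False: a probe is already in flight
breaker.Record(success: true);
Console.WriteLine(breaker.State);         // Closed

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical sketch, not an Ocelot type.
public sealed class CountModeBreaker
{
    private readonly object _gate = new();
    private readonly int _minimumThroughput;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _clock;
    private int _consecutiveFailures;
    private DateTime _openedAt;
    private bool _probeInFlight;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public CountModeBreaker(int minimumThroughput, TimeSpan breakDuration, Func<DateTime> clock)
    {
        _minimumThroughput = minimumThroughput;
        _breakDuration = breakDuration;
        _clock = clock;
    }

    // Returns true if the request may go downstream; false means "reject with 503".
    public bool TryAcquire()
    {
        lock (_gate)
        {
            if (State == CircuitState.Open && _clock() - _openedAt >= _breakDuration)
            {
                State = CircuitState.HalfOpen;
                _probeInFlight = false;
            }
            switch (State)
            {
                case CircuitState.Closed:
                    return true;
                case CircuitState.HalfOpen when !_probeInFlight:
                    _probeInFlight = true; // only one probe at a time
                    return true;
                default:
                    return false;
            }
        }
    }

    // Reports the outcome of a request that TryAcquire allowed through.
    public void Record(bool success)
    {
        lock (_gate)
        {
            if (State == CircuitState.Open) return; // late result from before the break
            if (State == CircuitState.HalfOpen)
            {
                _probeInFlight = false;
                if (success) { State = CircuitState.Closed; _consecutiveFailures = 0; }
                else Open();
                return;
            }
            if (success) { _consecutiveFailures = 0; return; }
            if (++_consecutiveFailures >= _minimumThroughput) Open();
        }
    }

    private void Open()
    {
        State = CircuitState.Open;
        _openedAt = _clock(); // (re)starts the BreakDuration timer
        _consecutiveFailures = 0;
    }
}
```

Injecting the clock is a design choice made for the sketch only; it lets the Open to HalfOpen transition be demonstrated without waiting for real time to pass.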

Timeout

An optional per-request timeout can be configured independently of or alongside the circuit breaker:

"QoSOptions": {
  "Timeout": 5000
}

When a request exceeds Timeout milliseconds, it is cancelled. A 503 Service Unavailable response is returned, and the event is recorded as a circuit-breaker failure.

Setting Timeout to 0 or a negative value disables the per-request timeout; omitting the Timeout option has the same effect.
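A CancellationToken-based per-request timeout of the kind described above can be sketched with a linked CancellationTokenSource. This is a generic .NET pattern shown under assumptions, not Ocelot's handler code: SendWithTimeoutAsync is a hypothetical helper, and the response is reduced to a string to keep the example self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs 'send' and maps a timeout to a 503-style result.
static async Task<string> SendWithTimeoutAsync(
    Func<CancellationToken, Task<string>> send,
    TimeSpan timeout,
    CancellationToken callerToken)
{
    // The linked source is cancelled by the caller or when the timeout elapses.
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    cts.CancelAfter(timeout);
    try
    {
        return await send(cts.Token);
    }
    catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
    {
        // The timeout fired, not the caller: this is the case that would be
        // counted as a circuit-breaker failure.
        return "503 Service Unavailable";
    }
}

// Usage: a simulated downstream call that takes 5 seconds against a 50 ms timeout.
var result = await SendWithTimeoutAsync(
    async ct =>
    {
        await Task.Delay(TimeSpan.FromSeconds(5), ct);
        return "200 OK";
    },
    TimeSpan.FromMilliseconds(50),
    CancellationToken.None);
Console.WriteLine(result); // 503 Service Unavailable
```

The exception filter separates a timeout from a cancellation requested by the caller, which propagates unchanged in this sketch.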

Note

When Timeout is the only option configured, the built-in circuit breaker is still active with its default values: MinimumThroughput = 100 and BreakDuration = 5000 ms. The circuit opens after 100 consecutive timeout failures and stays open for 5 seconds. To control these defaults, configure MinimumThroughput and BreakDuration explicitly.

Server Error Codes

The following HTTP response status codes are treated as failures by the built-in handler:

| Code | Status |
|---|---|
| 500 | Internal Server Error |
| 501 | Not Implemented |
| 502 | Bad Gateway |
| 503 | Service Unavailable |
| 504 | Gateway Timeout |
| 505 | HTTP Version Not Supported |
| 506 | Variant Also Negotiates |
| 507 | Insufficient Storage |
| 508 | Loop Detected |

Any other status code (including 4xx client errors) is recorded as a success and does not contribute to the failure count. Unhandled exceptions (excluding OperationCanceledException) are also counted as failures.
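As a quick illustration of this classification, the sketch below builds the 500-508 set from the table and checks a few status codes against it. The variable and function names are hypothetical; only the list of codes comes from the table above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// The nine codes from the table: 500 through 508 inclusive.
HashSet<HttpStatusCode> serverErrorCodes =
    Enumerable.Range(500, 9).Select(code => (HttpStatusCode)code).ToHashSet();

// Hypothetical helper: true when a response status counts as a failure.
bool IsFailure(HttpStatusCode status) => serverErrorCodes.Contains(status);

Console.WriteLine(IsFailure(HttpStatusCode.BadGateway));  // True
Console.WriteLine(IsFailure(HttpStatusCode.NotFound));    // False: 4xx is recorded as a success
Console.WriteLine(IsFailure((HttpStatusCode)511));        // False: outside the 500-508 set
```

Note that the set is a fixed list rather than a "status >= 500" check, so 5xx codes above 508 are not treated as failures unless the set is extended.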

Overriding server error codes

The set of failure codes is exposed as the protected virtual property ServerErrorCodes on CircuitBreakerDelegatingHandler. You can extend or replace this set by creating a subclass and overriding the property:

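using System.Collections.Generic;
using System.Net;
using Ocelot.Configuration; // DownstreamRoute
using Ocelot.Logging;       // IOcelotLoggerFactory
// Also import the namespace that declares CircuitBreakerDelegatingHandler.

public class MyCircuitBreakerHandler : CircuitBreakerDelegatingHandler
{
    public MyCircuitBreakerHandler(DownstreamRoute route, IOcelotLoggerFactory loggerFactory)
        : base(route, loggerFactory) { }

    // Keep the default failure codes (500-508) and also treat 429 Too Many Requests as a failure
    protected override HashSet<HttpStatusCode> ServerErrorCodes { get; } =
        new(DefaultServerErrorCodes) { HttpStatusCode.TooManyRequests };
}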

Then register it with the AddQualityOfService<THandler>() overload on OcelotBuilder:

builder.Services
    .AddOcelot(builder.Configuration)
    .AddQualityOfService<MyCircuitBreakerHandler>();

Built-in Value Constraints

The built-in handler silently substitutes a default when an option is unset or outside its valid range — no warning is logged.

| Option | Valid range | Default | Notes |
|---|---|---|---|
| BreakDuration | > 500 ms | 5000 ms | Duration the circuit stays open before transitioning to HalfOpen. |
| MinimumThroughput | ≥ 2 | 100 | Set to 0 or negative to disable circuit-breaking entirely. |
| FailureRatio | (0.0, 1.0] | 0.5 | Ratio mode only. |
| SamplingDuration | > 500 ms | 10 000 ms | Ratio mode only. |
| Timeout | > 10 ms, < 86 400 000 ms | 30 000 ms | Set to 0 or negative to disable timing out. Invalid positive values outside the range use the default (30 s). |
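To make the substitution rules concrete, the following sketch encodes three rows of the table as small functions. It is an illustration only: the function names are hypothetical, a null result is used here to mean "feature disabled", and the handling of an unset Timeout is deliberately left out.

```csharp
using System;

// BreakDuration: values above 500 ms are kept, anything else becomes 5000 ms.
static int BreakDurationOrDefault(int? ms) => (ms is > 500) ? ms.Value : 5000;

// MinimumThroughput: 0 or negative disables circuit breaking, 2 or more is kept,
// everything else (unset or 1) becomes 100.
static int? MinimumThroughputOrDefault(int? value) => value switch
{
    <= 0 => (int?)null,
    >= 2 => value,
    _ => 100,
};

// Timeout: 0 or negative disables timing out, in-range values are kept,
// other positive values become 30 000 ms.
static int? TimeoutOrDefault(int ms) => ms switch
{
    <= 0 => (int?)null,
    > 10 and < 86_400_000 => ms,
    _ => 30_000,
};

Console.WriteLine(BreakDurationOrDefault(200));                                   // 5000
Console.WriteLine(BreakDurationOrDefault(1000));                                  // 1000
Console.WriteLine(MinimumThroughputOrDefault(1));                                 // 100
Console.WriteLine(MinimumThroughputOrDefault(0)?.ToString() ?? "disabled");       // disabled
Console.WriteLine(TimeoutOrDefault(5));                                           // 30000
Console.WriteLine(TimeoutOrDefault(0)?.ToString() ?? "disabled");                 // disabled
```

Because the built-in handler logs nothing when it substitutes a value, checking a configuration against rules like these before deployment can help catch values that would otherwise be replaced silently.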

Installation (Polly)

To use Quality of Service via the Polly library, first install the Ocelot.QualityOfService.Polly extension package:

Install-Package Ocelot.QualityOfService.Polly

Next, in your Program, register the Polly services by calling the AddPolly() extension on the OcelotBuilder, as shown below [1]:

using Ocelot.QualityOfService.Polly;

builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly();

Note

Prior to version 25.0, the package was named Ocelot.Provider.Polly. If you are using version 24.1 or earlier, install the Ocelot.Provider.Polly package. For version 25.0 and later, the package ID is Ocelot.QualityOfService.Polly.

QoSOptions Schema

Here is the complete Quality of Service configuration, also known as the “QoS options schema”. This schema is shared by the Built-in QoS and Polly implementations. You do not need to define every property; which ones you set depends on your needs and chosen strategies. If you skip a property, a default value is substituted: see Built-in Value Constraints for the built-in implementation and Value constraints (Polly) for Polly.

"QoSOptions": {
  // Circuit Breaker strategy
  "BreakDuration": 0, // integer
  "MinimumThroughput": 0, // integer
  "FailureRatio": 0.0, // floating-point number
  "SamplingDuration": 0, // integer
  // Timeout strategy
  "Timeout": 0, // integer
  // Deprecated options
  "DurationOfBreak": 0, // deprecated! -> use BreakDuration
  "ExceptionsAllowedBeforeBreaking": 0, // deprecated! -> use MinimumThroughput
  "TimeoutValue": 0 // deprecated! -> use Timeout
}

| Ocelot option | Polly equivalent | Description |
|---|---|---|
| BreakDuration (formerly DurationOfBreak) | BreakDuration | How long the circuit stays open before resetting. The unit is milliseconds. |
| MinimumThroughput (formerly ExceptionsAllowedBeforeBreaking), a primary option | MinimumThroughput | At least this many actions must pass through the circuit within the time slice for the statistics to be considered significant and for the circuit breaker to engage. |
| FailureRatio | FailureRatio | The ratio of failures to sampled actions at which the circuit breaks (for example, 0.5 means 50%). |
| SamplingDuration | SamplingDuration | The duration of the sampling window over which the failure ratio is assessed. The unit is milliseconds. |
| Timeout (formerly TimeoutValue), a primary option | Timeout | The default timeout. The unit is milliseconds. |

Warning

The following options are deprecated in version 24.1: DurationOfBreak, ExceptionsAllowedBeforeBreaking, and TimeoutValue! Use the appropriate new options as shown in the table above. These deprecated options will be removed in version 25.0. For backward compatibility in version 24.1, a deprecated option takes precedence over its replacement.

Note [2]: Ocelot validates option values at runtime. When a value is invalid, the Polly implementation logs a warning and substitutes a default (refer to the Value constraints (Polly) section in Notes), whereas the built-in implementation substitutes a default silently (refer to Built-in Value Constraints). For a complete explanation of strategies and mechanisms, consult Polly’s Resilience strategies documentation.

Global Configuration [3]

According to the Global Configuration Schema, global Quality of Service options for static routes were introduced in version 24.1. Route-level QoSOptions in the Routes configuration section, which have long been supported, override these global options.

{
  "Routes": [
    {
      "Key": "R0", // optional
      "QoSOptions": {
        "Timeout": 15000 // 15s
      },
      // ...
    },
    {
      "Key": "R1", // this route is part of a group
      "QoSOptions": {}, // optional due to grouping
      // ...
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "QoSOptions": {
      "RouteKeys": ["R1"], // if undefined or empty array, opts will apply to all routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3
    },
    // ...
  }
}

Route-level QoS options for dynamic routes were not supported in versions prior to 24.1, although global Quality of Service options have long been available in Dynamic Routing mode. Starting with version 24.1, global QoS options can also be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema.

{
  "DynamicRoutes": [
    {
      "Key": "", // optional
      "ServiceName": "my-service",
      "QoSOptions": {
        "Timeout": 15000 // 15s
      },
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "DownstreamScheme": "http",
    "ServiceDiscoveryProvider": {
      // required section for dynamic routing
    },
    "QoSOptions": {
      "RouteKeys": [], // or null, no grouping, thus opts apply to all dynamic routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3,
      "FailureRatio": 0.1, // 10%
      "SamplingDuration": 30000 // 30s
    }
  }
}

In this dynamic routing configuration, the route-level Timeout enables the Timeout strategy (Polly) for the my-service service in addition to the Circuit Breaker strategy (Polly), so Polly times out requests to that service after 15 seconds. For all implicit dynamic routes, however, no Timeout strategy (Polly) is configured globally, so the standard Timeout option managed by the Ocelot Core requester middleware applies instead. Lastly, because no route grouping is defined, the Circuit Breaker strategy (Polly) is configured globally for all routes with the following options: a minimum throughput of 3, a break duration of 1 second, and a failure ratio of 10% evaluated over a 30-second sampling period.

Note

1. Please note that route-level options take precedence over global options.

2. If the RouteKeys option is not defined or the array is empty in the global QoSOptions, the global options will apply to all routes. If the array contains route keys, it defines a single group of routes to which the global options apply. Routes excluded from this group must specify their own route-level QoSOptions.

3. When using the Polly implementation: Ocelot’s Polly provider utilizes the Resilience pipeline registry, so each route has a dedicated pipeline cached in Polly’s registry using the route’s load-balancing key. For a static route, the load-balancing key uniquely identifies the route by its upstream options, whereas for dynamic routes the load-balancing key is typically the service name from the discovery provider. Thus, Polly’s registry maintains dedicated pipelines for each discovered service, and those pipelines behave independently. Finally, it is important to understand that global QoS options do not create a single shared resilience pipeline in the registry.

   When using the built-in implementation: each route also gets its own independent CircuitBreakerDelegatingHandler instance, so circuit state is always per-route.

4. Route-level QoS options for dynamic routes were not supported in versions prior to 24.1. Beginning with version 24.1, global QoS options for Dynamic Routing may be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema. Additionally, global configuration for static routes (also known as Routes) has been supported since version 24.1.

Circuit Breaker strategy (Polly)

Implementation: Polly
Primary option: MinimumThroughput, formerly ExceptionsAllowedBeforeBreaking

Note

This section describes the Circuit Breaker behaviour when using the Polly implementation. For the built-in implementation, see Count mode (default) and Ratio mode.

The options MinimumThroughput and BreakDuration can be configured independently from Timeout:

"QoSOptions": {
  "MinimumThroughput": 3,
  "BreakDuration": 1000 // ms
}

Alternatively, you can omit BreakDuration, which will default to the implicit 5-second setting as specified in Polly’s BreakDuration documentation:

"QoSOptions": {
  "MinimumThroughput": 3
}

This setup activates only the Circuit breaker resilience strategy.

Additionally, failure handling can be based on FailureRatio, which supplements the MinimumThroughput threshold:

"QoSOptions": {
  "MinimumThroughput": 10,
  "FailureRatio": 0.5, // 50%
  "SamplingDuration": 10000 // ms, 10 seconds
}

Thus, a failure ratio of 0.5 means that the circuit will break if 50% or more of actions result in handled failures, but only once at least 10 actions (the MinimumThroughput option) have passed through the circuit within the sampling window. The 10-second sampling duration defines the time window over which the 50% failure ratio is evaluated.
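For readers who know Polly v8, the options above correspond to properties of CircuitBreakerStrategyOptions<T>. The following is a hand-built sketch of a comparable pipeline, not the pipeline that Ocelot's provider constructs. It assumes the Polly.Core NuGet package is referenced; the ShouldHandle predicate (an HttpRequestException or any status of 500 and above) is an assumption chosen for illustration, and BreakDuration is left at Polly's default.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Same values as the QoSOptions example: 10 / 0.5 / 10000 ms.
var pipeline = new ResiliencePipelineBuilder<HttpResponseMessage>()
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions<HttpResponseMessage>
    {
        MinimumThroughput = 10,
        FailureRatio = 0.5,
        SamplingDuration = TimeSpan.FromMilliseconds(10000),
        // Assumption: what counts as a handled failure in this sketch.
        ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
            .Handle<HttpRequestException>()
            .HandleResult(response => (int)response.StatusCode >= 500),
    })
    .Build();

try
{
    // Simulate a downstream service that always answers 500.
    for (var i = 0; i < 20; i++)
    {
        using var response = await pipeline.ExecuteAsync(
            _ => ValueTask.FromResult(new HttpResponseMessage(HttpStatusCode.InternalServerError)));
    }
}
catch (BrokenCircuitException)
{
    // Once the breaker is open, Polly rejects calls without running the callback.
    Console.WriteLine("Circuit is open; the call was rejected.");
}
```

In Ocelot these values come from QoSOptions rather than from code, so a pipeline like this is only needed when writing a custom provider or experimenting with Polly directly.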

Note: The MinimumThroughput option (also known as Polly’s MinimumThroughput) is the primary option that enables the Circuit Breaker strategy. Its value must be valid (set to 2 or greater, refer to the Value constraints (Polly) section in Notes) and may be supplemented with additional Circuit Breaker options.

Timeout strategy (Polly)

Implementation: Polly
Primary option: Timeout, formerly TimeoutValue

Note

This section describes the Timeout behaviour when using the Polly implementation. For the built-in implementation, see Timeout.

The Timeout can be configured independently from the options of the Circuit Breaker strategy (Polly):

"QoSOptions": {
  "Timeout": 5000 // ms
}

This setup activates only the Timeout resilience strategy.

To configure a global QoS timeout using the Timeout strategy for all routes (both static and dynamic) set the Timeout option as defined in the Global Configuration Schema:

"GlobalConfiguration": {
  // other global props
  "QoSOptions": {
    "Timeout": 10000 // ms, 10 seconds
  }
}

Please note that the route-level timeout takes precedence over the global timeout. For example, a route timeout may be shorter, while the global timeout can be longer and apply to all routes.

Note: Value constraints (Polly) apply to Timeout. Any positive value, starting from 1 millisecond, enables the Timeout strategy; if Timeout is undefined, zero, or negative, the Timeout strategy is not added to the resilience pipeline. Ocelot also validates the value against Polly’s Timeout constraint. If the value violates Polly’s requirements, it is replaced with the default of 30 seconds.

Notes

Absolute timeout [4]

If a QoS section is not included, QoS will not be applied, and Ocelot will enforce an absolute timeout of 90 seconds (defined by the DownstreamRoute DefTimeout constant) for all downstream requests. This absolute timeout is configurable via the DownstreamRoute DefaultTimeoutSeconds static C# property. For more information, refer to the Default timeout section of the Configuration chapter.

Value constraints (Polly)

Note

The constraints below apply to the Polly implementation. For the built-in implementation’s constraints, see Built-in Value Constraints.

Starting with Polly v8, the Resilience strategies documentation outlines the following constraints on values:

  • The BreakDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 5000 milliseconds (5 seconds); refer to the BreakDuration documentation.

  • The MinimumThroughput value must be 2 or greater. If unspecified or invalid, it defaults to 100 failures; refer to the MinimumThroughput documentation.

  • The FailureRatio must be greater than 0.0 and no more than 1.0. If unspecified or invalid, it defaults to 0.1 (10%); refer to the FailureRatio documentation.

  • The SamplingDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the SamplingDuration documentation.

  • The Timeout must be greater than 10 milliseconds and less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the Timeout documentation. Note that when both the route-level and global QoS timeouts are positive but invalid, the default is taken from the DefaultTimeout static C# property of the TimeoutStrategy class, which can also be configured in your Program.

Ocelot logs warnings containing failed validation messages for all options, but it does not block Ocelot startup, even when QoS options are invalid. Inspect your logs for these messages and adjust your configuration if necessary.

QoS and route (global) timeouts

The Timeout option in QoSOptions always takes precedence over the route Timeout property, so the route Timeout is ignored in favor of the QoS Timeout. In Ocelot Core, the QoS Timeout and the route (or global) configuration Timeout are not intended to be used together. Moreover, there is an Ocelot Core design constraint: if the route or global Timeout duration is shorter than the QoS Timeout, you may see warning messages in the logs that begin with the following sentence:

Route '/xxx' has Quality of Service settings (QoSOptions) enabled, but either the route Timeout or the QoS Timeout is misconfigured: ...

This warning means that the route or global timeout would fire before the QoS Timeout strategy (Polly), which is configured with a longer duration, has a chance to handle its own timeout event. In effect, this disables Polly’s Timeout resilience strategy. Ocelot handles this misconfiguration by logging a warning and automatically applying a longer timeout to the TimeoutDelegatingHandler so that the QoS Timeout strategy (Polly) can take effect. To avoid this warning, ensure that your QoS timeouts are shorter than the route or global timeouts, or remove the Timeout property from routes where QoS is enabled with the Timeout option.

Global and default QoS timeouts

If a route-level QoS timeout is undefined, the global Timeout takes precedence over the default timeout (30 seconds, see the Timeout docs). This means the global QoS timeout can override Polly’s default of 30 seconds via the Global Configuration Schema.

Extensibility (Polly) [5]

To use your ResiliencePipeline<T> provider, you can apply the following syntax:

builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>();
// MyProvider should implement IPollyQoSResiliencePipelineProvider<HttpResponseMessage>
// Note: you can use standard provider PollyQoSResiliencePipelineProvider

Additionally, if you want to utilize your own DelegatingHandler, the following syntax can be applied:

builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>(MyQosDelegatingHandlerDelegate);
// MyQosDelegatingHandlerDelegate is a delegate used to get a DelegatingHandler. Refer to Ocelot's PollyResiliencePipelineDelegatingHandler

Finally, to define your own set of exceptions for mapping, you can apply the following syntax:

static Error CreateError(Exception e) => new RequestTimedOutError(e);
Dictionary<Type, Func<Exception, Error>> MyErrorMapping = new()
{
    {typeof(TaskCanceledException), CreateError},
    {typeof(TimeoutRejectedException), CreateError},
    {typeof(BrokenCircuitException), CreateError},
    {typeof(BrokenCircuitException<HttpResponseMessage>), CreateError},
};
builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>(MyErrorMapping);
// Note: Default error mapping is defined in the DefaultErrorMapping field of the Ocelot.QualityOfService.Polly.OcelotBuilderExtensions class