Quality of Service¶
Repository: Ocelot.QualityOfService.Polly
Ocelot supports Quality of Service (QoS) features that let you protect downstream services from overload and control request flow on a per-route basis. Two implementations are available. They are mutually exclusive, so only one can be active at a time:
Built-in QoS: included in the Ocelot core package; no additional dependencies required.
QoS via Polly (see Installation (Polly)): a full-featured resilience pipeline powered by the well-regarded Polly .NET library (repository).
The last registration wins: calling AddQualityOfService() after AddPolly() replaces the Polly handler, and vice versa.
Note
Polly v7 syntax is no longer supported as of version 23.2, when the Ocelot team upgraded Polly from v7 to v8.
Implementations Overview¶
The table below summarises the key differences between the two QoS implementations.
| Capability | Built-in | Polly |
|---|---|---|
| Extra NuGet package | None (Ocelot core) | Ocelot.QualityOfService.Polly |
| Activation | AddQualityOfService() | AddPolly() |
| Circuit Breaker | ✔ Custom (count mode & ratio mode) | ✔ Polly |
| Per-request Timeout | ✔ | ✔ Polly |
| Full Polly resilience pipeline | ✘ | ✔ |
| Extensibility API | ✘ | ✔ (custom providers, handlers, error maps) |
| Invalid-value handling | Silent default substitution | Logged warning + default substitution |
| | Disables circuit breaking, but not timing out | Disables circuit breaking, but not timeout strategy |
| | Disables timing out, but not circuit breaking | Disables timing out, but not circuit breaker strategy |
Note
* Use with caution, since other QoSOptions values will be substituted at runtime if at least one other strategy option is defined while the others are null.
Built-in QoS¶
Ocelot’s built-in Quality of Service implementation is part of the core Ocelot package.
It wraps every outgoing downstream request in a CircuitBreakerDelegatingHandler, providing circuit-breaker protection and an optional per-request timeout with no external dependencies.
To activate it, call AddQualityOfService() on the OcelotBuilder [1]:
builder.Services
    .AddOcelot(builder.Configuration)
    .AddQualityOfService();
The circuit breaker state is maintained per route — each route has its own independent circuit breaker instance. There are two operating modes, selected automatically based on the options you configure.
Count mode (default)¶
Activated when MinimumThroughput is set without FailureRatio and SamplingDuration.
The circuit opens after MinimumThroughput consecutive failures.
"QoSOptions": {
"MinimumThroughput": 3,
"BreakDuration": 1000
}
With this configuration, the circuit opens after 3 consecutive failures and remains open for 1 second.
Ratio mode¶
Activated when FailureRatio and SamplingDuration are set alongside MinimumThroughput.
The circuit opens when the ratio of failed requests within a rolling SamplingDuration window equals or exceeds FailureRatio, provided at least MinimumThroughput requests have been made in that window.
"QoSOptions": {
"MinimumThroughput": 10,
"FailureRatio": 0.5,
"SamplingDuration": 10000,
"BreakDuration": 5000
}
With this configuration, once 10 or more requests have been recorded in a 10-second rolling window, the circuit opens if 50 % or more of them are failures.
The circuit then stays open for 5 seconds before transitioning to HalfOpen.
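The ratio-mode rule described above can be sketched as a small rolling-window evaluator. This is only an illustration under stated assumptions: `RollingFailureWindow` and its members are hypothetical names, not Ocelot types, and the real handler may track samples differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an Ocelot type): evaluates the ratio-mode rule
// over a rolling SamplingDuration window.
public sealed class RollingFailureWindow
{
    private readonly Queue<(DateTime At, bool Failed)> _samples = new();
    private readonly int _minimumThroughput;
    private readonly double _failureRatio;
    private readonly TimeSpan _samplingDuration;

    public RollingFailureWindow(int minimumThroughput, double failureRatio, TimeSpan samplingDuration)
    {
        _minimumThroughput = minimumThroughput;
        _failureRatio = failureRatio;
        _samplingDuration = samplingDuration;
    }

    // Records one request outcome and returns true when the circuit should open.
    public bool Record(bool failed, DateTime now)
    {
        _samples.Enqueue((now, failed));

        // Drop samples that have fallen out of the rolling window.
        while (_samples.Count > 0 && now - _samples.Peek().At > _samplingDuration)
            _samples.Dequeue();

        // Too little traffic: the ratio is not yet considered significant.
        if (_samples.Count < _minimumThroughput)
            return false;

        double ratio = (double)_samples.Count(s => s.Failed) / _samples.Count;
        return ratio >= _failureRatio;
    }
}
```

With `MinimumThroughput = 10` and `FailureRatio = 0.5`, nine recorded requests never open the circuit regardless of outcome; the tenth opens it only if at least five of the ten failed.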
Circuit Breaker state machine¶
The built-in circuit breaker implements the standard three-state machine:
| State | Behaviour |
|---|---|
| Closed | Normal operation: requests pass through and failures are counted. |
| Open | Circuit is open: requests are immediately rejected. |
| HalfOpen | Exactly one probe request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit reopens. |
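As a rough illustration of the three states, the sketch below models a count-mode breaker. It is not the code of `CircuitBreakerDelegatingHandler`: the type `CountModeBreaker` and its methods are hypothetical, time is passed in explicitly to keep the example deterministic, and thread safety is ignored.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Hypothetical sketch of a count-mode circuit breaker (not Ocelot's implementation).
public sealed class CountModeBreaker
{
    private readonly int _minimumThroughput;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private bool _probeInFlight;
    private DateTime _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public CountModeBreaker(int minimumThroughput, TimeSpan breakDuration)
    {
        _minimumThroughput = minimumThroughput;
        _breakDuration = breakDuration;
    }

    // Returns true if a request may be sent downstream at time 'now'.
    public bool TryAcquire(DateTime now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _breakDuration)
        {
            State = CircuitState.HalfOpen; // break elapsed: allow a single probe
            _probeInFlight = false;
        }

        switch (State)
        {
            case CircuitState.Closed:
                return true;
            case CircuitState.HalfOpen when !_probeInFlight:
                _probeInFlight = true;     // exactly one probe request
                return true;
            default:
                return false;              // Open, or HalfOpen with a probe running
        }
    }

    // Reports the outcome of a request that was allowed through.
    public void Report(bool failed, DateTime now)
    {
        if (State == CircuitState.Open) return; // late result; circuit already open

        if (State == CircuitState.HalfOpen)
        {
            if (failed) Trip(now); else Reset();
            return;
        }

        if (!failed) { _consecutiveFailures = 0; return; }
        if (++_consecutiveFailures >= _minimumThroughput) Trip(now);
    }

    private void Trip(DateTime now)
    {
        State = CircuitState.Open;
        _openedAt = now;
        _consecutiveFailures = 0;
    }

    private void Reset()
    {
        State = CircuitState.Closed;
        _consecutiveFailures = 0;
    }
}
```

A caller would invoke `TryAcquire` before forwarding a request and `Report` once the downstream response (or exception) is known.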
Timeout¶
An optional per-request timeout can be configured independently of or alongside the circuit breaker:
"QoSOptions": {
"Timeout": 5000
}
When a request exceeds Timeout milliseconds, it is cancelled.
A 503 Service Unavailable response is returned, and the event is recorded as a circuit-breaker failure.
To disable the per-request timeout entirely, omit the Timeout option or set it to 0 or a negative value.
Note
When Timeout is the only option configured, the built-in circuit breaker is still active with its default values:
MinimumThroughput = 100 and BreakDuration = 5000 ms.
The circuit opens after 100 consecutive timeout failures and stays open for 5 seconds.
To control these defaults, configure MinimumThroughput and BreakDuration explicitly.
Server Error Codes¶
The following HTTP response status codes are treated as failures by the built-in handler:
| Code | Status |
|---|---|
| 500 | Internal Server Error |
| 501 | Not Implemented |
| 502 | Bad Gateway |
| 503 | Service Unavailable |
| 504 | Gateway Timeout |
| 505 | HTTP Version Not Supported |
| 506 | Variant Also Negotiates |
| 507 | Insufficient Storage |
| 508 | Loop Detected |
Any other status code (including 4xx client errors) is recorded as a success and does not contribute to the failure count.
Unhandled exceptions (excluding OperationCanceledException) are also counted as failures.
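The classification rules above can be expressed as a pair of predicates. This is a minimal sketch, assuming only what the table and the two sentences above state; `FailureClassifier` is a hypothetical name and not part of Ocelot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper (not an Ocelot type) mirroring the rules described above.
public static class FailureClassifier
{
    // Status codes 500 through 508, as listed in the table.
    public static readonly HashSet<HttpStatusCode> DefaultCodes =
        new HashSet<HttpStatusCode>(Enumerable.Range(500, 9).Select(c => (HttpStatusCode)c));

    // Any status code outside the set (including 4xx) counts as a success.
    public static bool IsFailure(HttpStatusCode status) => DefaultCodes.Contains(status);

    // Exceptions count as failures unless they signal cancellation.
    // TaskCanceledException derives from OperationCanceledException, so it is excluded too.
    public static bool IsFailure(Exception ex) => ex is not OperationCanceledException;
}
```

Note that codes such as 429 or 511 fall outside the default set, which is why the override mechanism described next exists.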
Overriding server error codes¶
The set of failure codes is exposed as the protected virtual property ServerErrorCodes on CircuitBreakerDelegatingHandler.
You can extend or replace this set by creating a subclass and overriding the property:
using System.Collections.Generic;
using System.Net;

public class MyCircuitBreakerHandler : CircuitBreakerDelegatingHandler
{
    public MyCircuitBreakerHandler(DownstreamRoute route, IOcelotLoggerFactory loggerFactory)
        : base(route, loggerFactory) { }

    // Treat the default server error codes AND 429 Too Many Requests as failures
    protected override HashSet<HttpStatusCode> ServerErrorCodes { get; } =
        new(DefaultServerErrorCodes) { HttpStatusCode.TooManyRequests };
}
Then register it with the AddQualityOfService<THandler>() overload on OcelotBuilder:
builder.Services
    .AddOcelot(builder.Configuration)
    .AddQualityOfService<MyCircuitBreakerHandler>();
Built-in Value Constraints¶
The built-in handler silently substitutes a default when an option is unset or outside its valid range — no warning is logged.
| Option | Valid range | Default | Notes |
|---|---|---|---|
| BreakDuration | > 500 ms | 5000 ms | Duration the circuit stays open before transitioning to HalfOpen. |
| MinimumThroughput | ≥ 2 | 100 | |
| FailureRatio | (0.0, 1.0] | 0.5 | Ratio mode only. |
| SamplingDuration | > 500 ms | 10 000 ms | Ratio mode only. |
| Timeout | > 10 ms, < 86 400 000 ms | 30 000 ms | Set to 0 to disable the timeout (see Timeout). |
Installation (Polly)¶
To use Quality of Service via the Polly library, first install the Ocelot.QualityOfService.Polly extension package:
Install-Package Ocelot.QualityOfService.Polly
Next, in your Program, incorporate Polly services by invoking the AddPolly() extension on the OcelotBuilder, as shown below [1]:
using Ocelot.QualityOfService.Polly;
builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly();
Note
Prior to version 25.0, the package was named Ocelot.Provider.Polly. If you are using version 24.1 or earlier, install the Ocelot.Provider.Polly package. For version 25.0 and later, the package ID is Ocelot.QualityOfService.Polly.
QoSOptions Schema¶
Class: FileQoSOptions
Here is the complete Quality of Service configuration, also known as the “QoS options schema”. This schema is shared by both the Built-in QoS and the Polly implementations. Depending on your needs and chosen strategies, you do not have to define every property. If you skip a property, a default value will be substituted: see Built-in Value Constraints for the built-in implementation and Value constraints (Polly) for Polly.
"QoSOptions": {
// Circuit Breaker strategy
"BreakDuration": 0, // integer
"MinimumThroughput": 0, // integer
"FailureRatio": 0.0, // floating number
"SamplingDuration": 0, // integer
// Timeout strategy
"Timeout": 0, // integer
// Deprecated options
"DurationOfBreak": 0, // deprecated! -> use BreakDuration
"ExceptionsAllowedBeforeBreaking": 0, // deprecated! -> use MinimumThroughput
"TimeoutValue": 0, // deprecated! -> use Timeout
}
| Ocelot Option and Polly equivalent | Description |
|---|---|
| BreakDuration | How long the circuit stays open before resetting. The unit is milliseconds. |
| MinimumThroughput | At least this many actions must pass through the circuit within the time slice for the statistics to be considered significant and for the circuit breaker to engage. |
| FailureRatio | The failure-to-success ratio at which the circuit will break. |
| SamplingDuration | The duration of the sampling window over which failure ratios are assessed. The unit is milliseconds. |
| Timeout | The default timeout. The unit is milliseconds. |
Warning
The following options are deprecated in version 24.1: DurationOfBreak, ExceptionsAllowedBeforeBreaking, and TimeoutValue!
Use the appropriate new options as shown in the table above.
These deprecated options will be removed in version 25.0.
For backward compatibility in version 24.1, a deprecated option takes precedence over its replacement.
Note [2]: Ocelot checks that the values of options are valid during execution. If not, it logs errors or warnings (refer to the Value constraints (Polly) section in Notes for Polly, or Built-in Value Constraints for the built-in implementation). For a complete explanation about strategies and mechanisms, consult Polly’s Resilience strategies documentation.
Global Configuration [3]¶
According to the Global Configuration Schema, global Quality of Service options for static routes were introduced in version 24.1.
These global options can also be overridden in the Routes configuration section, a capability that has been supported for a long time.
{
  "Routes": [
    {
      "Key": "R0", // optional
      "QoSOptions": {
        "Timeout": 15000 // 15s
      },
      // ...
    },
    {
      "Key": "R1", // this route is part of a group
      "QoSOptions": {}, // optional due to grouping
      // ...
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "QoSOptions": {
      "RouteKeys": ["R1"], // if undefined or empty array, opts will apply to all routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3
    },
    // ...
  }
}
Overriding QoS options per dynamic route was not supported in versions prior to 24.1.
Global Quality of Service options, however, have long been available in Dynamic Routing mode.
Starting with version 24.1, global QoS options can also be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema.
{
  "DynamicRoutes": [
    {
      "Key": "", // optional
      "ServiceName": "my-service",
      "QoSOptions": {
        "Timeout": 15000 // 15s
      },
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "DownstreamScheme": "http",
    "ServiceDiscoveryProvider": {
      // required section for dynamic routing
    },
    "QoSOptions": {
      "RouteKeys": [], // or null, no grouping, thus opts apply to all dynamic routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3,
      "FailureRatio": 0.1, // 10%
      "SamplingDuration": 30000 // 30s
    }
  }
}
In this dynamic routing configuration, the Timeout strategy (Polly) is applied to the my-service service in addition to the Circuit Breaker strategy (Polly), resulting in Polly timing out after 15 seconds.
However, for all implicit dynamic routes, the Timeout strategy (Polly) is not globally configured, in favor of the standard Timeout option managed by the Ocelot Core requester middleware.
Lastly, the Circuit Breaker strategy (Polly) has been globally configured for all routes due to the absence of route grouping, with the following options:
allow 3 errors before breaking the circuit for 1 second, and allow up to 10% errors during the default 30-second sampling period.
Note
1. Please note that route-level options take precedence over global options.
2. If the RouteKeys option is not defined or the array is empty in the global QoSOptions, the global options will apply to all routes.
If the array contains route keys, it defines a single group of routes to which the global options apply.
Routes excluded from this group must specify their own route-level QoSOptions.
3. When using the Polly implementation: Ocelot’s Polly provider utilizes the Resilience pipeline registry, so each route has a dedicated pipeline cached in Polly’s registry using the route’s load-balancing key.
For a static route, the load-balancing key uniquely identifies the route by its upstream options, whereas for dynamic routes the load-balancing key is typically the service name from the discovery provider.
Thus, Polly’s registry maintains dedicated pipelines for each discovered service, and those pipelines behave independently.
Finally, it is important to understand that global QoS options do not create a single shared resilience pipeline in the registry.
When using the built-in implementation: each route also gets its own independent CircuitBreakerDelegatingHandler instance, so circuit state is always per-route.
4. Dynamic routes were not supported in versions prior to 24.1.
Beginning with version 24.1, global QoS options for Dynamic Routing may be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema.
Additionally, global configuration for static routes (also known as Routes) has been supported since version 24.1.
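The precedence and grouping rules in notes 1 and 2 can be sketched as a small resolver. This is an illustration only: the `QoS` record and `QoSResolver` class are hypothetical (Ocelot's own configuration class is FileQoSOptions), and the per-property merge is an assumption suggested by the dynamic routing example above, where a route-level Timeout is combined with global circuit breaker options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record for illustration; only three options are modelled.
public sealed record QoS(int? Timeout = null, int? BreakDuration = null, int? MinimumThroughput = null);

public static class QoSResolver
{
    // Returns the options that would effectively apply to the given route.
    public static QoS Resolve(string routeKey, QoS route, QoS global, IReadOnlyCollection<string> routeKeys)
    {
        route ??= new QoS();

        // Undefined or empty RouteKeys: global options apply to all routes.
        // Otherwise they apply only to the routes listed in the group.
        bool inGroup = routeKeys is null || routeKeys.Count == 0 || routeKeys.Contains(routeKey);
        if (!inGroup || global is null)
            return route;

        // Assumed per-property merge: route-level values take precedence.
        return new QoS(
            Timeout: route.Timeout ?? global.Timeout,
            BreakDuration: route.BreakDuration ?? global.BreakDuration,
            MinimumThroughput: route.MinimumThroughput ?? global.MinimumThroughput);
    }
}
```

Applied to the static routes example above, route R0 (outside the group) keeps only its own 15-second Timeout, while R1 inherits the global BreakDuration and MinimumThroughput.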
Circuit Breaker strategy (Polly)¶
Implementation: Polly
Documentation: Circuit breaker resilience strategy
Primary option: MinimumThroughput, formerly ExceptionsAllowedBeforeBreaking
Note
This section describes the Circuit Breaker behaviour when using the Polly implementation. For the built-in implementation, see Count mode (default) and Ratio mode.
The options MinimumThroughput and BreakDuration can be configured independently from Timeout:
"QoSOptions": {
"MinimumThroughput": 3,
"BreakDuration": 1000 // ms
}
Alternatively, you can omit BreakDuration, which will default to the implicit 5-second setting as specified in Polly’s BreakDuration documentation:
"QoSOptions": {
"MinimumThroughput": 3
}
This setup activates only the Circuit breaker resilience strategy.
The circuit breaker can also act on a failure ratio: the FailureRatio option complements the minimum number of actions set by MinimumThroughput.
"QoSOptions": {
"MinimumThroughput": 10,
"FailureRatio": 0.5, // 50%
"SamplingDuration": 10000, // ms, 10 seconds
}
Thus, a failure ratio of 0.5 means that the circuit will break if 50% or more of actions result in handled failures, once at least 10 actions (the MinimumThroughput option) have passed through the circuit within the sampling window.
Additionally, the 10-second sampling duration defines the time window over which the 50% failure ratio is evaluated.
Note: The MinimumThroughput option (also known as Polly’s MinimumThroughput) is the primary option that enables the Circuit Breaker strategy. Its value must be valid (set to 2 or greater, refer to the Value constraints (Polly) section in Notes) and may be supplemented with additional Circuit Breaker options.
Timeout strategy (Polly)¶
Implementation: Polly
Documentation: Timeout resilience strategy
Primary option: Timeout, formerly TimeoutValue
Note
This section describes the Timeout behaviour when using the Polly implementation. For the built-in implementation, see Timeout.
The Timeout can be configured independently from the options of the Circuit Breaker strategy (Polly):
"QoSOptions": {
"Timeout": 5000 // ms
}
This setup activates only the Timeout resilience strategy.
To configure a global QoS timeout using the Timeout strategy for all routes (both static and dynamic) set the Timeout option as defined in the Global Configuration Schema:
"GlobalConfiguration": {
// other global props
"QoSOptions": {
"Timeout": 10000 // ms, 10 seconds
}
}
Please note that the route-level timeout takes precedence over the global timeout. For example, a route timeout may be shorter, while the global timeout can be longer and apply to all routes.
Note: There are Value constraints (Polly) for Timeout: it must be a positive number, starting from 1 millisecond, to enable the Timeout strategy. If Timeout is undefined, zero, or a negative number, the Timeout strategy will not be added to the resilience pipeline. Ocelot also validates Timeout against Polly’s Timeout constraint: if the value violates Polly’s requirements, it is rolled back to the default of 30 seconds.
Notes¶
Absolute timeout [4]¶
If a QoS section is not included, QoS will not be applied, and Ocelot will enforce an absolute timeout of 90 seconds (defined by the DownstreamRoute DefTimeout constant) for all downstream requests.
This absolute timeout is configurable via the DownstreamRoute DefaultTimeoutSeconds static C# property.
For more information, refer to the Default timeout section of the Configuration chapter.
Value constraints (Polly)¶
Note
The constraints below apply to the Polly implementation. For the built-in implementation’s constraints, see Built-in Value Constraints.
Starting with Polly v8, the Resilience strategies documentation outlines the following constraints on values:
The BreakDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 5000 milliseconds (5 seconds); refer to the BreakDuration documentation.
The MinimumThroughput value must be 2 or greater. If unspecified or invalid, it defaults to 100 failures; refer to the MinimumThroughput documentation.
The FailureRatio must be greater than 0.0 and no more than 1.0. If unspecified or invalid, it defaults to 0.1 (10%); refer to the FailureRatio documentation.
The SamplingDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the SamplingDuration documentation.
The Timeout must be greater than 10 milliseconds and less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the Timeout documentation. Please note that when both route-level and global QoS timeouts have positive values but are invalid, a default value will be automatically substituted from the TimeoutStrategy class DefaultTimeout static C# property, which can also be configured in your Program.
Ocelot logs warnings containing failed validation messages for all options, but it does not block Ocelot startup, even when QoS options are invalid. Inspect your logs for these messages and adjust your configuration if necessary.
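The constraints listed above, combined with the "warn and substitute a default" behaviour, can be sketched as follows. This is a hedged illustration rather than Ocelot's validation code: `PollyConstraints`, its method names, and the warning text are all hypothetical, and the treatment of a zero or negative Timeout follows the note in the Timeout strategy (Polly) section.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validator (not an Ocelot type) for the Polly value constraints above.
public static class PollyConstraints
{
    private const int Day = 86_400_000; // 24 hours in milliseconds

    public static int BreakDuration(int? ms, List<string> warnings) =>
        Check(ms, v => v > 500 && v < Day, 5000, nameof(BreakDuration), warnings);

    public static int MinimumThroughput(int? value, List<string> warnings) =>
        Check(value, v => v >= 2, 100, nameof(MinimumThroughput), warnings);

    public static double FailureRatio(double? value, List<string> warnings) =>
        Check(value, v => v > 0.0 && v <= 1.0, 0.1, nameof(FailureRatio), warnings);

    public static int SamplingDuration(int? ms, List<string> warnings) =>
        Check(ms, v => v > 500 && v < Day, 30_000, nameof(SamplingDuration), warnings);

    // Returns null when the Timeout strategy would not be added at all.
    public static int? Timeout(int? ms, List<string> warnings)
    {
        if (ms is null || ms <= 0) return null;
        return Check(ms, v => v > 10 && v < Day, 30_000, nameof(Timeout), warnings);
    }

    // Returns the configured value if valid; otherwise the default.
    // A warning is recorded only when a specified value had to be replaced.
    private static T Check<T>(T? value, Func<T, bool> isValid, T fallback, string name, List<string> warnings)
        where T : struct
    {
        if (value is T v && isValid(v)) return v;
        if (value.HasValue)
            warnings.Add($"{name} value {value} is invalid; using default {fallback}."); // hypothetical wording
        return fallback;
    }
}
```

Collecting warnings in a list instead of throwing reflects the behaviour described above: invalid options do not block startup, they only produce log messages.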
QoS and route (global) timeouts¶
The Timeout option in QoS always takes precedence over the route Timeout property, so the route Timeout will be ignored in favor of the QoS Timeout.
In Ocelot Core, the route Timeout and the QoS Timeout are not intended to be used together.
Moreover, there is an Ocelot Core design constraint: if the route or global Timeout duration is shorter than the QoS Timeout, you may encounter warning messages in the logs that begin with the following sentence:
Route '/xxx' has Quality of Service settings (QoSOptions) enabled, but either the route Timeout or the QoS Timeout is misconfigured: ...
This warning means that the route or global timeout will occur before the QoS Timeout strategy (Polly) has a chance to handle its own timeout event, which is configured with a longer duration.
Technically, this situation effectively disables Polly’s Timeout resilience strategy.
Ocelot handles this misconfiguration by logging a warning and automatically applying a longer timeout to the TimeoutDelegatingHandler in order to effectively unblock the QoS Timeout strategy (Polly).
To avoid this warning, ensure that your QoS timeouts are shorter than the route or global timeouts, or remove the Timeout property from routes where QoS is enabled with the Timeout option.
Global and default QoS timeouts¶
If a route-level QoS timeout is undefined, the global Timeout takes precedence over the default timeout (30 seconds, see the Timeout docs).
This means the global QoS timeout can override Polly’s default of 30 seconds via the Global Configuration Schema.
Extensibility (Polly) [5]¶
To use your ResiliencePipeline<T> provider, you can apply the following syntax:
builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>();
// MyProvider should implement IPollyQoSResiliencePipelineProvider<HttpResponseMessage>
// Note: you can use the standard provider PollyQoSResiliencePipelineProvider
Additionally, if you want to utilize your own DelegatingHandler, the following syntax can be applied:
builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>(MyQosDelegatingHandlerDelegate);
// MyQosDelegatingHandlerDelegate is a delegate used to obtain a DelegatingHandler. Refer to Ocelot's PollyResiliencePipelineDelegatingHandler
Finally, to define your own set of exceptions for mapping, you can apply the following syntax:
static Error CreateError(Exception e) => new RequestTimedOutError(e);

Dictionary<Type, Func<Exception, Error>> MyErrorMapping = new()
{
    {typeof(TaskCanceledException), CreateError},
    {typeof(TimeoutRejectedException), CreateError},
    {typeof(BrokenCircuitException), CreateError},
    {typeof(BrokenCircuitException<HttpResponseMessage>), CreateError},
};

builder.Services
    .AddOcelot(builder.Configuration)
    .AddPolly<MyProvider>(MyErrorMapping);
// Note: Default error mapping is defined in the DefaultErrorMapping field of the Ocelot.QualityOfService.Polly.OcelotBuilderExtensions class