General Guidelines: Azure Core

Azure Core is a library that provides cross-cutting services to other client libraries. These services include:

The HTTP pipeline
Global configuration
Credentials

The following sections define the requirements for the Azure Core library. If you are implementing a client library in a language that already has an Azure Core library, you do not need to read this section. It is primarily targeted at developers who work on the Azure Core library.

The HTTP pipeline

The HTTP pipeline consists of a HTTP transport that is wrapped by multiple policies. Each policy is a control point during which the pipeline can modify either the request and/or response. We use a default set of policies to standardize how client libraries interact with Azure services.

Telemetry
Unique request ID
Retry
Authentication
Response downloader
Distributed tracing
Logging
Proxy

In general, the client library will only need to configure these policies. However, if you are producing a new Azure Core library (for a new language), you will need to understand the requirements for each policy.

Telemetry policy

Client library usage telemetry is used by service teams (not consumers) to monitor what SDK language, client library version, and language/platform info a client is using to call into their service. Clients can prepend additional information indicating the name and version of the client application.

✅ DO send telemetry information in the User-Agent header.

✅ DO extend the user agent to list all client wrapping dependencies, sorting them from outermost to innermost. For example, Azure Storage Datalake depends on Azure Storage Blobs client library. It uses Azure Blobs client internally to perform operations, this is a client wrapping dependency.

In the example above the user agent will display them in this order Azure Datalake -> Azure Blobs -> Azure Core

☑️ YOU SHOULD include Azure Core library information in the user agent.

User Agent Format

[<application_id> ]{azsdk-<sdk_language>-<package_name>/<package_version> }+<platform_info>

<application_id>: optional application-specific string. May contain a slash, but must not contain a space. The string is supplied by the user of the client library, e.g. “AzCopy/10.0.4-Preview”
<sdk_language>: SDK’s language name (all lowercase): “net”, “python”, “java”, or “js”
<package_name>: client library package name as it appears to the developer, replacing slashes with dashes and removing the Azure indicator. For example, “Security.KeyVault” (.NET), “security.keyvault” (Java), “keyvault” (JavaScript & Python)
<package_version>: the version of the package. Note: this is not the version of the service
<platform_info>: information about the currently executing language runtime and OS, e.g. “(NODE-VERSION v4.5.0; Windows_NT 10.0.14393)”

For example, if we re-wrote AzCopy in each language using the Azure Datalake client library (which depends on Azure Blob client library), we may end up with the following user-agent strings:

(.NET) AzCopy/10.0.4-Preview azsdk-net-Storage.DataLake/12.0.0 azsdk-net-Storage.Blobs/12.0.0 azsdk-net-Core/11.0.0 (.NET Standard 2.0; Windows_NT 10.0.14393)
(JavaScript) AzCopy/10.0.4-Preview azsdk-js-storage-datalake/12.0.0 azsdk-js-storage-blob/12.0.0 azsdk-js-core-client/1.0.0 azsdk-js-core-rest-pipeline/11.0.0 (Node 18.12.1; Ubuntu; Linux x86_64; rv:34.0)
(Java) AzCopy/10.0.4-Preview azsdk-java-storage-datalake/12.0.0 azsdk-java-storage-blobs/12.0.0 azsdk-java-core/11.0.0 (Java/1.8.0_45; Macintosh; Intel Mac OS X 10_10; rv:33.0)
(Python) AzCopy/10.0.4-Preview azsdk-python-storage-datalake/12.0.0 azsdk-python-storage-blob/12.0.0 Python/3.7.3 (Ubuntu; Linux x86_64; rv:34.0)

✅ DO allow the consumer of the library to set the application ID. This allows the consumer to obtain cross-service telemetry for their app. The application ID should be settable in the relevant ClientOptions object.

✅ DO enforce that the application ID is no more than 24 characters in length. Shorter application IDs allows service teams to include diagnostic information in the “platform information” section of the user agent, while still allowing the consumer to obtain telemetry information for their own application.

☑️ YOU SHOULD send telemetry information that is normally sent in the User-Agent header in the X-MS-UserAgent header when the platform does not support changing the User-Agent header. Note that services will need to configure log gathering to capture the X-MS-UserAgent header in such a way that it can be queried through normal analytics systems.

☑️ YOU SHOULD send additional (dynamic) telemetry information as a semi-colon separated set of key-value types in the X-MS-AZSDK-Telemetry header. For example:

X-MS-AZSDK-Telemetry: class=BlobClient;method=DownloadFile;blobType=Block

The content of the header is a semi-colon key=value list. The following keys have specific meaning:

class is the name of the type within the client library that the consumer called to trigger the network operation.
method is the name of the method within the client library type that the consumer called to trigger the network operation.

Any other keys that are used should be common across all client libraries for a specific service. DO NOT include personally identifiable information (even encoded) in this header. Services need to configure log gathering to capture the X-MS-SDK-Telemetry header in such a way that it can be queried through normal analytics systems.

Unique request ID policy

TODO Add Unique Request ID Policy Requirements

Retry policy

There are many reasons why failure can occur when a client application attempts to send a network request to a service. Some examples are timeout, network infrastructure failures, service rejecting the request due to throttle/busy, service instance terminating due to service scale-down, service instance going down to be replaced with another version, service crashing due to an unhandled exception, etc. By offering a built-in retry mechanism (with a default configuration the consumer can override), our SDKs and the consumer’s application become resilient to these kinds of failures. Note that some services charge real money for each try and so consumers should be able to disable retries entirely if they prefer to save money over resiliency.

For more information, see Transient fault handling.

The HTTP Pipeline provides this functionality.

✅ DO offer the following configuration settings:

Retry policy type (exponential or fixed)
Maximum number of retries
Delay between retries (timespan/duration; this is the delay for fixed policy, the delay is increased exponentially by this amount for the exponential policy)
Maximum retry delay (timespan/duration)
List of HTTP status codes that would cause a retry (default is “all service errors that are retryable”)

✔️ YOU MAY offer additional retry mechanisms if supported by the service. For example, the Azure Storage Blob service supports retrying read operations against a secondary datacenter, or recommends the use of a per-try timeout for resilience.

✅ DO reset (or seek back to position 0) any request data stream before retrying a request

✅ DO honor any cancellation mechanism passed in to the caller which can terminate the request before retries have been attempted.

✅ DO update any query parameter or request header that gets sent to the service telling it how much time the service has to process the individual request attempt.

✅ DO retry in the case of a hardware network failure as it may self-correct.

✅ DO retry in the case of a “service not found” error. The service may be coming back online or a load balancer may be reconfiguring itself.

✅ DO retry if the service successfully responds indicating that it is throttling requests (for example, with an “x-ms-delay-until” header or similar metadata).

⛔️ DO NOT retry if the service responds with a 400-level response code unless a retry-after header is also returned.

⛔️ DO NOT change any client-side generated request-id when retrying the request. Th request ID represents the logical operation and should be the same across all physical retries of this operation. When looking at server logs, multiple entries with the same client request ID show each retry

☑️ YOU SHOULD implement a default retry policy of 3 retries with a 0.8s exponential (plus jitter) delay between attempts with a maximum delay of 60s between attempts.

✅ DO allow the customer to override the default retry policy delay and timeouts.

☑️ YOU SHOULD allow the customer to cancel the operation either via explicit cancellation or by setting an overall maximum operation time.

Authentication policy

Services across Azure use a variety of different authentication schemes to authenticate clients. Conceptually there are two entities responsible for authenticating service client requests, a credential and an authentication policy. Credentials provide confidential authentication data needed to authenticate requests. Authentication policies use the data provided by a credential to authenticate requests to the service. It is essential that credential data can be updated as needed across the lifetime of a client, and authentication policies must always use the most current credential data.

✅ DO implement Bearer authorization policy (which accepts a token credential and scope).

Response downloader policy

The response downloader is required for most (but not all) operations to change whatever is returned by the service into a model that the consumer code can use. An example of a method that does not deserialize the response payload is a method that downloads a raw blob within the Blob Storage client library. In this case, the raw data bytes are required. For most operations, the body must be downloaded in totality before deserialization. This pipeline policy must implement the following requirements:

✅ DO download the entire response body and pass the complete downloaded body up to the operation method for methods that deserialize the response payload. If a network connection fails while reading the body, the retry policy must automatically retry the operation.

Distributed tracing policy

Distributed tracing allows the consumer to trace their code from frontend to backend. The distributed tracing library creates spans (units of unique work) to facilitate tracing. Each span is in a parent-child relationship. As you go deeper into the hierarchy of code, you create more spans. These spans can then be exported to a suitable receiver as needed. To keep track of the spans, a distributed tracing context (called a context within the rest of this section) is passed into each successive layer. For more information on this topic, visit the OpenTelemetry topic on tracing.

The Distributed Tracing policy is responsible for:

Creating a SPAN per REST call.
Passing the context to the backend service.

There is also a distributed tracing topic for implementing a client library.

✅ DO support OpenTelemetry for distributed tracing.

✅ DO accept a context from calling code to establish a parent span.

✅ DO pass the context to the backend service through the appropriate headers (traceparent and tracestate per W3C Trace-Context standard) to support Azure Monitor.

✅ DO create a new span (which must be a child of the per-method span) for each REST call that the client library makes.

✅ DO populate span properties according to Tracing Conventions.

Logging policy

Many logging requirements within Azure Core mirror the same requirements for logging within the client library.

✅ DO allow the client library to set the log handler and log settings.

✅ DO use one of the following log levels when emitting logs: Verbose (details), Informational (things happened), Warning (might be a problem or not), and Error.

✅ DO use the Error logging level for failures that the application is unlikely to recover from (out of memory, etc.).

✅ DO use the Warning logging level when a function fails to perform its intended task. This generally means that the function will raise an exception. Don’t include occurrences of self-healing events (for example, when a request will be automatically retried).

✅ DO use the Informational logging level when a function operates normally.

✅ DO use the Verbose logging level for detailed troubleshooting scenarios. This is primarily intended for developers or system administrators to diagnose specific failures.

⛔️ DO NOT send sensitive information in log levels other than Verbose. For example, remove account keys when logging headers.

✅ DO log request line, response line, and headers as an Informational message.

✅ DO log an Informational message if a service call is cancelled.

✅ DO log exceptions thrown as a Warning level message. If the log level set to Verbose, append stack trace information to the message.

✅ DO include information about relevant configuration when the HTTP pipeline throws an error if there is a relevant configuration setting that would alleviate the problem. Not all errors can be fixed with a configuration change.

Proxy policy

Apps that integrate the Azure SDK need to operate in common enterprises. It is a common practice to implement HTTP proxies for control and caching purposes. Proxies are generally configured at the machine level and, as such, are part of the environment. However, there are reasons to adjust proxies (for example, testing may use a proxy to rewrite URLs to a test environment instead of a production environment). The Azure SDK and all client libraries should operate in those environments.

There are a number of common methods for proxy configuration. However, they fall into four groups:

Inline, no authentication (filtering only)
Inline, with authentication
Out-of-band, no authentication
Out of band, with authentication

For inline/no-auth proxy, nothing needs to be done. The Azure SDK will work without any proxy configuration. For inline/auth proxy, the connection may receive a 407 Proxy Authentication Required status code. This will include a scheme, realm, and potentially other information (such as a nonce for digest authentication). The client library must resubmit the request with a Proxy-Authorization header that provides authentication information suitably encoded for the scheme. The most common schemes are Basic, Digest, and NTLM.

For an out-of-band/no-auth proxy, the client will send the entire request URL to the proxy instead of the service. For example, if the client is communicating to https://foo.blob.storage.azure.net/path/to/blob, it will connect to the HTTPS_PROXY and send a GET https://foo.blob.storage.azure.net/path/to/blob HTTP/1.1. For an out-of-band/auth proxy, the client will send the entire request URL just as in the out-of-band/no-auth proxy version, but it may send back a 407 Proxy Authentication Required status code (as with the inline/auth proxy).

WebSockets can normally be tunneled through an HTTP proxy, in which case the proxy authentication happens during the CONNECT call. This is the preferred mechanism for tunneling non-HTTP traffic to the Azure service. However, there are other types of proxies. The most notable is the SOCKS proxy used for non-HTTP traffic (such as AMQP or MQTT). We make no recommendation (for or against) support of SOCKS. It is explicitly not a requirement to support SOCKS proxy within the client library.

Most proxy configuration will be done by adopting the HTTP pipeline that is common to all Azure service client libraries.

✅ DO support proxy configuration via common global configuration directives configured on a platform or runtime basis.

Linux and macOS generally use the HTTPS_PROXY (and associated) environment variables to configure proxies.
Windows environments generally use the WinHTTP proxy configuration to configure proxies.
Other options (such as loading from configuration files) may exist on a platform and runtime basis.

✅ DO support Azure SDK-wide configuration directives for proxy configuration, including disabling the proxy functionality.

✅ DO support client library-specific configuration directives for proxy configuration, including disabling the proxy functionality.

✅ DO log 407 Proxy Authentication Required requests and responses.

✅ DO indicate in logging if the request is being sent to the service via a proxy, even if proxy authentication is not required.

✅ DO support Basic and Digest authentication schemes.

☑️ YOU SHOULD support the NTLM authentication scheme.

There is no requirement to support SOCKS at this time. We recommend services adopt a WebSocket connectivity option (for example, AMQP or MQTT over WebSockets) to ensure compatibility with proxies.

Global configuration

The Azure SDK can be configured by a variety of sources, some of which are necessarily language-dependent. This will generally be codified in the Azure Core library. The configuration sources include:

System settings
Environment variables
Global configuration store (code)
Runtime parameters

✅ DO apply configuration in the order above by default, such that subsequent items in the list override settings from previous items in the list.

✔️ YOU MAY support configuration systems that users opt in to that do not follow the above ordering.

✅ DO be consistent with naming between environment variables and configuration keys.

✅ DO log when a configuration setting is found somewhere in the environment or global configuration store.

✔️ YOU MAY ignore configuration settings that are irrelevant for your client library.

System settings

☑️ YOU SHOULD respect system settings for proxies.

Environment variables

Environment variables are a well-known method for IT administrators to configure basic settings when running code in the cloud.

✅ DO load relevant configuration settings from the environment variables listed in the table below:

Environment Variable	Purpose
Proxy Settings
HTTP_PROXY	Proxy for HTTP connections
HTTPS_PROXY	Proxy for HTTPS connections
NO_PROXY	Hosts which must not use a proxy
ALL_PROXY	Proxy for HTTP and/or HTTPS connections in case HTTP_PROXY and/or HTTPS_PROXY are not defined
Identity
MSI_ENDPOINT	Azure AD MSI Credentials
MSI_SECRET	Azure AD MSI Credentials
AZURE_USERNAME	Azure username for U/P Auth
AZURE_PASSWORD	Azure password for U/P Auth
AZURE_CLIENT_CERTIFICATE_PATH	Azure Active Directory
AZURE_CLIENT_ID	Azure Active Directory
AZURE_CLIENT_SECRET	Azure Active Directory
AZURE_TENANT_ID	Azure Active Directory
AZURE_AUTHORITY_HOST	Azure Active Directory
Pipeline Configuration
AZURE_TELEMETRY_DISABLED	Disables telemetry
AZURE_LOG_LEVEL	Enable logging by setting a log level.
AZURE_TRACING_DISABLED	Disables tracing
General SDK Configuration
AZURE_CLOUD	Name of the sovereign cloud
AZURE_SUBSCRIPTION_ID	Azure subscription
AZURE_RESOURCE_GROUP	Azure Resource Group

✅ DO prefix Azure-specific environment variables with AZURE_.

Global configuration

Global configuration refers to configuration settings that are applied to all applicable client constructors in some manner.

✅ DO support global configuration of shared pipeline policies including:

Logging: Log level, swapping out logger implementation
HTTP: Proxy settings, max retries, timeout, swapping out transport implementation
Telemetry: enabled/disabled
Tracing: enabled/disabled

✅ DO provide configuration keys for setting or overriding every configuration setting inherited from the system or environment variables.

✅ DO provide a method of opting out from importing system settings and environment variables into the configuration.

Authentication and credentials

OAuth token authentication, obtained via Managed Identities or Azure Identity, is the preferred mechanism for authenticating service requests, and the only authentication credentials supported by the Azure Core library.

✅ DO provide a token credential type that can fetch an OAuth-compatible token needed to authenticate a request to the service in a non-blocking atomic manner.