Azure Proactive Resiliency Library v2

databaseAccounts

Summary

| Recommendation | Impact | Category | Automation Available | In Azure Advisor |
|---|---|---|---|---|
| Configure at least two regions for high availability | High | High Availability | Yes | Yes |
| Enable service-managed failover for multi-region accounts with single write region | High | Disaster Recovery | Yes | Yes |
| Enable availability zones | High | High Availability | Yes | No |
| Evaluate multi-region write capability | High | High Availability | Yes | No |
| Configure continuous backup mode | High | Disaster Recovery | Yes | Yes |
| Ensure query results are fully drained | High | Scalability | No | No |
| Maintain singleton pattern in your client | Medium | Scalability | No | No |
| Implement retry logic in your client | Medium | High Availability | No | No |
| Monitor Cosmos DB health and set up alerts | Medium | Monitoring and Alerting | No | No |

Details


Configure at least two regions for high availability

Impact:  High Category:  High Availability

APRL GUID:  43663217-a1d3-844b-80ea-571a2ce37c6c

Description:

Add a secondary region to the Azure Cosmos DB account to obtain a higher availability SLA. Regions can be added without downtime. For accounts that use Strong consistency, configure at least three regions so that writes remain available if one region fails.

Potential Benefits:

Enhances SLA and resilience

Learn More:

Distribute data globally with Azure Cosmos DB
Tips for building highly available applications

ARG Query:


// Azure Resource Graph Query
// Query to find Azure Cosmos DB accounts that have fewer than 2 regions, or fewer than 3 regions when the default consistency level is Strong
Resources
| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
| where
    array_length(properties.locations) < 2 or
    (array_length(properties.locations) < 3 and properties.consistencyPolicy.defaultConsistencyLevel == 'Strong')
| project recommendationId='43663217-a1d3-844b-80ea-571a2ce37c6c', name, id, tags



Enable service-managed failover for multi-region accounts with single write region

Impact:  High Category:  Disaster Recovery

APRL GUID:  9cabded7-a1fc-6e4a-944b-d7dd98ea31a2

Description:

Azure Cosmos DB is designed for high uptime and resiliency, but regional outages can still occur. With service-managed failover enabled, if a region is down, Azure Cosmos DB automatically fails over to the next available region with no user action required.

Potential Benefits:

Auto failover for high uptime

Learn More:

Manage an Azure Cosmos DB account by using the Azure portal

ARG Query:


// Azure Resource Graph Query
// Query to find multi-region Azure Cosmos DB accounts that have neither multiple write locations nor automatic failover enabled
Resources
| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
| where
    array_length(properties.locations) > 1 and
    tobool(properties.enableAutomaticFailover) == false and
    tobool(properties.enableMultipleWriteLocations) == false
| project recommendationId='9cabded7-a1fc-6e4a-944b-d7dd98ea31a2', name, id, tags


Enable availability zones

Impact:  High Category:  High Availability

APRL GUID:  921631f6-ed59-49a5-94c1-f0f3ececa580

Description:

When availability zones are configured, Azure Cosmos DB distributes the four replicas of your data across all available zones. This ensures that your Azure Cosmos DB account can withstand an outage in one availability zone and remain fully operational throughout.

Potential Benefits:

Enhances high availability

Learn More:

High availability in Azure Cosmos DB

ARG Query:


// Azure Resource Graph Query
// Query to find Azure Cosmos DB accounts that do not utilize availability zones and are deployed in availability-zone supported regions
Resources
| where type == "microsoft.documentdb/databaseaccounts"
| where properties.capabilities !has_cs 'EnableServerless'
| project recommendationId='921631f6-ed59-49a5-94c1-f0f3ececa580', name, id, tags, locations=properties.locations
| mv-expand locations
| where not(locations.isZoneRedundant) //filter out already AZ enabled regions
| extend location=tostring(locations.locationName)
| project-away locations
| where location in (
    'Australia East', 'Brazil South', 'Canada Central', 'Central India', 'Central US',
    'China North 3', 'East Asia', 'East US', 'East US 2', 'France Central',
    'Germany West Central', 'Israel Central', 'Italy North', 'Japan East', 'Japan West',
    'Korea Central', 'Mexico Central', 'New Zealand North', 'North Europe', 'Norway East',
    'Poland Central', 'Qatar Central', 'South Africa North', 'South Central US', 'Southeast Asia',
    'Spain Central', 'Sweden Central', 'Switzerland North', 'UAE North', 'UK South',
    'US Gov Virginia', 'West Europe', 'West US 2', 'West US 3') // filter out regions unsupported for AZs
| project-rename param1=location


Evaluate multi-region write capability

Impact:  High Category:  High Availability

APRL GUID:  9ce78192-74a0-104c-b5bb-9a443f941649

Description:

Multi-region write capability allows for designing applications that are highly available across multiple regions, though it demands careful attention to consistency requirements and conflict resolution. Improper setup may decrease availability and cause data corruption due to unhandled conflicts.
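
To illustrate why unhandled conflicts matter, here is a minimal sketch of a last-writer-wins rule applied to two versions of the same item written in different regions. The `last_writer_wins` helper, the sample documents, and the use of `_ts` as the comparison property are assumptions for illustration only. The service resolves conflicts on its own side according to the configured policy, and this code does not call any SDK.

```python
from typing import Iterable


def last_writer_wins(versions: Iterable[dict], path: str = "_ts") -> dict:
    """Pick the version with the highest value at `path`; the others are discarded."""
    return max(versions, key=lambda doc: doc[path])


# Hypothetical scenario: two regions accepted a write to the same item
# before replication caught up.
west = {"id": "cart-1", "items": ["book"], "_ts": 1700000010}
east = {"id": "cart-1", "items": ["pen"], "_ts": 1700000012}

winner = last_writer_wins([west, east])
print(winner["items"])  # the write from the other region is not merged in
```

Under this rule the earlier write is dropped rather than merged, so an application that needs both changes would have to choose a different resolution approach.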

Potential Benefits:

Enhances high availability

Learn More:

Distribute data globally with Azure Cosmos DB
Conflict resolution types and resolution policies in Azure Cosmos DB

ARG Query:


// Azure Resource Graph Query
// Query to find Azure Cosmos DB accounts that have multiple read locations but do not have multiple write locations enabled
Resources
| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
| where
    array_length(properties.locations) > 1 and
    properties.enableMultipleWriteLocations == false
| project recommendationId='9ce78192-74a0-104c-b5bb-9a443f941649', name, id, tags



Configure continuous backup mode

Impact:  High Category:  Disaster Recovery

APRL GUID:  e544520b-8505-7841-9e77-1f1974ee86ec

Description:

Azure Cosmos DB backups are always on and protect against accidental data loss or corruption. Continuous backup mode supports self-service restore to a point in time before the incident. Periodic mode requires contacting Microsoft support, which leads to longer restore times.

Potential Benefits:

Faster self-serve data restore

Learn More:

Continuous backup with point in time restore feature in Azure Cosmos DB

ARG Query:


// Azure Resource Graph Query
// Query to find Azure Cosmos DB accounts that use periodic backup mode, excluding accounts with multiple write locations or analytical storage enabled
Resources
| where type =~ 'Microsoft.DocumentDb/databaseAccounts'
| where
    properties.backupPolicy.type == 'Periodic' and
    properties.enableMultipleWriteLocations == false and
    properties.enableAnalyticalStorage == false
| project recommendationId='e544520b-8505-7841-9e77-1f1974ee86ec', name, id, tags


Ensure query results are fully drained

Impact:  High Category:  Scalability

APRL GUID:  c006604a-0d29-684c-99f0-9729cb40dac5

Description:

Azure Cosmos DB limits each query response to 4 MB, so large or partition-spanning queries return paginated results. Each page indicates whether more results are available and provides a continuation token for retrieving the next page. Client code must loop over the pages (for example, with a while loop) until none remain. Otherwise part of the result set is never read.
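
The drain loop can be sketched as follows. This is a minimal illustration, not SDK code: `fetch_page`, `fake_fetch_page`, and the `_PAGES` data are hypothetical stand-ins for a real query call that accepts a continuation token and returns one page plus the token for the next.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical page-fetch signature: takes a continuation token (None for the
# first page) and returns (items, next_token). next_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def drain_query(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item from every page until no continuation token remains."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items  # this sketch tolerates an empty page that still has a token
        if token is None:
            break


# Stand-in for a real query: three pages keyed by continuation token.
_PAGES = {
    None: ([{"id": "1"}, {"id": "2"}], "t1"),
    "t1": ([], "t2"),  # empty intermediate page
    "t2": ([{"id": "3"}], None),
}


def fake_fetch_page(token: Optional[str]) -> Tuple[list, Optional[str]]:
    return _PAGES[token]


all_items = list(drain_query(fake_fetch_page))
print([item["id"] for item in all_items])
```

The loop stops on the absence of a continuation token rather than on an empty page, so it keeps reading when a page carries no items but more pages follow.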

Potential Benefits:

Maximizes data retrieval efficiency

Learn More:

Pagination in Azure Cosmos DB

ARG Query:


// under-development


Maintain singleton pattern in your client

Impact:  Medium Category:  Scalability

APRL GUID:  7eb32cf9-9a42-1540-acf8-597cbba8a418

Description:

Use a single instance of the SDK client per account for the lifetime of the application, because connections are tied to the client instance. Compute environments limit the number of open connections, and exceeding that limit degrades connectivity.
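
One way to keep a single client per account is a cached factory function, sketched below. `FakeCosmosClient`, `get_client`, and the endpoint URL are hypothetical names used only for illustration. A real application would construct its SDK client inside the factory instead.

```python
import functools


class FakeCosmosClient:
    """Hypothetical stand-in for an SDK client; counts how many were constructed."""

    instances_created = 0

    def __init__(self, endpoint: str) -> None:
        FakeCosmosClient.instances_created += 1
        self.endpoint = endpoint


@functools.lru_cache(maxsize=None)
def get_client(endpoint: str) -> FakeCosmosClient:
    """Return the one shared client for this endpoint, creating it on first use."""
    return FakeCosmosClient(endpoint)


a = get_client("https://example-account.documents.azure.com")
b = get_client("https://example-account.documents.azure.com")
print(a is b, FakeCosmosClient.instances_created)
```

Every caller that goes through `get_client` shares the same instance and its connections. In multi-threaded code the first call can race, so creating the client once at startup or guarding the factory with a lock may be preferable.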

Potential Benefits:

Optimizes connections and efficiency

Learn More:

Designing resilient applications with Azure Cosmos DB SDKs

ARG Query:


// under-development


Implement retry logic in your client

Impact:  Medium Category:  High Availability

APRL GUID:  fa6ac22f-0584-bb4b-80e4-80f4755d1a97

Description:

Azure Cosmos DB SDKs automatically retry many transient errors. Applications should still implement their own retry policies for the specific cases that the SDKs cannot address generically, which makes error handling more robust.
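
An application-level retry policy might look like the sketch below, which retries with exponential backoff and jitter. `TransientError`, `with_retries`, and `flaky_write` are hypothetical names. Which errors are actually safe to retry depends on the operation and is left to the application to decide.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors the application deems safe to retry."""


def with_retries(op: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying TransientError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_write() -> str:
    """Simulated operation that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


result = with_retries(flaky_write, sleep=lambda _: None)
print(result, calls["n"])
```

Capping the number of attempts keeps a persistent failure from blocking the caller indefinitely, and errors other than `TransientError` propagate immediately.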

Potential Benefits:

Enhances error handling resilience

Learn More:

Designing resilient applications with Azure Cosmos DB SDKs

ARG Query:


// under-development


Monitor Cosmos DB health and set up alerts

Impact:  Medium Category:  Monitoring and Alerting

APRL GUID:  deaea200-013c-414b-ac9f-bfa7a7fb13f0

Description:

Monitor the availability and responsiveness of your Azure Cosmos DB resources and set up alerts for your workload. Doing so lets you respond proactively to unforeseen events.

Potential Benefits:

Proactive issue management

Learn More:

Create alerts for Azure Cosmos DB using Azure Monitor

ARG Query:


// under-development