Adjust logging to avoid expensive CosmosException.getMessage() diagnostics JSON serialization #49321
Draft
jaxpod70 wants to merge 1 commit into
Contributor: Thank you for your contribution @jaxpod70! We will review the pull request and get back to you soon.
Author: @microsoft-github-policy-service agree company="Microsoft"
Description
Fixes oversized log lines in the ChangeFeedProcessor's partition processing loop. When the loop logs a CosmosException at WARN level, CosmosException.getMessage() serializes the full CosmosDiagnostics payload (request timelines, region contacts, retry history, metadata) into JSON, and that entire JSON string ends up in the log line.
Under heavy database connection load, this produces multi-megabyte log lines (30+ MB observed) on every transient error (429 throttles, timeouts, connectivity issues), across all leased partitions concurrently. The result is throttling or outright rejection in the log ingestion pipeline, higher storage costs, and memory pressure from repeated serialization in a hot loop.
Fix: Split the existing log statement into two tiers in both epkversion.PartitionProcessorImpl and pkversion.PartitionProcessorImpl:
This preserves observability for operators at default log levels while eliminating the excessive payload.
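The exact contents of the two tiers are not listed above, so the following is only a minimal sketch of the general pattern, under stated assumptions. `FakeCosmosException`, `PartitionErrorLogging`, `logProcessingError`, the lease-token parameter, and the choice of fields in the WARN line are all hypothetical illustrations, not the PR's code. The fake exception stands in for `CosmosException` (whose real accessors include `getStatusCode()` and `getSubStatusCode()`) and deliberately makes `getMessage()` expensive. In SDK code that uses SLF4J, the `debugEnabled` flag would correspond to a `logger.isDebugEnabled()` check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for CosmosException, so the sketch has no SDK dependency.
class FakeCosmosException extends RuntimeException {
    final int statusCode;
    final int subStatusCode;
    final String activityId;
    int messageCalls = 0; // counts how often the expensive message is built

    FakeCosmosException(int statusCode, int subStatusCode, String activityId) {
        super("request failed");
        this.statusCode = statusCode;
        this.subStatusCode = subStatusCode;
        this.activityId = activityId;
    }

    @Override
    public String getMessage() {
        messageCalls++;
        // Simulates serializing a large diagnostics payload into the message text.
        return super.getMessage() + ", diagnostics: {\"requestTimeline\":\""
                + "x".repeat(10_000) + "\"}";
    }
}

// Hypothetical helper showing two-tier logging of a partition processing error.
final class PartitionErrorLogging {
    private PartitionErrorLogging() {}

    static void logProcessingError(String leaseToken, FakeCosmosException e,
                                   Consumer<String> warn, Consumer<String> debug,
                                   boolean debugEnabled) {
        // Tier 1 (WARN): only cheap scalar fields; never calls getMessage().
        warn.accept(String.format(
                "Lease %s: processing failed with status %d/%d, activityId %s",
                leaseToken, e.statusCode, e.subStatusCode, e.activityId));

        // Tier 2 (DEBUG): full message including diagnostics, built only when enabled.
        if (debugEnabled) {
            debug.accept("Lease " + leaseToken + " full error: " + e.getMessage());
        }
    }
}
```

One design point worth keeping in mind: the WARN tier should not pass the exception object itself to the logger either. Printing a stack trace calls `Throwable.toString()`, which calls `getLocalizedMessage()` and by default `getMessage()`, so the full diagnostics serialization would still happen.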
Related Issue
#49320
All SDK Contribution checklist:
General Guidelines and Best Practices (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#developer-guide)
Testing Guidelines (https://github.com/Azure/azure-sdk-for-java/blob/main/CONTRIBUTING.md#building-and-unit-testing)