fix: guard executor shutdown in BaseCaptureStrategy.stop()#5627
fix: guard executor shutdown in BaseCaptureStrategy.stop()#5627tsushanth wants to merge 2 commits into
Conversation
Each start/stop cycle leaked one SentryReplayPersister-* thread because stop() reset delegated properties (segmentTimestamp, currentReplayId) whose setters dispatch to persistingExecutor, initialising the lazy — but stop() never shut it down. Replace the lazy delegate with an explicit nullable holder so the executor is only created when actually needed and can be detected at stop() time. Call shutdownNow() (non-blocking) rather than the blocking shutdown() to avoid ANRs when stop() runs on the main thread. Fixes getsentry#5564
|
Thanks for contributing, could you run spotlessApply to fix the formatting failure? Then we'll review this more closely. |
| private val persistingExecutor: ScheduledExecutorService | ||
| get() { | ||
| return persistingExecutorHolder | ||
| ?: run { | ||
| val delegate = | ||
| Executors.newSingleThreadScheduledExecutor( | ||
| ReplayPersistingExecutorServiceThreadFactory() | ||
| ) | ||
| ReplayExecutorService(delegate, options).also { persistingExecutorHolder = it } | ||
| } | ||
| } |
There was a problem hiding this comment.
Bug: The custom getter for persistingExecutor is not thread-safe. Concurrent access can lead to a race condition, causing multiple executor instances to be created and resulting in a thread leak.
Severity: HIGH
Suggested Fix
Synchronize access to the persistingExecutor getter to ensure only one instance is created. This can be achieved by using a synchronized block around the check-and-create logic or by adding the @Volatile annotation to the persistingExecutorHolder field and using a double-checked locking pattern. The original lazy delegate, which is thread-safe by default, is a good model for a correct implementation.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location:
sentry-android-replay/src/main/java/io/sentry/android/replay/capture/BaseCaptureStrategy.kt#L63-L74
Potential issue: The custom getter for the `persistingExecutor` property implements an
unsynchronized lazy initialization pattern. Multiple threads, including the main thread
and background `replayExecutor` threads, can access this getter concurrently. If two
threads access the getter when `persistingExecutorHolder` is `null`, both may proceed to
create a new `ScheduledExecutorService` instance. Due to the race condition, only one
instance will be stored in `persistingExecutorHolder`, while the other will be abandoned
without being shut down. This results in a thread leak, which can degrade application
performance and lead to resource exhaustion over time. The backing field
`persistingExecutorHolder` also lacks a `@Volatile` annotation, creating potential
memory visibility issues.
Did we get this right? 👍 / 👎 to inform future reviews.
romtsn
left a comment
There was a problem hiding this comment.
@tsushanth thanks for your contribution, that's very much appreciated! I'm thinking if there's potentially a better approach, which would be to pull persistingExecutor up to the ReplayIntegration level and then pass it as a ctor argument to the respective CaptureStrategy.
It could then have the same lifecycle as the replayExecutor (that is, shut it down inside ReplayIntegration.close()) and would survive multiple start/stop calls, potentially saving us the cost of creating a new executor every time we start a new recording.
Fixes #5564
Problem
Each
ReplayIntegration.start()constructs a freshSessionCaptureStrategyorBufferCaptureStrategy. Both inheritBaseCaptureStrategy, which owns apersistingExecutorthat was previously declared as a Kotlinlazydelegate.stop()resets the delegated propertiessegmentTimestampandcurrentReplayId. Their setters callrunInBackground, which — whenoptions.threadChecker.isMainThread()is true — accessespersistingExecutor, silently initialising the lazy.stop()never shuts the executor down, so oneSentryReplayPersister-*thread is abandoned on every cycle. Kiln benchmark data confirmed monotonically growing thread counts and a GC allocation rate roughly 9× baseline after the repeated start/stop pattern introduced in kiln#73.Fix
Replace the
lazydelegate with an explicit nullable holder (persistingExecutorHolder). The customget()mirrors the old lazy behaviour (create-on-first-access) while making initialisation detectable. At the end ofstop(), if the holder is non-null the executor is shut down viashutdownNow()— non-blocking, safe to call on the main thread — and the holder is cleared so a subsequentstart()gets a fresh executor without any residual state.ReplayExecutorService.shutdown()is intentionally not used here: it blocks foroptions.shutdownTimeoutMillisand risks an ANR when invoked on the main thread.shutdownNow()interrupts any in-flight persistence write and discards the queue, which is acceptable becausecache.close()is called immediately before in the samestop()body.Test
Added
stop shuts down persisting executor so no SentryReplayPersister threads leak across cyclestoSessionCaptureStrategyTest. The test stubsthreadChecker.isMainThread()totrue(forcing the persisting executor path), runs three start/stop cycles, and asserts that noSentryReplayPersister-*threads remain alive after a short drain window.