
Make mergeCollectionWithPatches cache-first #787

Merged
Gonals merged 2 commits into Expensify:main from callstack-internal:elirangoshen/fix/90634-mergeCollectionWithPatches-cache-first
May 22, 2026

Conversation

@elirangoshen
Contributor

@elirangoshen elirangoshen commented May 19, 2026

Details

mergeCollectionWithPatches was the only Onyx write method that did not follow the cache-first / storage-second invariant. Every other write method (setWithRetry, applyMerge, setCollectionWithRetry, partialSetCollection, clear) updates the in-memory cache synchronously before issuing a storage write — so when storage fails (e.g., the IDB corruption population analyzed in Expensify/App#87862), the cache is already correct and subscribers still see the right data.

mergeCollectionWithPatches instead pushed Storage.multiMerge / Storage.multiSet first and bundled the cache.merge + keysChanged update into a previousCollectionPromise.then(...) chain that ran in parallel with storage. In the warm-cache common case the cache update happened to win the race, but if previousCollectionPromise lost the race (cache miss falling back to a slow storage read), subscribers could observe a state inconsistent with what the merge logically applied.

This change restores the invariant:

  1. Promise.all(existingKeys.map((key) => get(key))) runs first to pre-warm the cache from storage for any cache-miss existing keys. get() is a sync-resolved no-op for cache hits, so this is microtask-cheap for the warm-cache common case. The pre-warm is required because cache.merge's fastMerge needs the previous storage value as the merge base — without it, cold-cache merges would start from undefined and silently drop existing data (the behavior the old previousCollectionPromise happened to provide as a side effect via get()'s cache-population path).
  2. cache.merge(finalMergedCollection) and keysChanged(...) then run synchronously on the now-warm cache, before the storage promises are issued.
  3. Storage.multiMerge / Storage.multiSet are pushed afterwards. The retryOperation / DevTools chain is unchanged.

Result: subscribers receive the merged data before any storage write is attempted, and storage failure no longer races the cache update — matching the behavior of every other Onyx write method.
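The three steps above can be sketched with a toy model. This is only an illustration under stated assumptions: the `cache`, `storage`, `get`, `multiMerge`, and `mergeCollectionCacheFirst` names below are hypothetical stand-ins, the merge is a shallow spread rather than Onyx's fastMerge, and every key is pre-warmed instead of only the existing ones.

```typescript
// Toy model only: these names echo the PR description but are NOT the real Onyx internals.
type Value = Record<string, unknown>;
type Collection = Record<string, Value>;

const cache = new Map<string, Value>();
const storage = new Map<string, Value>(); // stands in for IndexedDB
const notifications: Collection[] = []; // what a collection subscriber would have received

// Hypothetical get(): resolves from cache on a hit, otherwise reads storage and populates the cache.
function get(key: string): Promise<Value | undefined> {
    const cached = cache.get(key);
    if (cached !== undefined) {
        return Promise.resolve(cached);
    }
    return Promise.resolve(storage.get(key)).then((stored) => {
        if (stored !== undefined) {
            cache.set(key, stored);
        }
        return stored;
    });
}

// Hypothetical storage write that can be told to fail.
function multiMerge(patches: Collection, shouldFail: boolean): Promise<void> {
    if (shouldFail) {
        return Promise.reject(new Error('simulated storage failure'));
    }
    for (const [key, patch] of Object.entries(patches)) {
        storage.set(key, {...(storage.get(key) ?? {}), ...patch});
    }
    return Promise.resolve();
}

function mergeCollectionCacheFirst(patches: Collection, shouldFail = false): Promise<void> {
    // Step 1: pre-warm the cache so the merge below has the stored value as its base.
    return Promise.all(Object.keys(patches).map((key) => get(key))).then(() => {
        // Step 2: update the cache and notify, synchronously, before any storage write.
        const merged: Collection = {};
        for (const [key, patch] of Object.entries(patches)) {
            const next: Value = {...(cache.get(key) ?? {}), ...patch};
            cache.set(key, next);
            merged[key] = next;
        }
        notifications.push(merged);

        // Step 3: only now issue the storage write.
        return multiMerge(patches, shouldFail);
    });
}

// Usage: cold cache, an existing value in storage, and a storage write that fails.
storage.set('report_1', {title: 'Old', unread: true});
const done = mergeCollectionCacheFirst({report_1: {unread: false}}, true).catch(() => {
    // The write failed, but the cache and the subscriber already hold the merged value.
});
```

In this model the cached `report_1` keeps its `title` from storage (thanks to the pre-warm) and has `unread` set to `false`, even though the simulated write rejects.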

Related Issues

Expensify/App#90634

Companion PR

Expensify/App#91160 — bumps the App's react-native-onyx pin to this branch's head for end-to-end verification.

Automated Tests

Added a new regression test in `tests/unit/onyxUtilsTest.ts`:

`mergeCollection cache-first ordering > updates cache and notifies subscribers even when Storage.multiMerge rejects`

It seeds an existing collection member, mocks `Storage.multiMerge` to reject with a non-retriable IDB backing-store error, and asserts both the cache (via `OnyxCache.getCollectionData`) and a connected collection subscriber receive the merged values regardless of the storage rejection.

All 452 existing unit tests still pass — including the timing-sensitive `mergeCollection` / `Onyx.update` callback-ordering tests that probe `keysChanged` and `retryOperation` semantics.

Manual Tests

The companion App PR (Expensify/App#91160) pulls this branch in via package.json + lockfile, so the real end-to-end exercises happen against a running App session. Full step-by-step lives in that PR's Tests section; the summary below covers the same scenarios.

mergeCollectionWithPatches is reached from these App entry points: Search.ts, IOU/Hold.ts, IOU/MoneyRequest.ts, Report/MarkAllMessageAsRead.tsx, Report/index.ts, Policy/Policy.ts, plus every Onyx.update batch that contains a MERGE_COLLECTION op (LHN refresh, Pusher events, OpenApp response, etc.).

Setup

  1. In the App repo, check out elirangoshen/fix/90634-mergeCollectionWithPatches-cache-first (Expensify/App#91160, "Make mergeCollectionWithPatches cache-first [No QA]"), which pins react-native-onyx to this branch's head
  2. npm install under Node 20.20.0, then npm run web
  3. Open https://dev.new.expensify.com:8082/ in Chrome with DevTools open
  4. Sign in to a test account

Functional smoke (no regression in mergeCollection-driven flows)

  1. Initial hydration — after login, LHN reports list and workspace switcher populate within a few seconds. No missing rows vs. a baseline session on main.
  2. Chat message — open a chat, send a message. Appears immediately (optimistic merge into reportActions_), confirms via Pusher, persists after reload.
  3. Mark all as read — click the mark-all-as-read action. All unread badges clear immediately and stay cleared after reload.
  4. Search & filter — open Search, apply a filter. Results populate within a few seconds; live updates as filters change; no duplicates.
  5. Hold / unhold an expense — toggle hold on an expense in a report. Badge appears/clears immediately; persists after reload.
  6. Submit expense — FAB → Submit expense. Expense appears in the report immediately, no duplicate, persists after reload.
  7. Switch workspaces — switch via the workspace switcher. LHN reports filter to the new workspace within a few seconds.

Storage-failure simulation — the core regression guard for this PR

The fix protects against a race where a failing Storage.multiMerge could leave the in-memory cache without the merge's update (and therefore leave subscribers stale). With the fix in place, cache.merge and keysChanged always fire before the storage write, so subscribers stay correct regardless of storage outcome.

  1. With the App running and authenticated, open Chrome DevTools → Application tab → IndexedDB
  2. Find the database used by Onyx (typically named OnyxDB)
  3. Right-click → Delete database while the App is still running and connected (do not reload)
  4. Immediately trigger a mergeCollection-driven action — switch workspaces, send a chat message, or apply a search filter
  5. Expected with this fix: the UI updates correctly; the new state is visible to subscribers even though the underlying IDB write fails. Console will show storage error logs, but no white screen, no stale UI, no data loss within the session.
  6. (Optional regression check) Reverting just this PR's commit locally and re-running step 4 should produce timing-sensitive misses where the UI fails to reflect the action.

Evidence

Screen recording of the smoke tests (1–7), the IDB-failure simulation, and the cold-cache test will be attached to the App PR (Expensify/App#91160) — since this PR has no UI surface of its own, all video evidence lives there.

Author Checklist

  • I linked the correct issue in the ### Related Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android / native
    • Android / Chrome
    • iOS / native
    • iOS / Safari
    • MacOS / Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • If we are not using the full Onyx data that we loaded, I've added the proper selector in order to ensure the component only re-renders when the data it is using changes
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR author checklist, including those that don't apply to this PR.

Screenshots/Videos

Screen.Recording.2026-05-20.at.14.53.27.mov
Screen.Recording.2026-05-20.at.14.54.02.mov
Screen.Recording.2026-05-20.at.15.00.50.mov
Screen.Recording.2026-05-20.at.15.06.56.mov
Screen.Recording.2026-05-20.at.15.10.33.mov
Screen.Recording.2026-05-20.at.15.12.19.mov
Screen.Recording.2026-05-20.at.15.14.31.mov
Screen.Recording.2026-05-20.at.15.16.13.mov
Screen.Recording.2026-05-20.at.15.16.36.mov
Screen.Recording.2026-05-20.at.15.17.07.mov
Screen.Recording.2026-05-20.at.15.19.23.mov
Screen.Recording.2026-05-20.at.15.27.05.mov

Update cache and notify subscribers synchronously before issuing
storage writes, matching the cache-first / storage-second invariant
followed by every other Onyx write method. Subscribers now still
reflect the merged data when the underlying storage write fails (e.g.
IDB corruption).

`get()` is invoked on existingKeys first to pre-warm cache from
storage for cache-miss keys; it is a no-op (sync-resolved) for cache
hits. This preserves the previous cold-cache merge semantics while
removing the prior race between the cache update and the storage
write.

Adds a regression test asserting cache + subscribers reflect the
merged collection when Storage.multiMerge rejects.

Fixes Expensify/App#90634
@elirangoshen elirangoshen marked this pull request as ready for review May 20, 2026 08:25
@elirangoshen elirangoshen requested a review from a team as a code owner May 20, 2026 08:25
@melvin-bot melvin-bot Bot requested review from Gonals and removed request for a team May 20, 2026 08:25
@Gonals
Contributor

Gonals commented May 20, 2026

Seems like the checklist is incomplete

@fabioh8010
Contributor

fabioh8010 commented May 20, 2026

@elirangoshen Onyx and E/App PR are lacking manual testing steps and evidence
Also hold it for my review first

@elirangoshen elirangoshen changed the title from "Make mergeCollectionWithPatches cache-first" to "[HOLD] Make mergeCollectionWithPatches cache-first" May 20, 2026
@elirangoshen
Contributor Author

> @elirangoshen Onyx and E/App PR are lacking manual testing steps and evidence
> Also hold it for my review first

Hi, I fixed it, let me know how it is

@mountiny mountiny requested review from dmkt9 and mountiny May 20, 2026 14:30
Comment thread lib/OnyxUtils.ts
// value back to cache. This is required so the subsequent cache.merge() merges the new delta
// into the real previous storage value (rather than starting from `undefined` and dropping
// the existing keys).
return Promise.all(existingKeys.map((key) => get(key))).then(() => {
Contributor


Suggested change
- return Promise.all(existingKeys.map((key) => get(key))).then(() => {
+ return multiGet(existingKeys).then(() => {

NAB, but something to explore in the future or as a follow-up: use multiGet to return all keys (just one trip to cache/DB) instead of doing N get()s. This should improve performance, but it made 2 tests fail because some Onyx.connect callbacks now fired with empty state earlier than we expected. It seems to be because we have more microtasks in the chain, and the resulting minor delay makes this happen.

I suggest we don't include this change here, but it's something to explore
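The timing sensitivity described in this comment can be illustrated in isolation. The sketch below does not use Onyx at all and does not claim to reproduce the two failing tests; it only shows the general mechanism that a continuation sitting behind more `.then` hops runs after one sitting behind fewer hops, even when the longer chain was started first. The labels pushed into `order` are hypothetical.

```typescript
const order: string[] = [];

// Started first, but three microtask hops away from its final callback.
const longChain = Promise.resolve()
    .then(() => undefined) // extra hop 1
    .then(() => undefined) // extra hop 2
    .then(() => {
        order.push('merge notifies subscriber');
    });

// Started second, but only one hop away from its callback.
const shortChain = Promise.resolve().then(() => {
    order.push('connect fires with current (empty) state');
});

const settled = Promise.all([longChain, shortChain]).then(() => {
    console.log(order);
    // → ['connect fires with current (empty) state', 'merge notifies subscriber']
    return order;
});
```

Any change that lengthens the promise chain before the cache update can therefore reorder callbacks that tests previously observed in a fixed sequence.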

Contributor


Will create an internal ticket for it

});

describe('mergeCollection cache-first ordering', () => {
it('updates cache and notifies subscribers even when Storage.multiMerge rejects', async () => {
Contributor


We should also test for Storage.multiSet failure

Contributor Author


Fixed

@elirangoshen elirangoshen changed the title from "[HOLD] Make mergeCollectionWithPatches cache-first" to "Make mergeCollectionWithPatches cache-first" May 21, 2026
Adds a parallel test inside the 'mergeCollection cache-first ordering'
describe asserting that cache + subscribers reflect the merged data
even when Storage.multiSet (the new-keys path) rejects. The existing
test only exercises the Storage.multiMerge (existing-keys) path.

Also adds an afterEach that restores StorageMock.multiMerge and
StorageMock.multiSet to their originals after each test in this
block, so the rejecting mocks no longer leak into later describe
blocks (eviction, afterInit) whose setup relies on these storage
methods working normally.

Addresses review feedback on Expensify#787.
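The mock-restoration idea in this commit message can be sketched without a test framework. This is a minimal illustration, assuming a hypothetical `StorageMock` object with replaceable `multiMerge` and `multiSet` functions; it is not the real mock module from the repository, and `restoreStorageMock` stands in for whatever the afterEach hook actually does.

```typescript
// Hypothetical stand-in for the storage mock used by the tests.
type MultiWrite = (pairs: Array<[string, unknown]>) => Promise<void>;

const StorageMock: {multiMerge: MultiWrite; multiSet: MultiWrite} = {
    multiMerge: () => Promise.resolve(),
    multiSet: () => Promise.resolve(),
};

// Capture the working implementations once, before any test replaces them.
const originalMultiMerge = StorageMock.multiMerge;
const originalMultiSet = StorageMock.multiSet;

// What an afterEach hook would run after every test in the block.
function restoreStorageMock(): void {
    StorageMock.multiMerge = originalMultiMerge;
    StorageMock.multiSet = originalMultiSet;
}

// A test body swaps in a rejecting implementation...
StorageMock.multiSet = () => Promise.reject(new Error('simulated backing-store error'));
const failedWrite = StorageMock.multiSet([['report_1', {}]]).then(
    () => 'resolved',
    () => 'rejected',
);

// ...and the hook puts the working implementation back, so later tests are unaffected.
restoreStorageMock();
const laterWrite = StorageMock.multiSet([['report_1', {}]]).then(
    () => 'resolved',
    () => 'rejected',
);
```

Without the restore step, `laterWrite` would also reject, which is the kind of leak into later describe blocks that the commit message describes.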
@elirangoshen elirangoshen requested a review from fabioh8010 May 21, 2026 09:52
Comment thread lib/OnyxUtils.ts
// ensuring subscribers still reflect the merged data even if the subsequent storage
// write fails.
const previousCollection = getCachedCollection(collectionKey, existingKeys);
cache.merge(finalMergedCollection);

We always merge finalMergedCollection into the cache; if mergeCollectionWithPatches fails on the first run, all keys will be considered as existingKeys in the next attempt. This causes the Storage.multiMerge function to be called instead of Storage.multiSet for new keys.

Contributor

@fabioh8010 fabioh8010 May 21, 2026


Good catches @dmkt9. I analysed this, and both issues actually pre-exist in main, just less reliably (it depends on whether promiseUpdate resolves before Promise.all(promises) rejects).

#2 (the repeated keysChanged call) is mostly handled by structural sharing + the === dedup in keysChanged. Only waitForCollectionCallback: true re-fires, and that one fires on every change by contract anyway.

#1 (multiMerge being called instead of multiSet for new keys) is real but benign: multiMerge on a missing key just stores the patch (works like a set in that case), so the end state is the same.

I suggest we address this issue as a follow-up, as the same pattern affects multiSetWithRetry / setCollectionWithRetry / partialSetCollection and possibly all the functions that use retryOperation.

cc @elirangoshen
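The reclassification discussed in this thread can be shown with a small model. It is a sketch under assumptions: keys are classified as existing or new by cache membership, the merge is a shallow spread, and `partitionKeys` and `mergeIntoCache` are hypothetical helpers rather than Onyx functions.

```typescript
// Toy model: classification by cache membership is an assumption for illustration.
type Value = Record<string, unknown>;
type Collection = Record<string, Value>;

const cache = new Map<string, Value>();

// Split incoming keys into those already known (merge path) and those not yet known (set path).
function partitionKeys(collection: Collection): {existingKeys: string[]; newKeys: string[]} {
    const existingKeys: string[] = [];
    const newKeys: string[] = [];
    for (const key of Object.keys(collection)) {
        if (cache.has(key)) {
            existingKeys.push(key);
        } else {
            newKeys.push(key);
        }
    }
    return {existingKeys, newKeys};
}

// Cache-first update: runs before the storage write is attempted.
function mergeIntoCache(collection: Collection): void {
    for (const [key, patch] of Object.entries(collection)) {
        cache.set(key, {...(cache.get(key) ?? {}), ...patch});
    }
}

const incoming: Collection = {report_2: {title: 'New report'}};

// First attempt: report_2 is unknown, so it would take the set path.
const firstAttempt = partitionKeys(incoming);
mergeIntoCache(incoming);

// Retry after a failed write: the cache already holds report_2, so it now takes the merge path.
const retryAttempt = partitionKeys(incoming);

// Merging a patch onto a missing base yields the patch's own fields, which is why the end state matches.
const missingBase: Value = {};
const mergedOntoMissing: Value = {...missingBase, ...incoming.report_2};
```

Here `firstAttempt.newKeys` contains `report_2`, while `retryAttempt.existingKeys` contains it instead, and `mergedOntoMissing` has the same fields as the patch.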


> I suggest we address this issue as a follow-up as same pattern affects multiSetWithRetry / setCollectionWithRetry / partialSetCollection and possibly all the functions that use retryOperation.

Yes, that makes sense to me too.

Comment thread lib/OnyxUtils.ts
// write fails.
const previousCollection = getCachedCollection(collectionKey, existingKeys);
cache.merge(finalMergedCollection);
keysChanged(collectionKey, finalMergedCollection, previousCollection);

Similarly, we also call keysChanged for each attempt after mergeCollectionWithPatches failed. This also causes Onyx.connect with waitForCollectionCallback: true to potentially be called more than once with the same payload.
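The reply in the previous thread attributes most of the protection here to structural sharing plus a `===` check. The toy model below illustrates that idea only; `mergeWithSharing` and `applyAndNotify` are hypothetical, the merge is shallow, and the counters stand in for a per-key subscriber and a waitForCollectionCallback-style subscriber.

```typescript
type Value = Record<string, unknown>;

const cache = new Map<string, Value>();
let keyCallbackCount = 0; // per-key subscriber
let collectionCallbackCount = 0; // waitForCollectionCallback-style subscriber

// Shallow merge with structural sharing: return the previous object when the patch changes nothing.
function mergeWithSharing(previous: Value | undefined, patch: Value): Value {
    if (previous === undefined) {
        return {...patch};
    }
    const base: Value = previous;
    const unchanged = Object.keys(patch).every((field) => base[field] === patch[field]);
    return unchanged ? base : {...base, ...patch};
}

function applyAndNotify(key: string, patch: Value): void {
    const previous = cache.get(key);
    const next = mergeWithSharing(previous, patch);
    cache.set(key, next);

    // Collection-level subscriber: fires on every call in this model.
    collectionCallbackCount += 1;

    // Per-key subscriber: skipped when the reference is unchanged.
    if (previous !== next) {
        keyCallbackCount += 1;
    }
}

applyAndNotify('report_1', {unread: false}); // first attempt
applyAndNotify('report_1', {unread: false}); // retry with the same payload

console.log(keyCallbackCount, collectionCallbackCount); // → 1 2
```

In this model the retry leaves the per-key count at 1 because the merged object is the same reference, while the collection-level count reaches 2, matching the concern raised in the comment.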


@dmkt9

dmkt9 commented May 21, 2026

Reviewer Checklist

  • I have verified the author checklist is complete (all boxes are checked off).
  • I verified the correct issue is linked in the ### Fixed Issues section above
  • I verified testing steps are clear and they cover the changes made in this PR
    • I verified the steps for local testing are in the Tests section
    • I verified the steps for Staging and/or Production testing are in the QA steps section
    • I verified the steps cover any possible failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
  • I checked that screenshots or videos are included for tests on all platforms
  • I included screenshots or videos for tests on all platforms
  • I verified that the composer does not automatically focus or open the keyboard on mobile unless explicitly intended. This includes checking that returning the app from the background does not unexpectedly open the keyboard.
  • I verified tests pass on all platforms & I tested again on:
    • Android: HybridApp
    • Android: mWeb Chrome
    • iOS: HybridApp
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • If there are any errors in the console that are unrelated to this PR, I either fixed them (preferred) or linked to where I reported them in Slack
  • I verified proper code patterns were followed (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick).
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I verified that this PR follows the guidelines as stated in the Review Guidelines
  • I verified other components that can be impacted by these changes have been tested, and I retested again (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar have been tested & I retested again)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG)
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • For any bug fix or new feature in this PR, I verified that sufficient unit tests are included to prevent regressions in this flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR reviewer checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: HybridApp
Android: mWeb Chrome
iOS: HybridApp
iOS: mWeb Safari
MacOS: Chrome / Safari
chrome.mp4

@Gonals Gonals merged commit f67b94b into Expensify:main May 22, 2026
11 checks passed
@os-botify
Contributor

os-botify Bot commented May 22, 2026

🚀 Published to npm in 3.0.78 🎉

