Add fastcdc chunker (keyed Gear hash) by ThomasWaldmann · Pull Request #9824 · borgbackup/borg

ThomasWaldmann · 2026-06-27T22:45:45Z

New `fastcdc` chunker (keyed Gear hash)

A FastCDC content-defined chunker using the window-less Gear rolling hash
(fp = (fp << 1) + Gear[byte]), which is cheaper per byte than buzhash's
cyclic-polynomial update, so it chunks noticeably faster while producing the same
chunk-size distribution and deduplication.

The Gear table is keyed: derived from the repo id key via CSPRNG (own
fastcdc domain), exactly like the buzhash64 table, so chunk cut points stay
unpredictable without the key (anti-fingerprinting). It implements the same
FastCDC techniques as buzhash64 (sub-minimum skipping, normalized chunking with a
required nc_level, min/max clamping); the mask uses the high bits of the hash.
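
To make the moving parts concrete, here is a minimal pure-Python sketch of a keyed Gear chunker. It is not the code from this PR. The SHAKE-256 table derivation, the function names and the way `nc_level` shifts the mask width before and after the normal size are all assumptions chosen for illustration; only the update rule `fp = (fp << 1) + Gear[byte]`, the high-bit mask and the min/max clamping come from the description above.

```python
import hashlib

MASK64 = (1 << 64) - 1  # emulate uint64 wrap-around in Python


def keyed_gear_table(key: bytes, domain: bytes = b"fastcdc") -> list[int]:
    """Derive 256 pseudo-random 64-bit values from a secret key.

    Assumption: SHAKE-256 stands in for the CSPRNG; borg's real derivation differs.
    """
    stream = hashlib.shake_256(domain + b"\0" + key).digest(256 * 8)
    return [int.from_bytes(stream[i:i + 8], "little") for i in range(0, len(stream), 8)]


def high_bits_mask(bits: int) -> int:
    """Mask that selects the top `bits` bits of a 64-bit fingerprint."""
    return ((1 << bits) - 1) << (64 - bits)


def chunk_sizes(data: bytes, table: list[int], min_exp: int, max_exp: int,
                mask_exp: int, nc_level: int) -> list[int]:
    """Return chunk lengths for `data` using Gear hashing with normalized chunking."""
    min_size, max_size, normal_size = 1 << min_exp, 1 << max_exp, 1 << mask_exp
    mask_strict = high_bits_mask(mask_exp + nc_level)  # harder to hit, used below normal_size
    mask_loose = high_bits_mask(mask_exp - nc_level)   # easier to hit, used above normal_size
    sizes, start, n = [], 0, len(data)
    while start < n:
        end = min(start + max_size, n)   # max clamping: never look past max_size
        cut, fp = end, 0
        pos = start + min_size           # sub-minimum skipping: no hashing below min_size
        while pos < end:
            fp = ((fp << 1) + table[data[pos]]) & MASK64  # window-less Gear update
            pos += 1
            mask = mask_strict if pos - start < normal_size else mask_loose
            if (fp & mask) == 0:
                cut = pos
                break
        sizes.append(cut - start)
        start = cut
    return sizes
```

With a different key the table, and therefore the cut points, change, which is the anti-fingerprinting property mentioned above. The sketch uses small exponents in practice (e.g. `chunk_sizes(data, table, 6, 10, 8, 2)`), since a Python loop is far too slow for real chunk sizes.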

chunker-params: `fastcdc,chunk_min,chunk_max,chunk_mask,nc_level`. There is no window
field, because Gear is window-less. E.g. `fastcdc,19,23,21,2`.
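
As a quick worked example of what the numbers in `fastcdc,19,23,21,2` mean: the size fields are exponents of two. The helper below is hypothetical (not part of borg). It assumes that `chunk_mask` sets the target chunk size, which is consistent with the "2 MiB target" quoted in the benchmarks.

```python
def fastcdc_sizes(chunk_min: int, chunk_max: int, chunk_mask: int) -> dict[str, int]:
    """Translate the exponent fields into byte sizes (illustrative helper)."""
    return {
        "min_bytes": 1 << chunk_min,      # smallest chunk the chunker will cut
        "max_bytes": 1 << chunk_max,      # hard upper bound on chunk size
        "target_bytes": 1 << chunk_mask,  # assumed statistical target size
    }


sizes = fastcdc_sizes(19, 23, 21)
for name, value in sizes.items():
    print(f"{name}: {value} bytes = {value / 2**20:g} MiB")
```

For the example parameters this gives 512 KiB minimum, 8 MiB maximum and a 2 MiB target.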

borg benchmark cpu now measures the fastcdc chunker; tests live in
borg.testsuite.chunkers (golden vector, size distribution, keyed gear table,
param parsing, slow fuzz); docs and changelog updated.

Benchmarks

scripts/chunker_bench.py, buzhash64 vs fastcdc, both nc_level=2, incompressible
data unless noted:

| corpus / target | metric | buzhash64 | fastcdc |
|---|---|---|---|
| 5 GiB, 2 MiB target | CV | 0.294 | 0.295 |
| | throughput | 1011 MB/s | 1313 MB/s (+30%) |
| 64 MiB, 64 KiB target | CV | 0.374 | 0.359 |
| | shift-resilience | 0.9928 | 0.9929 |
| | throughput | 963 MB/s | 1331 MB/s (+38%) |
| 2.5 GiB re-backup, 64 edits | dedup (lower=better) | 0.5237 | 0.5236 |
| 2.5 GiB re-backup, 320 edits | dedup (lower=better) | 0.6133 | 0.6161 |

borg benchmark cpu, 1 GB: fastcdc 3.80s, buzhash 4.36s, buzhash64 8.13s, fixed 0.56s.

Chunk-size distribution, deduplication and shift-resilience match buzhash64 within
noise; fastcdc is consistently faster.
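
For readers who want to reproduce the derived figures, here is a small sketch of the two calculations. It assumes that CV means coefficient of variation (standard deviation divided by mean of the chunk sizes) and uses the population standard deviation; `scripts/chunker_bench.py` may compute it differently. The function names are made up for this example.

```python
import statistics


def coefficient_of_variation(chunk_sizes: list[int]) -> float:
    """Spread of chunk sizes relative to their mean (lower = more uniform)."""
    return statistics.pstdev(chunk_sizes) / statistics.fmean(chunk_sizes)


def speedup_percent(baseline_mb_s: float, candidate_mb_s: float) -> float:
    """Throughput gain of the candidate over the baseline, in percent."""
    return (candidate_mb_s / baseline_mb_s - 1) * 100


# Throughput figures from the benchmark table (buzhash64 vs fastcdc).
print(f"2 MiB target:  {speedup_percent(1011, 1313):+.0f}%")
print(f"64 KiB target: {speedup_percent(963, 1331):+.0f}%")
```

The two throughput pairs round to the +30% and +38% shown in the benchmarks.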

🤖 Generated with Claude Code

codecov · 2026-06-27T23:24:39Z

Codecov Report

❌ Patch coverage is 86.66667% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.17%. Comparing base (dbadb32) to head (831e47d).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/helpers/parseformat.py | 81.81% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9824      +/-   ##
==========================================
- Coverage   85.17%   85.17%   -0.01%     
==========================================
  Files          93       93              
  Lines       15346    15372      +26     
  Branches     2308     2318      +10     
==========================================
+ Hits        13071    13093      +22     
- Misses       1581     1583       +2     
- Partials      694      696       +2


Add a new "fastcdc" content-defined chunker selectable via --chunker-params.

It uses the FastCDC Gear rolling hash (fp = (fp << 1) + Gear[byte]), which is window-less and cheaper per byte than buzhash's cyclic-polynomial update, so it chunks noticeably faster (see "borg benchmark cpu" output), while producing the same chunk-size distribution and deduplication.

The Gear table is keyed: it is derived from the repo id key via CSPRNG (own "fastcdc" domain), exactly like the buzhash64 table, so chunk cut points stay unpredictable without the key (anti-fingerprinting).

It implements the same FastCDC techniques as buzhash64 (sub-minimum skipping, normalized chunking with a required nc_level, min/max clamping); the mask uses the high bits of the hash (Gear accumulates entropy there).

chunker-params: "fastcdc,chunk_min,chunk_max,chunk_mask,nc_level" - there is no window field, because Gear is window-less. E.g. fastcdc,19,23,21,2

Also: borg benchmark cpu now measures the fastcdc chunker; tests in borg.testsuite.chunkers (golden vector, size distribution, keyed gear table, param parsing, slow fuzz); docs and changelog.

Benchmarks (scripts/chunker_bench.py, buzhash64 vs fastcdc, both nc_level=2, incompressible data unless noted):

5 GiB, 2 MiB target (default params):
  buzhash64: CV 0.294, 1011 MB/s
  fastcdc:   CV 0.295, 1313 MB/s (+30%)

64 MiB, 64 KiB target:
  buzhash64: CV 0.374, shift-resilience 0.9928, 963 MB/s
  fastcdc:   CV 0.359, shift-resilience 0.9929, 1331 MB/s (+38%)

Re-backup of a 2.5 GiB file after scattered single-byte edits (dedup ratio, 0.5 = v2 fully deduplicated, lower is better):
  64 edits:  buzhash64 0.5237, fastcdc 0.5236
  320 edits: buzhash64 0.6133, fastcdc 0.6161

borg benchmark cpu, 1 GB: fastcdc 3.80s, buzhash 4.36s, buzhash64 8.13s, fixed 0.56s.

Chunk-size distribution, deduplication and shift-resilience match buzhash64 within noise; fastcdc is consistently faster.

Also: fix a bug in the mask computation. The shift must use 1ULL instead of 1, so that it is evaluated as a uint64 and not as a 32-bit int.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

For buzhash64 and fastcdc, chunker params like chunk_min == chunk_max passed CLI validation, but then failed with an AssertionError when the chunker was constructed (buzhash64 needs window_size + 2^chunk_min + 1 <= 2^chunk_max, fastcdc needs 2^chunk_min + 1 <= 2^chunk_max). Check this at parse time and raise a proper ArgumentTypeError instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
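
The parse-time check described in this commit could look roughly like the sketch below. The function name and signature are hypothetical and this is not the code in `src/borg/helpers/parseformat.py`; only the two inequalities and the use of `ArgumentTypeError` are taken from the commit message.

```python
import argparse


def check_chunk_bounds(algo: str, chunk_min: int, chunk_max: int, window_size: int = 0) -> None:
    """Raise ArgumentTypeError if the size exponents cannot produce a valid chunker."""
    min_size, max_size = 1 << chunk_min, 1 << chunk_max
    if algo == "buzhash64":
        # buzhash64 needs room for its rolling window on top of the minimum size.
        ok = window_size + min_size + 1 <= max_size
    elif algo == "fastcdc":
        # Gear is window-less, so only the minimum size has to fit.
        ok = min_size + 1 <= max_size
    else:
        raise ValueError(f"unsupported algorithm in this sketch: {algo}")
    if not ok:
        raise argparse.ArgumentTypeError(
            f"{algo}: chunk_min={chunk_min} is too large for chunk_max={chunk_max}"
        )


check_chunk_bounds("fastcdc", 19, 23)      # fine
try:
    check_chunk_bounds("fastcdc", 19, 19)  # chunk_min == chunk_max is rejected
except argparse.ArgumentTypeError as exc:
    print("rejected:", exc)
```

Raising `ArgumentTypeError` from an argparse `type=` callable makes argparse print a normal usage error instead of a traceback.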

…mode

A wrong field count for a named algo (e.g. fastcdc,19,23,21 or buzhash,19,23,21 with a forgotten field) used to fall through to the old-style buzhash compat branch, which then crashed with a ValueError from int('fastcdc') instead of raising a proper ArgumentTypeError.

- restrict the old-style compat branch to what it is for: four all-numeric fields without an algorithm name (19,23,21,4095 still works).
- for buzhash64 and fastcdc, a wrong field count now raises an ArgumentTypeError that spells out the expected format, which helps since buzhash64 recently gained nc_level and fastcdc has no window field.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
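
A simplified sketch of the dispatch logic this commit describes is shown below, covering only the fastcdc and old-style cases. The function name and return shape are invented for the example, and the real parser handles more algorithms (including buzhash64, whose field order is not shown here).

```python
import argparse

FASTCDC_FORMAT = "fastcdc,chunk_min,chunk_max,chunk_mask,nc_level"


def parse_chunker_params_sketch(spec: str) -> tuple:
    """Parse a chunker-params string, rejecting wrong field counts early."""
    fields = [f.strip() for f in spec.split(",")]
    if fields[0] == "fastcdc":
        numbers = fields[1:]
        if len(numbers) != 4 or not all(f.isdigit() for f in numbers):
            # Spell out the expected format instead of failing later with ValueError.
            raise argparse.ArgumentTypeError(f"invalid chunker params, expected: {FASTCDC_FORMAT}")
        return ("fastcdc", *map(int, numbers))
    # Old-style compat: exactly four all-numeric fields, no algorithm name.
    if len(fields) == 4 and all(f.isdigit() for f in fields):
        return ("buzhash", *map(int, fields))
    raise argparse.ArgumentTypeError(f"invalid chunker params: {spec!r}")


print(parse_chunker_params_sketch("fastcdc,19,23,21,2"))
print(parse_chunker_params_sketch("19,23,21,4095"))
```

The key point is that the compat branch is only reached by all-numeric input, so a named algorithm with a forgotten field can no longer end up in `int('fastcdc')`.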

fill() is declared 'except 0' and always returns 1: errors propagate as exceptions, so the 'if not self.fill(): return None' branches were unreachable (and process() returning None would have crashed __next__ anyway). Call fill() plainly instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ThomasWaldmann changed the title ~~Add fastcdc chunker (keyed Gear hash); buzhash64 normalized chunking~~ Add fastcdc chunker (keyed Gear hash) Jun 27, 2026

ThomasWaldmann force-pushed the fastcdc-chunker branch 2 times, most recently from c16e0fe to f41a414 Compare June 27, 2026 22:57

ThomasWaldmann force-pushed the fastcdc-chunker branch from f41a414 to afa8189 Compare June 28, 2026 10:41

ThomasWaldmann mentioned this pull request Jun 29, 2026

chunking algorithms #5721

Open

ThomasWaldmann force-pushed the fastcdc-chunker branch 2 times, most recently from 666b542 to 7c47591 Compare July 1, 2026 21:12

ThomasWaldmann and others added 5 commits July 1, 2026 23:24

buzhash64: update type stub for nc_level/normal_size

831e47d

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ThomasWaldmann force-pushed the fastcdc-chunker branch from 7c47591 to 831e47d Compare July 1, 2026 21:27

ThomasWaldmann merged commit c944ace into borgbackup:master Jul 2, 2026
19 checks passed

ThomasWaldmann deleted the fastcdc-chunker branch July 2, 2026 07:30

ThomasWaldmann merged 5 commits into borgbackup:master from ThomasWaldmann:fastcdc-chunker
