
Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) #8776

Draft
c-herz wants to merge 1 commit into borgbackup:master from c-herz:datefilter

Conversation

@c-herz

@c-herz c-herz commented Apr 19, 2025


This PR adds preliminary support for matching ISO 8601 timestamps with the date: archive filter, and intends to begin addressing the requirements of #8715.

Timestamps (except for Unix epoch forms) are currently assumed to be in the user's local timezone and converted to UTC internally. The following formats are currently supported:

  1. YYYY
  2. YYYY-MM
  3. YYYY-MM-DD
  4. YYYY-MM-DDTHH -> matches a 1-hour interval
  5. YYYY-MM-DDTHH:MM -> matches a 1-minute interval
  6. YYYY-MM-DDTHH:MM:SS -> matches a 1-second interval
  7. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including up to microseconds
  8. @123456789 -> Unix epoch (interpreted as UTC)

Edit: Scoped down significantly from original PR; future improvements can be made in future PRs.

@c-herz c-herz changed the title Add preliminary support for ISO-8601 timestamps via date: archive match pattern Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715) Apr 19, 2025
@PhrozenByte

PhrozenByte commented Apr 20, 2025

Contributor

Thanks for picking this up! ❤️

  1. YYYY-MM-DDTHH:MM:SS.ffff -> matches an exact timestamp, including fractional seconds
  2. @123456789 -> Unix epoch (interpreted as UTC)

(Is the fractional-second exact match useful in practice? Feedback welcome on this.)

What's the precision of an archive's creation time? From the code I assume it includes fractional seconds, right? I absolutely agree with you then: There should be a variant that is guaranteed to match a single archive. I feel that Unix timestamps should also optionally support fractional seconds then.

  1. YYYY-MM-DDTHH -> matches a 1-hour interval

From reading the code I derive that the 1-hour interval is matched exclusively, i.e. it's actually matching any archive within 00:59:59.9999… hours, correct? Perfect 👍

# Year/Year-month/Year-month-day
parts = expr.split("-")
try:
    if len(parts) == 1:                    # YYYY
        year = int(parts[0])

Even though I like the simplicity, I feel that Borg should be pretty strict about the format, because being less strict easily leads to ambiguity. For example, is date:0 supposed to match any archive created in year 0? Probably, but it gets way less clear with (now deprecated) truncated ISO 8601 dates: What does the pattern date:25-1 describe? January of the year 25, January 1925, or January 2025?

If there's no library that can be used, I always imagined that the code would basically revolve around a single, rather strict regex with bottom-up optional groups for year, month, day, hour, minutes, seconds, and fractional seconds, or * as wildcard, supplemented by another regex to match periods, and simple matchers for Unix timestamps and keywords. I'm not saying that this is the best approach, that's just what I imagined while writing #8715.
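To illustrate the "strict regex with bottom-up optional groups" idea, here is a minimal sketch. `STRICT_DATE_RE` and `parse_strict` are hypothetical names for illustration, not the PR's code; wildcards, minutes, seconds, and timezones are left out to keep it short.

```python
import re

# Hypothetical pattern (not the PR's DATE_PATTERN_RE): a component is only
# accepted if every coarser component is present, and all widths are fixed.
STRICT_DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})              # YYYY, exactly four digits
    (?:-(?P<month>\d{2})         # optional -MM
        (?:-(?P<day>\d{2})       # optional -DD, only after a month
            (?:T(?P<hour>\d{2})  # optional THH, only after a day
            )?
        )?
    )?
    """,
    re.VERBOSE,
)


def parse_strict(expr):
    """Return the components that were given, or None if expr is rejected."""
    m = STRICT_DATE_RE.fullmatch(expr)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items() if value is not None}


print(parse_strict("2025-01"))  # {'year': 2025, 'month': 1}
print(parse_strict("25-1"))     # None: truncated forms are rejected
print(parse_strict("0"))        # None: years need four digits
```

Because the groups are nested, a pattern like `2025T09` (an hour without a month and day) is rejected by construction rather than by extra checks.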

In general I like to encourage creating extensive unit tests as early as possible. It's elegant and simple code now (🚀 👍), but complexity will increase greatly when adding more and more features.

Note: I can read the code, but can't do an actual code review - for that I just don't know enough of Borg's code.

@ThomasWaldmann ThomasWaldmann left a comment

Member

Thanks for the PR!

Some minor stuff I found...

Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py Outdated
@ThomasWaldmann

ThomasWaldmann commented Apr 21, 2025

Member

BTW, if you install the pre-commit hook, you can have your commits automatically formatted.

https://borgbackup.readthedocs.io/en/stable/development.html#building-a-development-environment

Comment thread src/borg/helpers/time.py Outdated
Comment thread src/borg/helpers/time.py Outdated
(?:
:(?P<minute>\d{2}|\*) # minute (MM or *)
(?:
:(?P<second>\d{2}(?:\.\d+)?|\*) # second (SS or SS.fff or *)

Member

Yay, that is a nice regex now. :-)

You could also deal with fractional seconds as with all other components: it is an optional component, so you can also match it with a named group and later check the groupdict.
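A rough sketch of that suggestion, assuming a simplified time-only pattern without the `*` wildcards. `TIME_RE` and `parse_time` are hypothetical names; the 1 to 6 digit limit follows the help text proposed later in this thread.

```python
import re

# Hypothetical fragment: the fraction gets its own optional named group
# instead of being folded into the "second" group.
TIME_RE = re.compile(
    r"""
    (?P<hour>\d{2})
    (?::(?P<minute>\d{2})
        (?::(?P<second>\d{2})
            (?:\.(?P<fraction>\d{1,6}))?   # 1 to 6 fractional digits
        )?
    )?
    """,
    re.VERBOSE,
)


def parse_time(expr):
    """Return (components, microsecond); microsecond is None if not given."""
    m = TIME_RE.fullmatch(expr)
    if m is None:
        raise ValueError(f"invalid time: {expr!r}")
    parts = m.groupdict()
    fraction = parts.pop("fraction")
    # Right-pad so that ".5" means 500000 microseconds, not 5.
    microsecond = int(fraction.ljust(6, "0")) if fraction is not None else None
    components = {name: int(value) for name, value in parts.items() if value is not None}
    return components, microsecond
```

Checking `groupdict()` for `None` then tells the caller which precision was requested, so the fraction is handled the same way as minute or second.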

Comment thread src/borg/helpers/time.py Outdated
Comment on lines +306 to +312
r"(?:(?P<years>\d+)Y)?"
r"(?:(?P<months>\d+)M)?"
r"(?:(?P<weeks>\d+)W)?"
r"(?:(?P<days>\d+)D)?"
r"(?:(?P<hours>\d+)h)?"
r"(?:(?P<minutes>\d+)m)?"
r"(?:(?P<seconds>\d+)s)?"

Member

didn't we use to have lowercase ymwd and uppercase HMS?

@PhrozenByte PhrozenByte Apr 25, 2025

Contributor

@c-herz, is that a custom format, or did I miss an ISO 8601 or RFC update? I know the ISO 8601 P3Y6M4DT12H30M5S format (P designator, 3 years, 6 months, 4 days, T time separator, 12 hours, 30 minutes, 5 seconds, all being optional). I honestly don't like that "official" format very much (it's so cumbersome…), especially with regard to using M for both months and minutes (it's still unambiguous due to the T separator), but I feel like we should support it, because it's an official part of ISO 8601.

However, I kinda like the idea of additionally supporting our own format. Like, why not support 12:34:56 (or similar, just a quick idea) to specify a 12 hours, 34 minutes, 56 seconds duration? Why not also support (space) as alternative to T to separate times? The designators could ignore case (i.e. also supporting 7d for 7 days), and we could allow a space after each term. This might go too far though. In general, I'm not sure whether we should put things into the "official" P designator, or rather additionally support our own (like the suggested D, e.g. D 3y 6m 4d 12:30:05?). Is there maybe another common or even formalized/standardized (like another RFC; I did some research, unfortunately I didn't find anything official or quasi-official) format? WDYT?
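For reference, the ISO 8601 duration form mentioned above can be matched with a regex in the same style as the PR's duration pattern. This is only a sketch: `ISO_DURATION_RE` and `parse_iso_duration` are hypothetical names, fractional values are not handled, and combining `W` with other designators is allowed here for simplicity. Years and months are returned separately because their length depends on the calendar and cannot be expressed as a `timedelta`.

```python
import re
from datetime import timedelta

# ISO 8601 duration such as P3Y6M4DT12H30M5S. "M" means months before the
# "T" separator and minutes after it.
ISO_DURATION_RE = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_iso_duration(expr):
    """Return (years, months, timedelta) for an ISO 8601 duration string."""
    m = ISO_DURATION_RE.fullmatch(expr)
    # Reject no match, a bare "P", and a dangling "T" with nothing after it.
    if m is None or expr == "P" or expr.endswith("T"):
        raise ValueError(f"invalid ISO 8601 duration: {expr!r}")
    parts = {name: int(value or 0) for name, value in m.groupdict().items()}
    fixed = timedelta(
        weeks=parts["weeks"],
        days=parts["days"],
        hours=parts["hours"],
        minutes=parts["minutes"],
        seconds=parts["seconds"],
    )
    return parts["years"], parts["months"], fixed


print(parse_iso_duration("P5M"))   # (0, 5, datetime.timedelta(0)): five months
print(parse_iso_duration("PT5M"))  # (0, 0, datetime.timedelta(seconds=300)): five minutes
```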

Comment thread src/borg/helpers/time.py Outdated
# ISO week date: YYYY-Www or YYYY-Www-D
(?P<isoweek_year>\d{4})-W(?P<isoweek_week>\d{2})(?:-(?P<isoweek_day>\d))?
| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})

Contributor

This might be a bit silly, but it just got me thinking: Could we support ordinal days with wildcard years? 🤔 Same for ISO weeks, possibly even *-W*-5 (wow, that looks crazy 😆) to match all archives created on a Friday? Not sure whether users would use it, but if it's possible? WDYT?
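As a point of reference for this discussion, the standard library already covers the calendar arithmetic for ISO week and ordinal dates. The sketch below uses hypothetical helper names and plain `date` objects (no timezones); it is not taken from the PR.

```python
from datetime import date, timedelta


def iso_week_interval(year, week, weekday=None):
    """[start, end) dates for YYYY-Www (7 days) or YYYY-Www-D (1 day)."""
    if weekday is None:
        start = date.fromisocalendar(year, week, 1)  # Monday of that ISO week
        return start, start + timedelta(days=7)
    start = date.fromisocalendar(year, week, weekday)
    return start, start + timedelta(days=1)


def ordinal_interval(year, day_of_year):
    """[start, end) dates for YYYY-DDD."""
    start = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    if day_of_year < 1 or start.year != year:
        raise ValueError(f"day {day_of_year} does not exist in {year}")
    return start, start + timedelta(days=1)


def is_friday(d):
    """The per-archive test that a wildcard like *-W*-5 would need."""
    return d.isoweekday() == 5
```

A fixed week or ordinal day maps to one contiguous interval, whereas a wildcard such as `*-W*-5` does not: it would have to be evaluated as a predicate per archive, as `is_friday` suggests. `date.fromisocalendar` requires Python 3.8 or newer.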

Comment thread src/borg/helpers/time.py Outdated
| # Ordinal date: YYYY-DDD
(?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3})
| # Unix epoch
@(?P<epoch>\d+)

@PhrozenByte PhrozenByte Apr 25, 2025

Contributor

Also supporting fractal seconds here would be amazing! ❤️

Side note: I might read the regex wrong, but this also means @1745577106[Europe/Berlin] (or any other TZ format) is supported? AFAIK Unix timestamps are UTC per definition, right? Or is the TZ info used later for something else?

Member

fractional :-)

Comment thread src/borg/helpers/time.py Outdated
Also supports:
TIMESTAMP/TIMESTAMP
TIMESTAMP/DURATION
DURATION/TIMESTAMP.

Contributor

Great work! ❤️

Just DURATION (i.e. without a TIMESTAMP) isn't supported yet, right? Just specifying a duration is helpful to match the latest archives relative to "now" (which could be another useful keyword).

@codecov

codecov Bot commented Apr 25, 2025


❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2162 | 2 | 2160 | 496 |
View the top 2 failed test(s) by shortest run time
src.borg.testsuite.archiver.prune_cmd_test::test_prune_repository_example[archiver]
Stack Traces | 4.24s run time
archivers = 'archiver'
request = <FixtureRequest for <Function test_prune_repository_example[archiver]>>
backup_files = '.../pytest-0/popen-gw1/backup0'

    def test_prune_repository_example(archivers, request, backup_files):
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        # Archives that will be kept, per the example
        # Oldest archive
        _create_archive_ts(archiver, backup_files, "test01", 2015, 1, 1)
        # 6 monthly archives
        _create_archive_ts(archiver, backup_files, "test02", 2015, 6, 30)
        _create_archive_ts(archiver, backup_files, "test03", 2015, 7, 31)
        _create_archive_ts(archiver, backup_files, "test04", 2015, 8, 31)
        _create_archive_ts(archiver, backup_files, "test05", 2015, 9, 30)
        _create_archive_ts(archiver, backup_files, "test06", 2015, 10, 31)
        _create_archive_ts(archiver, backup_files, "test07", 2015, 11, 30)
        # 14 daily archives
        _create_archive_ts(archiver, backup_files, "test08", 2015, 12, 17)
        _create_archive_ts(archiver, backup_files, "test09", 2015, 12, 18)
        _create_archive_ts(archiver, backup_files, "test10", 2015, 12, 20)
        _create_archive_ts(archiver, backup_files, "test11", 2015, 12, 21)
        _create_archive_ts(archiver, backup_files, "test12", 2015, 12, 22)
        _create_archive_ts(archiver, backup_files, "test13", 2015, 12, 23)
        _create_archive_ts(archiver, backup_files, "test14", 2015, 12, 24)
        _create_archive_ts(archiver, backup_files, "test15", 2015, 12, 25)
        _create_archive_ts(archiver, backup_files, "test16", 2015, 12, 26)
        _create_archive_ts(archiver, backup_files, "test17", 2015, 12, 27)
        _create_archive_ts(archiver, backup_files, "test18", 2015, 12, 28)
        _create_archive_ts(archiver, backup_files, "test19", 2015, 12, 29)
        _create_archive_ts(archiver, backup_files, "test20", 2015, 12, 30)
        _create_archive_ts(archiver, backup_files, "test21", 2015, 12, 31)
        # Additional archives that would be pruned
        # The second backup of the year
        _create_archive_ts(archiver, backup_files, "test22", 2015, 1, 2)
        # The next older monthly backup
        _create_archive_ts(archiver, backup_files, "test23", 2015, 5, 31)
        # The next older daily backup
        _create_archive_ts(archiver, backup_files, "test24", 2015, 12, 16)
        output = cmd(archiver, "prune", "--list", "--dry-run", "--keep-daily=14", "--keep-monthly=6", "--keep-yearly=1")
        # Prune second backup of the year
        assert re.search(r"Would prune:\s+test22", output)
        # Prune next older monthly and daily backups
        assert re.search(r"Would prune:\s+test23", output)
        assert re.search(r"Would prune:\s+test24", output)
        # Must keep the other 21 backups
        # Yearly is kept as oldest archive
>       assert re.search(r"Keeping archive \(rule: yearly\[oldest\] #1\):\s+test01", output)
E       AssertionError: assert None
E        +  where None = <function search at 0x7f2119df25c0>('Keeping archive \\(rule: yearly\\[oldest\\] #1\\):\\s+test01', 'Keeping archive (rule: daily #1):            test21                               Wed, 2015-12-30 19:00:00 -0500 [6a15b0e90fd670005297cf78665c9af298303077768da686142590a2f1e8d3ce]\nKeeping archive (rule: daily #2):            test20                               Tue, 2015-12-29 19:00:00 -0500 [25fabec09c98dd5fa7a68195f252547ef5019fe4cc5ff1b8a9775683159016cf]\nKeeping archive (rule: daily #3):            test19                               Mon, 2015-12-28 19:00:00 -0500 [17e1e55c8b328e331503026938113eb5c6ec9ba8aa405b4cd4f26a174262a0a4]\nKeeping archive (rule: daily #4):            test18                               Sun, 2015-12-27 19:00:00 -0500 [e2fc8750eb803c4ccc76f4e8103a35c2e40ca2ab22f301a07f36871974d914f5]\nKeeping archive (rule: daily #5):            test17                               Sat, 2015-12-26 19:00:00 -0500 [1af3d742d05305b47ca6c66851c61625835bbaee452e2edaa7dff1048a6fd1df]\nKeeping archive (rule: daily #6):            test16                               Fri, 2015-12-25 19:00:00 -0500 [b90cd3e72820e2992f8ceb1942f8f6d58d2847101fbb8488513149d7bd1ac5cd]\nKeeping archive (rule: daily #7):            test15                               Thu, 2015-12-24 19:00:00 -050...             
Tue, 2015-09-29 20:00:00 -0400 [145f0b837aafd74051b700f10fde89ace0613aaaa131309ee3d9e70697f56a66]\nKeeping archive (rule: monthly #4):          test04                               Sun, 2015-08-30 20:00:00 -0400 [ddd3130cc615887106cf8ae44dc1d430a36732a37bedc1491d4d71ca2cdf289f]\nKeeping archive (rule: monthly #5):          test03                               Thu, 2015-07-30 20:00:00 -0400 [1d4adcfe64bcf784274b035aec5072289346119e5af33b28d312cb8b273b7fe0]\nKeeping archive (rule: monthly #6):          test02                               Mon, 2015-06-29 20:00:00 -0400 [d1e2efb8f725695c88568582a42d05bb0d5c9f847c8e2c9b82c46b378f568fdd]\nWould prune:                                 test23                               Sat, 2015-05-30 20:00:00 -0400 [2cafb63bfcc24a49c400bac441273744ab003177cf2ceddd2178bedd2bb54690]\nWould prune:                                 test22                               Thu, 2015-01-01 19:00:00 -0500 [24e6a84b725c050269687251d6b5bbff007d331ec7c9b3420190478824b09edb]\nKeeping archive (rule: yearly #1):           test01                               Wed, 2014-12-31 19:00:00 -0500 [4c67891f448c2cb9a331448f2555395ab344c96263fbc5dba5d50aedff322786]\n')
E        +    where <function search at 0x7f2119df25c0> = re.search

.../testsuite/archiver/prune_cmd_test.py:100: AssertionError
src.borg.testsuite.archiver.prune_cmd_test::test_prune_repository_example[remote_archiver]
Stack Traces | 38.9s run time
archivers = 'remote_archiver'
request = <FixtureRequest for <Function test_prune_repository_example[remote_archiver]>>
backup_files = '.../pytest-0/popen-gw1/backup0'

    def test_prune_repository_example(archivers, request, backup_files):
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        # Archives that will be kept, per the example
        # Oldest archive
        _create_archive_ts(archiver, backup_files, "test01", 2015, 1, 1)
        # 6 monthly archives
        _create_archive_ts(archiver, backup_files, "test02", 2015, 6, 30)
        _create_archive_ts(archiver, backup_files, "test03", 2015, 7, 31)
        _create_archive_ts(archiver, backup_files, "test04", 2015, 8, 31)
        _create_archive_ts(archiver, backup_files, "test05", 2015, 9, 30)
        _create_archive_ts(archiver, backup_files, "test06", 2015, 10, 31)
        _create_archive_ts(archiver, backup_files, "test07", 2015, 11, 30)
        # 14 daily archives
        _create_archive_ts(archiver, backup_files, "test08", 2015, 12, 17)
        _create_archive_ts(archiver, backup_files, "test09", 2015, 12, 18)
        _create_archive_ts(archiver, backup_files, "test10", 2015, 12, 20)
        _create_archive_ts(archiver, backup_files, "test11", 2015, 12, 21)
        _create_archive_ts(archiver, backup_files, "test12", 2015, 12, 22)
        _create_archive_ts(archiver, backup_files, "test13", 2015, 12, 23)
        _create_archive_ts(archiver, backup_files, "test14", 2015, 12, 24)
        _create_archive_ts(archiver, backup_files, "test15", 2015, 12, 25)
        _create_archive_ts(archiver, backup_files, "test16", 2015, 12, 26)
        _create_archive_ts(archiver, backup_files, "test17", 2015, 12, 27)
        _create_archive_ts(archiver, backup_files, "test18", 2015, 12, 28)
        _create_archive_ts(archiver, backup_files, "test19", 2015, 12, 29)
        _create_archive_ts(archiver, backup_files, "test20", 2015, 12, 30)
        _create_archive_ts(archiver, backup_files, "test21", 2015, 12, 31)
        # Additional archives that would be pruned
        # The second backup of the year
        _create_archive_ts(archiver, backup_files, "test22", 2015, 1, 2)
        # The next older monthly backup
        _create_archive_ts(archiver, backup_files, "test23", 2015, 5, 31)
        # The next older daily backup
        _create_archive_ts(archiver, backup_files, "test24", 2015, 12, 16)
        output = cmd(archiver, "prune", "--list", "--dry-run", "--keep-daily=14", "--keep-monthly=6", "--keep-yearly=1")
        # Prune second backup of the year
        assert re.search(r"Would prune:\s+test22", output)
        # Prune next older monthly and daily backups
        assert re.search(r"Would prune:\s+test23", output)
        assert re.search(r"Would prune:\s+test24", output)
        # Must keep the other 21 backups
        # Yearly is kept as oldest archive
>       assert re.search(r"Keeping archive \(rule: yearly\[oldest\] #1\):\s+test01", output)
E       AssertionError: assert None
E        +  where None = <function search at 0x7f2119df25c0>('Keeping archive \\(rule: yearly\\[oldest\\] #1\\):\\s+test01', 'Keeping archive (rule: daily #1):            test21                               Wed, 2015-12-30 19:00:00 -0500 [8415307bd7bcd9f62f16767475b03f033098b69f4d9edaa8601807f1b878cf80]\nKeeping archive (rule: daily #2):            test20                               Tue, 2015-12-29 19:00:00 -0500 [13a33ac9e9ad0049ec01c0c3c336fd4c841b407ed0341bb65b46332e467a027e]\nKeeping archive (rule: daily #3):            test19                               Mon, 2015-12-28 19:00:00 -0500 [a109f5a8f64fc813d9a23bee539f6c9821e9c9fca744166742ca0a8c6b3094b3]\nKeeping archive (rule: daily #4):            test18                               Sun, 2015-12-27 19:00:00 -0500 [ccdf2419a076601b320011296300f6dfea8e758c3e9777151b6c3bfa7326af88]\nKeeping archive (rule: daily #5):            test17                               Sat, 2015-12-26 19:00:00 -0500 [7b4aa113a03af0cc062cea17212cd3f3332aa4fe1ee4442d187ca0861ae95159]\nKeeping archive (rule: daily #6):            test16                               Fri, 2015-12-25 19:00:00 -0500 [7853f1a9422bbacc03cbaf765d7ed3221d23d78a6810dc9657bc8f98165d9db5]\nKeeping archive (rule: daily #7):            test15                               Thu, 2015-12-24 19:00:00 -050...             
Tue, 2015-09-29 20:00:00 -0400 [67823e884a1887685d586d178f3b744e0de5b05875d9d32d64233a102ed5a562]\nKeeping archive (rule: monthly #4):          test04                               Sun, 2015-08-30 20:00:00 -0400 [4a1101a6ed74a593ca5adaeffc2173d6f770f5b4adbfbd227a229cf7ecabeaed]\nKeeping archive (rule: monthly #5):          test03                               Thu, 2015-07-30 20:00:00 -0400 [64844f72047f392934cd7fdb105b5595ab4291e21aece064e64e3330c1020947]\nKeeping archive (rule: monthly #6):          test02                               Mon, 2015-06-29 20:00:00 -0400 [294f635414ece4e933fcc7ec7e927a3191b1bdf44d15aca084fad549b5edc054]\nWould prune:                                 test23                               Sat, 2015-05-30 20:00:00 -0400 [3439c0263c84cebe2a496ffa988d6e018b922c6cc7449d1d466cd467c67b8077]\nWould prune:                                 test22                               Thu, 2015-01-01 19:00:00 -0500 [27cf9e63848a3cde34fa6cddf3bdb138909c78f6cf415c2eb9d5d82568b43a57]\nKeeping archive (rule: yearly #1):           test01                               Wed, 2014-12-31 19:00:00 -0500 [8195767798ceba4b32f1d8bda466a38571d32aa3da230687b19473ef5e11dfee]\n')
E        +    where <function search at 0x7f2119df25c0> = re.search

.../testsuite/archiver/prune_cmd_test.py:100: AssertionError


@c-herz c-herz marked this pull request as ready for review April 26, 2025 00:45
@ThomasWaldmann

Member

Before making further changes, please rebase onto the current master branch to get the Cython workaround. Otherwise, builds will fail.

@ThomasWaldmann

Member

ping?

@c-herz

c-herz commented Jun 5, 2025

Author

My apologies, I have been caught up with finals and starting a new internship. I should be able to look into @PhrozenByte's suggestions by this weekend. Apologies for the delays and the rather poor communication!

@ThomasWaldmann

Member

ping? (no hurry, but if you find some time, it would be nice to finish this)

@ThomasWaldmann

Member

Note: this PR can't even run the tests anymore. That's not the fault of the code here: current Cython requires some fixes that are not yet present in this branch, but are already in the current master branch.

Please rebase on current master.

@ThomasWaldmann

Member

Also, I would appreciate it if we could get this into a mergeable state ASAP.

We will soon have bigger changes and movements in the code, so there could be lots of merge conflicts coming.

@ThomasWaldmann

Member

ping

@PhrozenByte

Contributor

What do you think about reducing the scope of this PR? It might help move things forward and get the core functionality merged sooner.

A recent example is #8775, which was opened just one day before this PR and was successfully merged yesterday after its scope was reduced.

In particular, support for durations, from/to matching, and special keywords (e.g. "oldest") seems to be more involved than originally anticipated. Would it make sense to focus on date support first, perhaps with only very basic or no wildcard support, and leave the more complex parts for follow-up PRs?

@c-herz

c-herz commented Jun 22, 2026

Author

Re: @PhrozenByte's suggestion--that definitely would make it a bit easier for me to get things up to speed after a year of neglect.

My sincere apologies for the radio silence on my end. Over the past year life really got in the way for me and I just didn't have time to work on hobby projects. However, I certainly should've communicated that better.

In any case, I should have some time this weekend and possibly here and there throughout the week to clean things up and at least get the basic feature I worked on last year rebased onto master and integrated with the major changes since then (which to be honest, I need to catch up on). I can shelve any nice-to-haves for now and focus on core functionality, and certainly keep everything as a reference if these are desired in the future and/or I have time to extend. Please let me know what should be the highest priority moving forward!

@PhrozenByte

Contributor

There's really no need to apologize. We're all contributing in our spare time, and there are certainly more important things in life than getting a PR merged quickly 🙂

In my opinion, it should ultimately be up to you to decide which features you want to work on and where you want to draw the line for the initial implementation.

That said, since I originally suggested this in #8715, my personal motivation was simply being able to match archives by static dates. I create daily backups, and while the date used to be part of the archive name in Borg 1.x, archive series in Borg 2.0 work differently. Because of that, something as simple as a static match pattern like date:2026-06-22 would already cover my personal use case. Everything else suggested in #8715 mostly came from considering how the feature could be extended in useful ways. For example, the idea of even more coarse-grained dates such as date:2026-06 or date:2026-W26 came from considering users that create monthly or weekly backups.

But again, you're the one doing the actual work here, so feel free to draw the line wherever it makes sense to you, both in terms of implementation effort and what you actually feel like working on.

@ThomasWaldmann

Member

Review done by Claude Opus 4.8

Overall

The implementation is clean and the integration into manifest.py mirrors the existing host:/user:/tags: dispatch correctly. Interval semantics (start-inclusive, end-exclusive) are right, overflow/invalid inputs are funneled into a clean CommandError rather than tracebacks, and test coverage is genuinely good (precision levels, tz equivalence, local-tz fallback, overflow, rejection cases). The earlier strictness concern is addressed — 4-digit years only, no truncated/ambiguous dates. The regex-with-named-groups design is exactly what was suggested in the earlier review rounds.

Blocking / process

  • Won't build or test as-is. The branch is ~584 commits behind master and predates the Cython workaround. It needs a rebase before CI can run. Reviewing against current master, the manifest.py/time.py integration points still line up cleanly, so the rebase should be mostly mechanical.

Substantive findings

  1. [Region/City] timezones depend on tzdata, which isn't a declared dependency. On Linux/macOS zoneinfo reads the system tz database, but on Windows there is none — ZoneInfo("America/Los_Angeles") raises ZoneInfoNotFoundError, which the code maps to "invalid timezone format". So named-zone patterns silently don't work for Windows users, and test_match_la_equivalents / test_match_utc_equivalents would fail on Windows binary builds. Fix: add tzdata to dependencies (e.g. ; sys_platform == 'win32'), or document the limitation.

  2. $ in the regex permits a trailing newline. date:2025\n parses as year 2025 (Python $ matches before a final \n). Using \Z instead of $ in DATE_PATTERN_RE hardens this. Low severity, but trivial to fix.
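The `$` versus `\Z` difference in finding 2 is easy to reproduce in isolation. The patterns below are simplified stand-ins for illustration, not the PR's `DATE_PATTERN_RE`:

```python
import re

# Simplified stand-ins for the year-only case.
LOOSE = re.compile(r"(?P<year>\d{4})$")    # "$" also matches before a final "\n"
STRICT = re.compile(r"(?P<year>\d{4})\Z")  # "\Z" matches only at the very end

print(bool(LOOSE.match("2025\n")))   # True
print(bool(STRICT.match("2025\n")))  # False
print(bool(STRICT.match("2025")))    # True
```

Calling `fullmatch()` on a pattern without an end anchor would reject the trailing newline as well.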

Minor / polish

  • No changelog entry in docs/changes.rst for this user-facing feature.
  • Out-of-range components (month=13, day=30 in Feb, hour=24) are accepted by the regex and only rejected later when datetime() raises ValueError. This is fine and caught cleanly — noted as by-design, not a hole.
  • compile_date_pattern calls expr.strip() although the input is already a stripped match suffix — harmless redundancy.
  • The help-text list mixes the @epoch entries between the calendar precisions, breaking the ascending-interval grouping.
  • For named-zone (ZoneInfo) patterns at day/month/year precision, interval-end arithmetic via timedelta is wall-clock-naive across DST. Irrelevant for sub-day precisions and an extreme edge case for the coarse ones; not worth fixing now.

Bottom line

Solid, mergeable-quality core once rebased. Suggested merge gates: (1) rebase onto current master, (2) the tzdata/Windows decision, (3) a changelog line. The \Z fix and help-text ordering are nice-to-haves.

@ThomasWaldmann

Member

Follow-up by Claude Opus 4.8

The CI prune failures are caused by this PR's new test leaking TZ

The three failing tests are all in prune_cmd_test.py (test_prune_repository_example, test_prune_int_rolling_schedule_oldest_retention, test_prune_interval_rolling_schedule_oldest_retention) and look unrelated to date matching — but they are a side effect of the new test:

@pytest.mark.skipif(is_win32, reason="time.tzset() is not available on Windows")
def test_match_bare_pattern_uses_local_timezone(archivers, request, monkeypatch):
    archiver = request.getfixturevalue(archivers)
    monkeypatch.setenv("TZ", "America/New_York")
    time.tzset()   # pushes Eastern into libc
    ...

monkeypatch.setenv restores the TZ environment variable at teardown, but it never calls time.tzset() again, so libc keeps America/New_York active for every subsequent test in that worker process. This is the only place in src/borg/testsuite/ that touches the process timezone, so nothing reinstates the runner's default zone afterwards.

Why it breaks prune:

  • match_archives_date_test.py sorts before prune_cmd_test.py, so the leak is active when prune runs.
  • prune() buckets archives by local-time periods. With the leaked Eastern zone, a UTC 2024-01-01T00:00 archive becomes local 2023-12-31T19:00 and shifts into the previous month/year bucket, which throws off the oldest-retention boundary math — hence 01-01/02-01 instead of 01-31.
  • The tell is in the failure output: every prune timestamp prints -0500/-0400 (Eastern). On the runner's normal zone these tests pass.

So this is a test-isolation bug introduced here, not a prune regression.

Suggested fix

monkeypatch alone can't fix this cleanly (its env-restore finalizer races with any addfinalizer(time.tzset) ordering). Restore the zone explicitly so tzset() runs after TZ is put back:

def test_match_bare_pattern_uses_local_timezone(archivers, request):
    old_tz = os.environ.get("TZ")
    def restore_tz():
        if old_tz is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = old_tz
        time.tzset()
    request.addfinalizer(restore_tz)

    os.environ["TZ"] = "America/New_York"
    time.tzset()
    archiver = request.getfixturevalue(archivers)
    ...

os and time are already imported in the test file. Factoring this into a small set_timezone fixture would be even nicer, since a tz-leak guard is generally reusable.
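One possible shape for such a reusable guard, sketched as a plain context manager so that it does not depend on pytest. The name `set_timezone` follows the suggestion above; everything else is an assumption, and it only works where `time.tzset()` exists (not on Windows).

```python
import contextlib
import os
import time


@contextlib.contextmanager
def set_timezone(tz):
    """Temporarily switch the process timezone (Unix only: needs time.tzset)."""
    old_tz = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # push the new zone into libc
    try:
        yield
    finally:
        # Put TZ back first, then call tzset() so libc picks up the old zone.
        if old_tz is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = old_tz
        time.tzset()


# Usage: the zone is only active inside the block.
with set_timezone("America/New_York"):
    pass  # run the local-timezone assertions here
```

A pytest fixture could wrap this by yielding from inside the `with` block, which keeps the restore order (environment first, then `tzset()`) in one place.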

@PhrozenByte PhrozenByte left a comment

Contributor

From a user's perspective: LGTM, great work! ❤️

Just three rather minor suggestions:

  1. WDYT about also allowing (space) as a date-time separator (e.g. date:2025-01-01 14:30)?
  2. It might be a good idea to add a test for Daylight Saving Time (DST) specifically. For example, date:2026-03-29T03:00[Europe/Berlin] will behave a little weirdly and differently from date:2026-03-29T03:00+02:00. IMHO there's no need to do anything about it, I just think it might be a good idea to make the behaviour reproducible with a unit test and note to users that timezone quirks exist (IMHO it's not necessary to go into detail, just that timezones can be weird and that one might want to force UTC instead).
  3. I have a few suggestions concerning the help text. See comment below.

As always, I didn't do an actual code review, I'm reviewing this purely from a user's perspective by reading the code changes, unit tests, and docs.

I can do an actual test run once this has been rebased on latest master, but due to the pretty extensive test suite I don't think that I'll catch anything by that.

Full regular expression support.
This is very powerful, but can also get rather complicated.

Date patterns, selector ``date:``

Contributor

At first glance, I found this a little hard to follow. Here's a suggestion for the help text:

        Date patterns, selector ``date:``
            Match archives by creation timestamp. You can either match a single archive by
            passing its exact creation time, or all archives created within a given time
            interval.

            To match a single archive by its exact creation time, use the forms:

            - ``YYYY-MM-DDTHH:MM:SS.ffffff``: ISO-8601-like date-time string
            - ``@1735732800.123456``: UNIX timestamp

            To match a single archive, the pattern must specify the archive's complete
            creation timestamp, including any fractional seconds. Fractional-second
            patterns accept 1 to 6 digits.

            To match all archives created within a given time interval, use the forms:

            - ``YYYY``: match all archives created within the given year
            - ``YYYY-MM``: within the given month
            - ``YYYY-MM-DD``: on the given day
            - ``YYYY-MM-DDTHH``: in the given hour
            - ``YYYY-MM-DDTHH:MM``: in the given minute
            - ``YYYY-MM-DDTHH:MM:SS``: in the given second
            - ``@1735732800``: within the 1 second interval from the given UNIX timestamp

            Date and time patterns match the interval implied by their precision, including
            the start and excluding the end. For example, ``date:2026-06`` matches archives
            created on or after ``2026-06-01T00:00:00`` and before ``2026-07-01T00:00:00``.

            Date and time patterns may include a timezone suffix: ``Z`` (UTC), ``+HH:MM``,
            ``-HH:MM``, or ``[Region/City]``. Patterns without a timezone are interpreted
            in the local timezone. Be wary of Daylight Saving Time (DST) transitions, as
            they can make time intervals ambiguous or nonexistent. Use UTC to avoid such
            issues. Unix timestamps are always UTC and do not accept a timezone suffix.

It separates exact timestamp matches from interval matches, explains why exact matches include fractional seconds, is a bit more explicit about how interval matching works, and adds a note about timezones and DST.
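The "interval implied by precision" rule from the help text can be sketched in a few lines. This is an illustration under simplifying assumptions, not the PR's implementation: `interval_for` and `matches` are hypothetical names, the components are assumed to be parsed and validated already, only year/month/day precision is shown, and the datetimes are naive (no timezone handling).

```python
from datetime import datetime, timedelta


def interval_for(year, month=None, day=None):
    """Return the naive [start, end) interval implied by the given precision."""
    if month is None:  # YYYY
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if day is None:  # YYYY-MM
        start = datetime(year, month, 1)
        end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
        return start, end
    start = datetime(year, month, day)  # YYYY-MM-DD
    return start, start + timedelta(days=1)


def matches(ts, interval):
    """Start-inclusive, end-exclusive membership test."""
    start, end = interval
    return start <= ts < end


june = interval_for(2026, 6)  # what date:2026-06 would cover
print(matches(datetime(2026, 6, 30, 23, 59, 59), june))  # True
print(matches(datetime(2026, 7, 1), june))               # False: end is excluded
```

At the upper limit (year 9999) computing the end raises `ValueError`, which is the kind of out-of-range arithmetic a real implementation has to turn into a clean error.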

Scoped this down to static date matching after the review feedback.

date: filters archives by creation date/time in ISO-8601 format. A calendar
pattern (YYYY through second precision) matches the interval its precision
implies, start-inclusive and end-exclusive.

Fractional seconds and fractional Unix timestamps (@epoch.ffffff) resolve to
an exact microsecond, while a plain @epoch covers one second.

Z, +/-HH:MM or [Region/City] TZ suffixes are supported. Without one the
pattern is local time, and Unix timestamps are always UTC so a suffix is
rejected.
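The Unix timestamp rules described here (whole seconds cover a one-second interval, a fraction pins an exact microsecond, no timezone suffix allowed) could look roughly like this. `EPOCH_RE` and `epoch_interval` are hypothetical names, and representing an exact instant as `start == end` is an assumption made only for this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: "@" + whole seconds, optionally ".": 1 to 6 digits.
EPOCH_RE = re.compile(r"@(?P<seconds>\d+)(?:\.(?P<fraction>\d{1,6}))?")


def epoch_interval(expr):
    """Return aware UTC (start, end); start == end means one exact instant."""
    m = EPOCH_RE.fullmatch(expr)
    if m is None:  # also rejects timezone suffixes such as "[Europe/Berlin]"
        raise ValueError(f"invalid Unix timestamp pattern: {expr!r}")
    start = datetime.fromtimestamp(int(m["seconds"]), tz=timezone.utc)
    if m["fraction"] is None:
        return start, start + timedelta(seconds=1)
    # Integer arithmetic avoids float rounding of the fractional part.
    exact = start + timedelta(microseconds=int(m["fraction"].ljust(6, "0")))
    return exact, exact


print(epoch_interval("@1735732800")[0].isoformat())  # 2025-01-01T12:00:00+00:00
```

Very large values make `datetime.fromtimestamp` raise `OverflowError`, `OSError` or `ValueError`; a real implementation would need to catch these, which this sketch does not.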

Parsing is a unified re.VERBOSE regex with named groups, as suggested.
Out-of-range arithmetic now raises DatePatternError instead of leaking
uncaught ValueError/OverflowError (this was a latent bug). Placed docs in the match-archives helptext.
@c-herz c-herz marked this pull request as draft July 1, 2026 00:18