Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715)#8776
Add preliminary support for ISO-8601 timestamps via date: archive match pattern (#8715)#8776c-herz wants to merge 1 commit into
Conversation
|
Thanks for picking this up! ❤️
What's the precision of an archive's creation time? From the code I assume it's with fractional seconds, right? I absolutely agree with you then: There should be a variant that is guaranteed to match a single archive. I feel like that Unix timestamps should also optionally support fractional seconds then.
From reading the code I derive that the 1-hour interval is matched exclusively, i.e. it's actually matching any archive within 00:59:59.9999… hours, correct? Perfect 👍
Even though I like the simplicity, I feel like that Borg should be pretty strict about the format, because being less strict easily leads to ambiguity. For example, is If there's no library that can be used, I always imagined that the code would basically revolve around a single, rather strict regex with bottom-up optional groups for year, month, day, hour, minutes, seconds, and fractal seconds, or In general I like to encourage creating extensive unit tests as early as possible. It's elegant and simple code now (🚀 👍), but complexity will increase greatly when adding more and more features. Note: I can read the code, but can't do an actual code review - for that I just don't known enough of Borg's code. |
ThomasWaldmann
left a comment
There was a problem hiding this comment.
Thanks for the PR!
Some minor stuff I found...
|
BTW, if you install the pre-commit hook, you can have your commits automatically formatted. https://borgbackup.readthedocs.io/en/stable/development.html#building-a-development-environment |
| (?: | ||
| :(?P<minute>\d{2}|\*) # minute (MM or *) | ||
| (?: | ||
| :(?P<second>\d{2}(?:\.\d+)?|\*) # second (SS or SS.fff or *) |
There was a problem hiding this comment.
Yay, that is a nice regex now. :-)
You could also deal with fractional seconds as with all other components: it is a optional component, so you can also match it with a named group and later check the groupdict.
| r"(?:(?P<years>\d+)Y)?" | ||
| r"(?:(?P<months>\d+)M)?" | ||
| r"(?:(?P<weeks>\d+)W)?" | ||
| r"(?:(?P<days>\d+)D)?" | ||
| r"(?:(?P<hours>\d+)h)?" | ||
| r"(?:(?P<minutes>\d+)m)?" | ||
| r"(?:(?P<seconds>\d+)s)?" |
There was a problem hiding this comment.
didn't we use to have lowercase ymwd and uppercase HMS?
There was a problem hiding this comment.
@c-herz, is that a custom format, or did I miss an ISO 8601 or RFC update? I know ISO-8601's P3Y6M4DT12H30M5S (P designator, 3 years, 6 months, 4 days, T time separator, 12 hours, 30 minutes, 5 seconds, all being optional) format. I honestly don't like that "official" format very much (it's so cumbersome…), especially in regards to using M for both months and minutes (it's still unambiguous due to the T separator), but I feel like we should support it, because it's an official part of ISO 8601.
However, I kinda like the idea of additionally supporting our own format. Like, why not support 12:34:56 (or similar, just a quick idea) to specify a 12 hours, 34 minutes, 56 seconds duration? Why not also support (space) as alternative to T to separate times? The designators could ignore case (i.e. also supporting 7d for 7 days), and we could allow a space after each term. This might go too far though. In general, I'm not sure whether we should put things into the "official" P designator, or rather additionally support our own (like the suggested D, e.g. D 3y 6m 4d 12:30:05?). Is there maybe another common or even formalized/standardized (like another RFC; I did some research, unfortunately I didn't find anything official or quasi-official) format? WDYT?
| # ISO week date: YYYY-Www or YYYY-Www-D | ||
| (?P<isoweek_year>\d{4})-W(?P<isoweek_week>\d{2})(?:-(?P<isoweek_day>\d))? | ||
| | # Ordinal date: YYYY-DDD | ||
| (?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3}) |
There was a problem hiding this comment.
This might be a bit silly, but it just got me thinking: Could we support ordinal days with wildcard years? 🤔 Same for ISO weeks, possibly even *-W*-5 (wow, that looks crazy 😆) to match all archives created on a Friday? Not sure whether users would use it, but if it's possible? WDYT?
| | # Ordinal date: YYYY-DDD | ||
| (?P<ordinal_year>\d{4})-(?P<ordinal_day>\d{3}) | ||
| | # Unix epoch | ||
| @(?P<epoch>\d+) |
There was a problem hiding this comment.
Also supporting fractal seconds here would be amazing! ❤️
Side note: I might read the regex wrong, but this also means @1745577106[Europe/Berlin] (or any other TZ format) is supported? AFAIK Unix timestamps are UTC per definition, right? Or is the TZ info used later for something else?
| Also supports: | ||
| TIMESTAMP/TIMESTAMP | ||
| TIMESTAMP/DURATION | ||
| DURATION/TIMESTAMP. |
There was a problem hiding this comment.
Great work! ❤️
Just DURATION (i.e. without a TIMESTAMP) isn't supported yet, right? Just specifying a duration is helpful to match the latest archives relative to "now" (which could be another useful keyword).
❌ 2 Tests Failed:
View the top 2 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
|
before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail. |
|
ping? |
|
My apologies, I have been caught up with finals and starting a new internship. I should be able to look into @PhrozenByte's suggestions by this weekend. Apologies for the delays and the rather poor communication! |
|
ping? (no hurry, but if you find some time, it would be nice to finish this) |
|
Note: this PR can't even run the tests anymore, but it's not the fault of the code here, but current Cython requires some fixes not yet present here, but fixed in current master branch. Please rebase on current master. |
|
Also, I would appreciate if we could get this into a mergable state ASAP. We will soon have bigger changes and movements in the code, so there could be lots of merge conflicts coming. |
|
ping |
|
What do you think about reducing the scope of this PR? It might help move things forward and get the core functionality merged sooner. A recent example is #8775, which was opened just one day before this PR and was successfully merged yesterday after its scope was reduced. In particular, support for durations, from/to matching, and special keywords (e.g. "oldest") seems to be more involved than originally anticipated. Would it make sense to focus on date support first, perhaps with only very basic or no wildcard support, and leave the more complex parts for follow-up PRs? |
|
Re: @PhrozenByte's suggestion--that definitely would make it a bit easier for me to get things up to speed after a year of neglect. My sincere apologies for the radio silence on my end. Over the past year life really got in the way for me and I just didn't have time to work on hobby projects. However, I certainly should've communicated that better. In any case, I should have some time this weekend and possibly here and there throughout the week to clean things up and at least get the basic feature I worked on last year rebased onto master and integrated with the major changes since then (which to be honest, I need to catch up on). I can shelve any nice-to-haves for now and focus on core functionality, and certainly keep everything as a reference if these are desired in the future and/or I have time to extend. Please let me know what should be the highest priority moving forward! |
|
There's really no need to apologize. We're all contributing in our spare time, and there are certainly more important things in life than getting a PR merged quickly 🙂 In my opinion, it should ultimately be up to you to decide which features you want to work on and where you want to draw the line for the initial implementation. That said, since I originally suggested this in #8715, my personal motivation was simply being able to match archives by static dates. I create daily backups, and while the date used to be part of the archive name in Borg 1.x, archive series in Borg 2.0 work differently. Because of that, something as simple as a static match pattern like But again, you're the one doing the actual work here, so feel free to draw the line wherever it makes sense to you, both in terms of implementation effort and what you actually feel like working on. |
|
Review done by Claude Opus 4.8 OverallThe implementation is clean and the integration into Blocking / process
Substantive findings
Minor / polish
Bottom lineSolid, mergeable-quality core once rebased. Suggested merge gates: (1) rebase onto current master, (2) the |
|
Follow-up by Claude Opus 4.8 The CI prune failures are caused by this PR's new test leaking
|
PhrozenByte
left a comment
There was a problem hiding this comment.
From an user's perspective: LGTM, great work! ❤️
Just three rather minor suggestions:
- WDYT about also allowing
(space) as a date-time separator (e.g.date:2025-01-01 14:30)? - It might be a good idea to add a test for Daylight Saving Time (DST) specifically. For example,
date:2026-03-29T03:00[Europe/Berlin]will behave a little weird and different fromdate:2026-03-29T03:00+02:00. IMHO there's no need to do anything about it, I just think it might be a good idea to make the behaviour reproducible with a unit test and note to users that timezone quirks exist (IMHO it's not necessary to go into detail, just that timezones can be weird and that one might want to force UTC instead). - I have a few suggestions concerning the help text. See comment below.
As always, I didn't do an actual code review, I'm reviewing this purely from an user's perspective by reading the code changes, unit tests, and docs.
I can do an actual test run once this has been rebased on latest master, but due to the pretty extensive test suite I don't think that I'll catch anything by that.
| Full regular expression support. | ||
| This is very powerful, but can also get rather complicated. | ||
|
|
||
| Date patterns, selector ``date:`` |
There was a problem hiding this comment.
At first glance, I found this a little hard to follow. Here's a suggestion for the help text:
Date patterns, selector ``date:``
Match archives by creation timestamp. You can either match a single archive by
passing its exact creation time, or all archives created within a given time
interval.
To match a single archive by its exact creation time, use the forms:
- ``YYYY-MM-DDTHH:MM:SS.ffffff``: ISO-8601-like date-time string
- ``@1735732800.123456``: UNIX timestamp
To match a single archive, the pattern must specify the archive's complete
creation timestamp, including any fractional seconds. Fractional-second
patterns accept 1 to 6 digits.
To match all archives created within a given time interval, use the forms:
- ``YYYY``: match all archives created within the given year
- ``YYYY-MM``: within the given month
- ``YYYY-MM-DD``: on the given day
- ``YYYY-MM-DDTHH``: in the given hour
- ``YYYY-MM-DDTHH:MM``: in the given minute
- ``YYYY-MM-DDTHH:MM:SS``: in the given second
- ``@1735732800``: within the 1 second interval from the given UNIX timestamp
Date and time patterns match the interval implied by their precision, including
the start and excluding the end. For example, ``date:2026-06`` matches archives
created on or after ``2026-06-01T00:00:00`` and before ``2026-07-01T00:00:00``.
Date and time patterns may include a timezone suffix: ``Z`` (UTC), ``+HH:MM``,
``-HH:MM``, or ``[Region/City]``. Patterns without a timezone are interpreted
in the local timezone. Be wary of Daylight Saving Time (DST) transitions, as
they can make time intervals ambiguous or nonexistent. Use UTC to avoid such
issues. Unix timestamps are always UTC and do not accept a timezone suffix.
It separates exact timestamp matches from interval matches, explains why exact matches include fractional seconds, is a bit more explicit about how interval matching works, and adds a note about timezones and DST.
Scoped this down to static date matching after the review feedback. date: filters archives by creation date/time in ISO-8601 format. A calendar pattern (YYYY through second precision) matches the interval its precision implies, start-inclusive and end-exclusive. Fractional seconds and fractional Unix timestamps (@epoch.ffffff) resolve to an exact microsecond, while a plain @epoch covers one second. Z, +/-HH:MM or [Region/City] TZ suffixes are supported. Without one the pattern is local time, and Unix timestamps are always UTC so a suffix is rejected. Parsing is a unified re.VERBOSE regex with named groups, as suggested. Out-of-range arithmetic now raises DatePatternError instead of leaking uncaught ValueError/OverflowError (was latent bug). Placed docs in the match-archives helptext.
This PR adds preliminary support for matching ISO 8601 timestamps with the
date:archive filter, and intends to begin addressing the requirements of #8715.~~Timestamps (except for Unix epoch forms) are currently assumed to be in the user's local timezone and converted to UTC internally. The following formats are currently supported:
YYYYYYYY-MMYYYY-MM-DDYYYY-MM-DDTHH-> matches a 1-hour intervalYYYY-MM-DDTHH:MM-> matches a 1-minute intervalYYYY-MM-DDTHH:MM:SS-> matches a 1-second intervalYYYY-MM-DDTHH:MM:SS.ffff-> matches an exact timestamp, including up to microseconds@123456789-> Unix epoch (interpreted as UTC)Edit: Scoped down significantly from original PR; future improvements can be made in future PRs.