I am parsing mail headers and want to parse the date with chrono by passing it the RFC 2822 string. The problem is that chrono cannot parse strings in the format `2 Nov 2021 14:26:12 +0000 (UTC)`; the trailing `(UTC)` seems to be what breaks it. How can I make chrono parse these strings as well?

use chrono::prelude::DateTime; // 0.4.19
use regex::Regex;              // 1.6.0

fn main() {
    let date = "2 Nov 2021 14:26:12 +0000"; // does work
    let date = "2 Nov 2021 14:26:12 +0000 (UTC)"; // does not work

    // The regex rewrites a trailing "[+-]dddd (www)" to "[+-]dddd".
    let re = Regex::new(r"([+-]?\d{4}) \(\w+\)$").unwrap();
    // `replace` returns a Cow<str>; borrowing it yields the &str the parser expects.
    let date = DateTime::parse_from_rfc2822(&re.replace(date, "$1")).unwrap();
    println!("{}", date);
}

I can use regex to just remove the last part, but is it possible to parse it without this "hack"?

E_net4
fevar
  • Did you try anything yet? If yes, can we see your code? – Finomnis Jul 08 '22 at 16:54
  • I added a code example – fevar Jul 08 '22 at 16:59
  • Does `2 Nov 2021 14:26:12 +0000 (UTC)` match any standard? Or is it simply `RFC2822` with additional stuff attached? – Finomnis Jul 08 '22 at 17:01
  • If I understand your question correctly, you are asking if the above date string is the standard. From https://datatracker.ietf.org/doc/html/rfc2822#section-3.3, I think the last line in the section (about the zone) says that this is valid inside the standard as well, but I am not sure. I have just parsed it directly from the MIME header returned from https://github.com/jonhoo/rust-imap – fevar Jul 08 '22 at 17:04
  • Yes, you are right, it is valid RFC2822. The standard specifies `CFWS` at the end, which is `*([FWS] comment) (([FWS] comment) / FWS)` and matches `(UTC)`. `comment` is `"(" *([FWS] ccontent) [FWS] ")"`. I think this is a bug in `chrono` and should be fixed. – Finomnis Jul 08 '22 at 17:10
  • Was already discussed: https://github.com/chronotope/chrono/issues/462 I'm unsure as to why they came to the conclusion though, in my opinion it's valid – Finomnis Jul 08 '22 at 17:16
  • Official quote from [their code](https://github.com/chronotope/chrono/blob/051e1170c41477ce162301c8711110a4577c1a23/src/format/parse.rs#L80): *we do not recognize a folding white space (FWS) or comment (CFWS). for our purposes, instead, we accept any sequence of Unicode white space characters (denoted here to `S`). any actual RFC 2822 parser is expected to parse FWS and/or CFWS themselves and replace it with a single SP (`%x20`); this is legitimate.* – Finomnis Jul 08 '22 at 17:23
  • I've opened an issue: https://github.com/chronotope/chrono/issues/732 – Finomnis Jul 08 '22 at 17:44
  • Can you parse the format yourself? https://docs.rs/chrono/0.4.7/chrono/format/fn.parse.html – Yury Jul 08 '22 at 18:07
  • You have to strip out comments and other chaff before you parse it with the `DateTime` library. It's not aware of the full nuances of comments, which can be nested, or other quirks of the RFC5322 syntax. Maybe strip it off with a loose regular expression that just snips `\(.*`? – tadman Jul 09 '22 at 04:33
  • @tadman Might be [compatible soon](https://github.com/chronotope/chrono/pull/733). – Finomnis Jul 09 '22 at 17:36
  • @fevar It will work as soon as I get it through the reviews. It was recognized and accepted by the devs that this is incorrect behaviour and that we need to implement compatibility with rfc2822 comments. Until it is merged, you could use this in your `Cargo.toml`: `chrono = { git = "https://github.com/Finomnis/chrono.git", branch = "rfc2822_comments" }` – Finomnis Jul 11 '22 at 08:31
  • @fevar: It got fixed and merged. Will be released soon. That means, though, that my branch doesn't exist anymore and the `Cargo.toml` entry of the previous message will no longer work. In the meantime, use `chrono = { git = "https://github.com/chronotope/chrono.git" }`. – Finomnis Jul 24 '22 at 14:54
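
For chrono versions without the fix, the advice in the comments above (strip the comments, which may be nested, and replace them with a single space before parsing) could be sketched roughly as follows. This is a minimal sketch using only the standard library. `strip_rfc2822_comments` is a hypothetical helper name, not part of chrono, and it does not implement the full RFC 2822 grammar: it only tracks parenthesis depth and backslash escapes inside comments, and it drops everything after an unterminated comment.

```rust
/// Hypothetical helper (not part of chrono): removes parenthesized RFC 2822
/// comments, including nested ones, and collapses every run of comments and
/// whitespace into a single space.
fn strip_rfc2822_comments(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut depth = 0usize; // current comment nesting level
    let mut pending_space = false; // a separator is owed before the next token
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        if depth > 0 {
            // Inside a comment: only track nesting and escapes.
            match c {
                '\\' => {
                    chars.next(); // quoted-pair: skip the escaped character
                }
                '(' => depth += 1,
                ')' => depth -= 1,
                _ => {}
            }
        } else if c == '(' {
            // A comment starts; it acts as a separator.
            depth = 1;
            pending_space = true;
        } else if c.is_whitespace() {
            pending_space = true;
        } else {
            // Emit at most one space between tokens, none at the start.
            if pending_space && !out.is_empty() {
                out.push(' ');
            }
            pending_space = false;
            out.push(c);
        }
    }
    out
}

fn main() {
    let raw = "2 Nov 2021 14:26:12 +0000 (UTC)";
    println!("{}", strip_rfc2822_comments(raw));
}
```

The cleaned string has the same shape as the one the question reports as working, so it can then be handed to `DateTime::parse_from_rfc2822`. Unlike the `\(\w+\)$` regex, this approach also copes with nested comments and with comments that are not at the end of the string.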

1 Answer

This was a bug in chrono.

The fix has been merged and is expected to be released in chrono version 0.4.20.

use chrono::prelude::DateTime; // main branch

fn main() {
    let date = "2 Nov 2021 14:26:12 +0000 (UTC)";
    println!("{}", DateTime::parse_from_rfc2822(date).unwrap());
}
2021-11-02 14:26:12 +00:00
Finomnis