
I'm writing some code to parse RSS feeds but I have trouble with the Abstruse Goose RSS feed. If you look in that feed, dates are encoded as Mon, 06 Aug 2018 00:00:00 UTC. To me, it looks like RFC 2822.

I tried to parse it using chrono's DateTime::parse_from_rfc2822, but I get ParseError(NotEnough).

let pub_date = entry.pub_date().unwrap().to_owned();
return rfc822_sanitizer::parse_from_rfc2822_with_fallback(&pub_date)
    .unwrap_or_else(|e| {
        panic!(
            "pub_date for item {:?} (value is {:?}) can't be parsed due to error {:?}",
            &entry, pub_date, e
        )
    })
    .naive_utc();

Is there something I'm doing wrong? Do I have to hack it some way?

I use rfc822_sanitizer, which usually does a good job of repairing badly written dates. I don't think it affects the parsing here, but who knows?

Shepmaster
Riduidel
  • Show some Rust code so that others can reproduce your problem. – tadman Sep 11 '19 at 17:09
    `Mon, 06 Aug 2018 00:00:00 UTC` is most definitely **not** an RFC2822-formatted timestamp. The valid symbol for UTC in RFC2822, specified [in the doc](https://tools.ietf.org/html/rfc2822) (search for `obs-zone`) is `UT`, not `UTC` – Sébastien Renauld Sep 11 '19 at 17:19
  • @SébastienRenauld Damn ... would it be a candidate for an extension to rfc822 sanitizer ? – Riduidel Sep 11 '19 at 17:27
    Possibly. It's a known issue with people trying to roll their own datetime generator, tbh. They see "GMT" and they think it'll be "UTC", when the reality is that the real timezone for universal time is, well, Universal Time. There's a lot of backstory about the name – Sébastien Renauld Sep 11 '19 at 17:28
  • Anyway. Try replacing "UTC" with "UT", try again, and let me know if it works. If it does I'll write an answer with all the trivia – Sébastien Renauld Sep 11 '19 at 17:29
  • @SébastienRenauld And indeed, it worked ... maybe this makes this question a duplicate ... – Riduidel Sep 11 '19 at 17:33
  • Question has been made an issue in rfc822_sanitizer (https://gitlab.com/alatiera/rfc822_sanitizer/issues/1) – Riduidel Sep 11 '19 at 17:33

1 Answer


The RFC 2822 date/time format is precisely specified in the RFC by the following grammar:

date-time       =       [ day-of-week "," ] date FWS time [CFWS]
day-of-week     =       ([FWS] day-name) / obs-day-of-week
day-name        =       "Mon" / "Tue" / "Wed" / "Thu" /
                        "Fri" / "Sat" / "Sun"
date            =       day month year
year            =       4*DIGIT / obs-year
month           =       (FWS month-name FWS) / obs-month
month-name      =       "Jan" / "Feb" / "Mar" / "Apr" /
                        "May" / "Jun" / "Jul" / "Aug" /
                        "Sep" / "Oct" / "Nov" / "Dec"
day             =       ([FWS] 1*2DIGIT) / obs-day
time            =       time-of-day FWS zone
time-of-day     =       hour ":" minute [ ":" second ]
hour            =       2DIGIT / obs-hour
minute          =       2DIGIT / obs-minute
second          =       2DIGIT / obs-second
zone            =       (( "+" / "-" ) 4DIGIT) / obs-zone

Where obs-zone is defined as follows:

obs-zone        =       "UT" / "GMT" /          ; Universal Time
                                                ; North American UT
                                                ; offsets
                        "EST" / "EDT" /         ; Eastern:  - 5/ - 4
                        "CST" / "CDT" /         ; Central:  - 6/ - 5
                        "MST" / "MDT" /         ; Mountain: - 7/ - 6
                        "PST" / "PDT" /         ; Pacific:  - 8/ - 7
                        %d65-73 /               ; Military zones - "A"
                        %d75-90 /               ; through "I" and "K"
                        %d97-105 /              ; through "Z", both
                        %d107-122               ; upper and lower case
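
To make the grammar concrete, here is a minimal sketch of a check for the zone token alone, using only the standard library. The function name is_rfc2822_zone is hypothetical and not part of chrono or rfc822_sanitizer; the sketch assumes the zone has already been split off from the rest of the timestamp, and it treats the quoted names as case-insensitive, as ABNF string literals are.

```rust
/// Hypothetical helper: returns true if `zone` matches the `zone`
/// production quoted above (a signed four-digit offset or an obs-zone name).
fn is_rfc2822_zone(zone: &str) -> bool {
    let bytes = zone.as_bytes();

    // ( "+" / "-" ) 4DIGIT
    if bytes.len() == 5 && (bytes[0] == b'+' || bytes[0] == b'-') {
        return bytes[1..].iter().all(|b| b.is_ascii_digit());
    }

    // Named zones listed in obs-zone (ABNF literals are case-insensitive).
    const NAMED: [&str; 10] = [
        "UT", "GMT", "EST", "EDT", "CST", "CDT", "MST", "MDT", "PST", "PDT",
    ];
    if NAMED.iter().any(|name| name.eq_ignore_ascii_case(zone)) {
        return true;
    }

    // Military zones: a single ASCII letter, excluding "J"/"j"
    // (%d65-73, %d75-90, %d97-105, %d107-122).
    bytes.len() == 1
        && bytes[0].is_ascii_alphabetic()
        && !bytes[0].eq_ignore_ascii_case(&b'j')
}
```

With this check, "UT", "GMT" and "+0000" are accepted, while "UTC" is rejected because it appears nowhere in the grammar.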

A common mistake when rolling your own timestamp generator is exactly this point: how to label the zone in an RFC 2822 timestamp. The RFC uses UT rather than UTC because the two are not exactly the same: UTC includes leap seconds, while UT comes in four variants that are all subtly different, and the RFC does not say which one is meant.
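
As a workaround for feeds that emit the wrong label, a small pre-processing step can rewrite it before parsing. The following is a rough sketch using only the standard library; normalize_utc_label is a hypothetical name, and it assumes the zone is the last token of the string, as in the Abstruse Goose feed. Its output can then be handed to chrono's DateTime::parse_from_rfc2822, which the comments on the question report works once the label is UT.

```rust
/// Hypothetical helper: rewrites a trailing " UTC" zone label to " UT".
/// Trailing whitespace is trimmed; any other input is returned unchanged.
fn normalize_utc_label(date: &str) -> String {
    let trimmed = date.trim_end();
    match trimmed.strip_suffix(" UTC") {
        // Keep everything before the label and append the RFC 2822 spelling.
        Some(prefix) => format!("{} UT", prefix),
        None => trimmed.to_string(),
    }
}
```

For example, normalize_utc_label("Mon, 06 Aug 2018 00:00:00 UTC") yields "Mon, 06 Aug 2018 00:00:00 UT", while a timestamp that already ends in +0000 or GMT passes through untouched.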

Sébastien Renauld