Found some strange behaviour today that I'm hoping someone can shed some light on.

I'm using strptime to validate dates in an import file. In this case, I want to throw an error if a row in the file contains a date that doesn't fit the format %Y/%m/%d (2017/01/25).

I call strptime as follows:

Date.strptime('25/01/2017', '%Y/%m/%d')

I expect this to fail, as 25 does not fit the criteria for the year. However this succeeds, providing a date as:

0025, 01, 20

If I swap the month and day around (01/25/2018), it fails, as it does detect that the month is invalid.

So what gives? It seems bizarre that it not only creates this mental looking year (0025), but even crazier that it disregards the '17' from the end of the string without issue.

Thanks in advance! :)

Jagdeep Singh
  • The `2017` --> `20` seems to be a more general property of the method: any trailing characters are *always* ignored. You could do `Date.strptime('25/01/20 BLA BLA BLA', '%Y/%m/%d')` and get the same result. – Tom Lord Jun 27 '18 at 10:53
  • As for why `25` is interpreted as `0025` rather than raising an error, that is beyond me. From my understanding, `%Y` should require a *minimum* of 4 digits in order to be valid, but apparently that's not the case. Possibly a bug? – Tom Lord Jun 27 '18 at 10:54
  • You may find the answers in [this question](https://stackoverflow.com/a/39369104/1954610) relevant and helpful to your problem. – Tom Lord Jun 27 '18 at 10:56

1 Answer

Think about what you actually asked for:

Date.strptime('25/01/2017', '%Y/%m/%d')

You are saying that you want the year 0025, month 01 and day 20 (it strips the rest). In the end you get 0025-01-20.
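
One way to see exactly what the parser consumed is `Date._strptime`, which returns the parsed fields as a hash instead of building a Date. The following is a minimal sketch: `fully_parsed?` is a hypothetical helper name, and it assumes that unconsumed trailing text is reported under a `:leftover` key, which is worth confirming on your Ruby version.

```ruby
require 'date'

# Date._strptime returns the raw parsed fields rather than a Date object.
fields = Date._strptime('25/01/2017', '%Y/%m/%d')

# Inspect which parts of the string ended up in which field,
# and whether anything was left unconsumed.
p fields

# Hypothetical helper: treat any unconsumed input as a validation failure.
# Assumes trailing text shows up under the :leftover key.
def fully_parsed?(str, format)
  fields = Date._strptime(str, format)
  !fields.nil? && !fields.key?(:leftover)
end

p fully_parsed?('25/01/2017', '%Y/%m/%d')
p fully_parsed?('2017/01/25', '%Y/%m/%d')
```

Note that this only tells you whether the whole string was consumed. It does not confirm that the resulting fields form a real calendar date.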

You cannot rely on Date.strptime alone to do the validation for you.

The best approach is to check the string against a regexp first and then validate the date.

For your format, a simple regexp would be:

'25/01/2017'.match(/\A\d{4}\/\d{2}\/\d{2}\z/)

In your case you will get nil, because it does not match.

If you get a match you will get: #<MatchData "2017/01/25">.

The catch is that this only checks the shape of the string, not whether the date is real (2017/02/30 would still match). You still need to check that strptime can parse the result (as in the answer linked by Tom Lord).
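
Putting the two checks together could look roughly like this. It is a sketch rather than a definitive implementation: `DATE_SHAPE` and `parse_import_date` are hypothetical names, the pattern is anchored with `\A` and `\z` so that extra characters before or after the date are rejected, and `Regexp#match?` assumes Ruby 2.4 or later.

```ruby
require 'date'

# Hypothetical constant: shape check only, anchored to the whole string.
DATE_SHAPE = /\A\d{4}\/\d{2}\/\d{2}\z/

# Hypothetical helper: returns a Date for a well-formed, real date,
# and nil for anything else.
def parse_import_date(str)
  return nil unless DATE_SHAPE.match?(str)

  # strptime raises ArgumentError (or a subclass) for impossible dates.
  Date.strptime(str, '%Y/%m/%d')
rescue ArgumentError
  nil
end

p parse_import_date('2017/01/25')
p parse_import_date('25/01/2017')
p parse_import_date('2017/02/30')
```

In an import job you would raise or record a row error wherever this returns nil.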

Alternatively, you can do the whole check with a regexp alone, although the pattern gets rather complex. The following one checks the yyyy/mm/dd format:

^(?:(?:(?:(?:(?:[1-9]\d)(?:0[48]|[2468][048]|[13579][26])|(?:(?:[2468][048]|[13579][26])00))(\/)(?:0?2\1(?:29)))|(?:(?:[1-9]\d{3})(\/)(?:(?:(?:0?[13578]|1[02])\2(?:31))|(?:(?:0?[13-9]|1[0-2])\2(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))\2(?:0?[1-9]|1\d|2[0-8])))))$

Then you know right away whether the date is in the correct format, and you don't have to parse it with strptime.

Edit:

When dealing with dates and times, always perform your own checks! Don't rely on the function alone. The problem with time is that there are many exceptions, and even though standards such as ISO 8601 exist, many applications do not follow them.

Based on the comment, I want to dig deeper into strptime. For now, here is the comment from the source code (above the date_s_strptime function in date_core.c):

/*
 * call-seq:
 *    Date.strptime([string='-4712-01-01'[, format='%F'[, start=Date::ITALY]]])  ->  date
 *
 * Parses the given representation of date and time with the given
 * template, and creates a date object.  strptime does not support
 * specification of flags and width unlike strftime.
 *
 *    Date.strptime('2001-02-03', '%Y-%m-%d')   #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('03-02-2001', '%d-%m-%Y')   #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('2001-034', '%Y-%j')    #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('2001-W05-6', '%G-W%V-%u')  #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('2001 04 6', '%Y %U %w')    #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('2001 05 6', '%Y %W %u')    #=> #<Date: 2001-02-03 ...>
 *    Date.strptime('sat3feb01', '%a%d%b%y')    #=> #<Date: 2001-02-03 ...>
 *
 * See also strptime(3) and #strftime.
 */

You can see name tokens like sat and feb being used too, so it is no surprise that the parser can deal with free-form strings. To be continued once I have dug into the C code.

tukan
  • I ended up going with the initial regex check, followed by strptime. As Tom Lord said, I would have expected %Y to require 4 characters before the first forward slash, which seems a fair assumption to me. How it actually works seems very counterintuitive given how the documentation describes it. – Mathew Barnard Jun 27 '18 at 14:25
  • @MathewBarnard see my first edit. When I get to it I'll post more info about the C code details. For now see the comment which says quite a lot about this function. – tukan Jun 28 '18 at 11:08