22

V8 Date parser is broken:

> new Date('asd qw 101')
Sat Jan 01 101 00:00:00 GMT+0100 (CET)

I could use a regular expression like this:

\d{1,2} (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec) \d{1,4}

but it is too fragile. I cannot rely on new Date (because of the V8 issue), and moment can't help me either, because moment is getting rid of date detection (GitHub issue thread).

Is there any workaround for the broken V8 date parser?

To be clear: both Gecko and V8 have Date. V8's Date parsing is broken, Gecko's works. I need the behavior of the Date in Gecko (Firefox).

Update: It’s definitely a broken parser: https://code.google.com/p/v8/issues/detail?id=2602
Nope, that issue has Status: WorkingAsIntended.

Vladimir Starkov

3 Answers

28

Date objects are based on a time value that is the number of milliseconds since 1 January 1970 UTC, and they have the following constructors:

new Date();
new Date(value);
new Date(dateString);
new Date(year, month[, day[, hour[, minutes[, seconds[, milliseconds]]]]]);

From the docs,

dateString in new Date(dateString) is a string value representing a date. The string should be in a format recognized by the Date.parse() method (IETF-compliant RFC 2822 timestamps and also a version of ISO8601).

Now, looking at the V8 source code in date.js:

function DateConstructor(year, month, date, hours, minutes, seconds, ms) {
  if (!%_IsConstructCall()) {
    // ECMA 262 - 15.9.2
    return (new $Date()).toString();
  }

  // ECMA 262 - 15.9.3
  var argc = %_ArgumentsLength();
  var value;
  if (argc == 0) {
    value = %DateCurrentTime();
    SET_UTC_DATE_VALUE(this, value);
  } else if (argc == 1) {
    if (IS_NUMBER(year)) {
      value = year;
    } else if (IS_STRING(year)) {
      // Probe the Date cache. If we already have a time value for the
      // given time, we re-use that instead of parsing the string again.
      var cache = Date_cache;
      if (cache.string === year) {
        value = cache.time;
      } else {
        value = DateParse(year);               // <- DOES NOT RETURN NaN
        if (!NUMBER_IS_NAN(value)) {
          cache.time = value;
          cache.string = year;
        }
      }

    }
...

It looks like DateParse() does not return NaN for a string like 'asd qw 101', hence the unexpected result. You can cross-check this with Date.parse('asd qw 101') in both Chrome (V8), which returns -58979943000000, and Firefox (Gecko), which returns NaN. Sat Jan 01 101 00:00:00 is what you get when you seed new Date() with a timestamp of -58979943000000 (in both browsers).
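The cross-check above can be scripted. This is only a minimal sketch: `describeParse` is a hypothetical helper name, and the result for a junk string such as 'asd qw 101' depends on the engine, so no output is claimed for it here.

```javascript
// Hypothetical helper: report what Date.parse makes of a string.
// Date.parse returns a millisecond timestamp, or NaN when the engine
// rejects the string. Which strings get rejected is engine-specific.
function describeParse(input) {
  const time = Date.parse(input);
  return {
    input: input,
    time: time,
    valid: !Number.isNaN(time),
  };
}

// Run this in both engines and compare the results.
console.log(describeParse('asd qw 101'));

// Seeding a Date with a timestamp reproduces the same instant,
// which is why the odd timestamp shows up as a year-101 date.
const iso = describeParse('2015-06-30T00:00:00Z');
console.log(new Date(iso.time).toISOString());
```

Checking `valid` before using the timestamp at least catches the inputs the engine itself rejects. It does not help with strings the engine accepts too generously.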

Is there any workaround for the broken V8 date parser?

I wouldn't say the V8 date parser is broken. It just tries to match a string against the RFC 2822 standard as best it can, but so does Gecko, and the two give different results in certain cases.

Try new Date('Sun Ma 10 2015') in both Chrome (V8) and Firefox (Gecko) for another such anomaly. Here Chrome cannot decide whether 'Ma' stands for 'March' or 'May' and gives an Invalid Date, while Firefox doesn't.

Workaround:

You can write your own wrapper around Date() to filter out the strings that V8's own parser does not reject. However, subclassing built-ins is not feasible in ECMAScript 5. In ECMAScript 6, it will be possible to subclass built-in constructors (Array, Date, and Error) - reference

However, you can use a stricter regular expression to validate day/month/year strings before passing them to Date (it does not cover the full RFC 2822 or ISO 8601 syntax):

^(?:(?:31(\/|-|\. |\s)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.|\s)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.|\s)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.|\s)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

(Image: diagram of the regex above, generated with Debuggex)
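A wrapper of the kind mentioned in this workaround might look like the sketch below. It is not the regex above. It assumes a single accepted format, "DD Mon YYYY", and the names `parseDayMonthYear`, `MONTHS` and `DAY_MONTH_YEAR` are made up for illustration. Anything that does not match is rejected before the Date constructor ever sees it.

```javascript
// Hypothetical wrapper: accept only "DD Mon YYYY" and return null otherwise.
const MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
const DAY_MONTH_YEAR = /^(\d{1,2}) ([a-z]{3}) (\d{4})$/i;

function parseDayMonthYear(input) {
  const match = DAY_MONTH_YEAR.exec(String(input).trim());
  if (!match) return null;

  const day = Number(match[1]);
  const month = MONTHS.indexOf(match[2].toLowerCase());
  const year = Number(match[3]);
  if (month === -1) return null;

  // Build the date from numbers, so no string heuristics are involved.
  const date = new Date(Date.UTC(year, month, day));

  // Date.UTC rolls over out-of-range values (31 Feb becomes 3 Mar) and
  // maps years 0-99 to 1900-1999, so confirm the fields survived intact.
  if (date.getUTCFullYear() !== year ||
      date.getUTCMonth() !== month ||
      date.getUTCDate() !== day) {
    return null;
  }
  return date;
}

console.log(parseDayMonthYear('1 Jan 2015'));  // a Date for 2015-01-01 UTC
console.log(parseDayMonthYear('asd qw 101'));  // null
```

Because the date is assembled from numeric components, the result should not depend on which engine's string parser is in use.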

So it seems V8 isn't broken; it just works differently.

Hope it helps!

Community
nalinc
  • 2
    it helps a lot. btw, today I found an issue from March 2013 that V8 parser should return `NaN` btw I still think its broken https://code.google.com/p/v8/issues/detail?id=2602, so probably it is officially broken – Vladimir Starkov Jun 30 '15 at 16:21
  • 2
    or probably not "Status: WorkingAsIntended" – Vladimir Starkov Jun 30 '15 at 16:29
  • Exactly! IMHO, it depends on how _well_ a parser supports the standards (RFC/ISO). They mostly cover the scenarios where they should work as expected, but produce junk in certain use cases. As an example, V8 does give an Invalid Date (as expected) for `new Date('asd qw 21')` but accepts a similarly structured string `new Date('asd qw 121')` as a date (which it shouldn't). One can always come up with a better regex/parsing technique for wider support, but still cannot guarantee it will work for every valid format and junk string. Exhaustive testing in such cases is neither practical nor possible :) – nalinc Jun 30 '15 at 17:37
  • That regex and diagram are pretty - but I can't seem to get it to work with RFC2822, and it certainly doesn't accommodate ISO8601. – Matt Johnson-Pint Jul 01 '15 at 20:34
11

You seem to be asking for a way to parse a string that might be in any format and determine what date it represents. There are many reasons why this is a bad idea in general.

You say moment.js is "getting rid of date detection", but actually it never had this feature in the first place. People just made the assumption that it could do that, and in some cases it worked, and in many cases it didn't.

Here's an example that illustrates the problem.

 var s = "01.02.03";

Is that a date? Maybe. Maybe not. It could be a section heading in a document. Even if we said it was a date, what date is it? It could be interpreted as any of the following:

  • January 2nd, 2003
  • January 2nd, 0003
  • February 1st, 2003
  • February 1st, 0003
  • February 3rd, 2001
  • February 3rd, 0001

The only way to disambiguate would be with knowledge of the current culture date settings. Javascript's Date object does just that - which means you will get a different value depending on the settings of the machine where the code is running. However, moment.js is about stability across all environments. Cultural settings are explicit, via moment's own locale functionality. Relying on the browser's culture settings leads to errors in interpretation.

The best thing to do is to be explicit about the format you are working with. Don't allow random garbage input. Expect your input in a particular format, and use a regex to validate that format ahead of time, rather than just trying to construct a Date and seeing if it's valid after the fact.
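As a rough illustration of being explicit, the sketch below parses the "01.02.03" example only when the caller states the field order. The function name `parseDotted`, the order codes 'DMY', 'MDY' and 'YMD', and the rule that two-digit years mean 20xx are all assumptions made for this example.

```javascript
// Hypothetical helper: parse "NN.NN.NN" using a caller-supplied field order.
const DOTTED = /^(\d{2})\.(\d{2})\.(\d{2})$/;

function parseDotted(input, order) {
  const match = DOTTED.exec(input);
  if (!match || !/^(?:DMY|MDY|YMD)$/.test(order)) return null;

  // Map each letter of the order string to the matching captured group.
  const fields = {};
  order.split('').forEach(function (key, i) {
    fields[key] = Number(match[i + 1]);
  });

  const year = 2000 + fields.Y; // assumption: two-digit years mean 20xx
  const date = new Date(Date.UTC(year, fields.M - 1, fields.D));

  // Reject values that rolled over, such as month 13 or day 32.
  if (date.getUTCMonth() !== fields.M - 1 || date.getUTCDate() !== fields.D) {
    return null;
  }
  return date;
}

const s = '01.02.03';
console.log(parseDotted(s, 'MDY').toISOString().slice(0, 10)); // 2003-01-02
console.log(parseDotted(s, 'DMY').toISOString().slice(0, 10)); // 2003-02-01
console.log(parseDotted(s, 'YMD').toISOString().slice(0, 10)); // 2001-02-03
```

The same string yields three different dates, and nothing in the string itself says which is right. The order has to come from outside, which is the point about context.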

If you can't do that, you'll have to find additional context to help decide. For example, if you are scraping some random bits of the web from a back-end process and you want to extract a date from the text, you'd have to have some knowledge about the language and locale of each particular web page. You could guess, but you'd likely be wrong a fair amount of the time.

See also: Garbage in, garbage out

Matt Johnson-Pint
  • Thanks for the response. Basically, I don't need to rely on momentjs at all. I can use Date and will be pretty happy with it. The point is that Firefox's implementation works and Chrome’s is broken. – Vladimir Starkov Jun 26 '15 at 14:30
  • @VladimirStarkov I wouldn't say chrome is broken, it's just trying to parse the input string – maioman Jun 28 '15 at 21:45
  • @maioman probably it’s not broken, but it isn't doing its job well enough – Vladimir Starkov Jun 29 '15 at 10:15
  • @VladimirStarkov without knowing the subsets it's like crawling in the dark; but if you divided the string and fed it to the Date object instance with `setFullYear` , `setMonth` , `setDate` you should have a more consistent result , take a look at the [fiddle](http://jsfiddle.net/maio/cztaL0bu/) – maioman Jun 29 '15 at 13:13
10

ES5 15.9.4.2 Date.parse: /.../ If the String does not conform to that format the function may fall back to any implementation-specific heuristics or implementation-specific date formats. Unrecognizable Strings or dates containing illegal element values in the format String shall cause Date.parse to return NaN.

So this behavior is permitted by the spec. Consistent with the citation above, the V8 date parser gives:

  1. new Date('asd qw 101') : Sat Jan 01 101 00:00:00 GMT+0100 (CET)
  2. new Date('asd qw') : Invalid Date
Vladimir Starkov
stdob--