11

I have noticed that PHP and JavaScript treat octal and hexadecimal numbers with some difficulty while type juggling and casting:

PHP:

echo 16 == '0x10' ? 'true' : 'false'; //true, as expected
echo 8  == '010'  ? 'true' : 'false'; //false, o_O

echo (int)'0x10';    //0, o_O
echo intval('0x10'); //0, o_O
echo (int)'010';     //10, o_O
echo intval('010');  //10, o_O

JavaScript:

console.log(16 == '0x10' ? 'true' : 'false'); //true, as expected
console.log(8  == '010'  ? 'true' : 'false'); //false, o_O

console.log(parseInt('0x10')); //16, as expected
console.log(parseInt('010'));  //8, as expected
console.log(Number('0x10'));   //16, as expected
console.log(Number('010'));    //10, o_O

I know that PHP has the octdec() and hexdec() functions to remedy the octal/hexadecimal misbehaviour, but I'd expect the intval() to deal with octal and hexadecimal numbers just as JavaScript's parseInt() does.

Anyway, what is the rationale behind this odd behaviour?

mingos
  • @Truth: No, AFAIK octal notation in PHP and JS is simply a zero prefix, not the `0o` prefix. – BoltClock Nov 24 '11 at 19:46
  • The last 4 PHP lines print `0`, `0`, `10` and `10` on my box, which seems consistent with: http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion – Bart Kiers Nov 24 '11 at 19:47
  • No, neither PHP nor JavaScript understand the `0o10` notation. – mingos Nov 24 '11 at 19:48
  • @Bart: yes, that part of the manual covers scientific notation and floating point numbers. I'm interested in octal and hexadecimal, which doesn't seem to be covered anywhere I've looked. – mingos Nov 24 '11 at 19:52
  • @Bart, yes, you are right, I just copy-pasted the same line and changed the numbers, apparently forgetting the output comment. Thanks for spotting that. – mingos Nov 24 '11 at 19:54
  • The first example surprises me, I wouldn't have expected that. You usually write octal or hexadecimal numbers unquoted, `$a = 0x10;` or `$a = 010;`. If you quote them, then just parse until the first non-number, ignore 0s at the beginning. Except in the first case, that's the WTF for me. – Carlos Campderrós Nov 24 '11 at 19:55
  • @Bart, if it wasn't supported, why does the 1st example work? – mingos Nov 24 '11 at 19:57
  • @Carlos: if you deal with literals, then yes, but there are other possible sources of data. Parsing ini files or extracting data from the database always yields strings. – mingos Nov 24 '11 at 19:59
  • I wanted to say that there's no cast in your first example but a comparison... but that does not hold with your second example... ~:| Odd. – Bart Kiers Nov 24 '11 at 20:00
  • I guess it could be called a bug the fact that the first example works but not the second (they should work or fail both). About intval not working, I wouldn't expect that this worked, I guess we differ here. [similar open bug for two years](https://bugs.php.net/bug.php?id=48573). – Carlos Campderrós Nov 24 '11 at 20:09
  • @Carlos, my expectation is to see consistent behaviour, regardless whether it does or does not parse non-decimals :). – mingos Nov 24 '11 at 20:12
  • Your entire question basically just _assumes_ that not treating numbers _in strings_ with leading zeroes as octal is "odd", presumably just because it's not this way for numeric literals. It's fairly baseless. – Lightness Races in Orbit Nov 24 '11 at 20:23
  • @Tomalak: my expectations have little to do with the fact that the behaviour is inconsistent. I could as well remove the "as expected" and "o_O" in the comments. I placed them there to draw attention to the inconsistencies, not to remark that I specifically expect `'010'` to be treated as `010`. – mingos Nov 24 '11 at 20:28

3 Answers

9

Imagine somebody specifies 035 as a quantity for some product to buy (the leading 0 is just for padding so it matches other three-digit quantities in the list). A non-programmer obviously expects 035 to be interpreted just like 35. But if PHP were to interpret octal numbers in strings the result would suddenly be 29 => WTF?!? Hexadecimal notation on the other hand is less of a problem because people don't commonly specify numbers using a 0x23 notation.
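
To make the padding problem concrete, here is a small JavaScript sketch. The variable names are made up for illustration, and an explicit radix is passed to parseInt() so the result does not depend on how a given engine treats leading zeros:

```javascript
// A zero-padded quantity as a user might type it (hypothetical input).
const quantity = '035';

// Read as decimal, the padding is harmless.
const asDecimal = parseInt(quantity, 10); // 35

// Read as octal, 3 * 8 + 5 gives a very different number.
const asOctal = parseInt(quantity, 8); // 29

console.log(asDecimal, asOctal);
```

Which of the two a string conversion silently picks is exactly what decides whether the order is for 35 items or 29.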

This by the way doesn't only happen to end users, but to programmers too. Often programmers try to pad their numbers with leading zeros and - huh, everything is wrong! That's why JS doesn't allow octal notation in strict mode anymore and other languages use the more explicit 0o prefix.

By the way, I do agree that this behavior is inconsistent. In my eyes hexadecimal notation shouldn't be parsed either, just as octal and binary notation are not, especially considering that the explicit (int) cast doesn't parse hex either and instead just reads everything up to the first non-digit.


Addressing the intval case, it actually behaves just like documented: intval isn't there for parsing PHP's native integer notations, it is for parsing integers of a specified base. If you have a look at the docs, you'll find that it takes a second argument $base which defaults to 10. (The (int) cast by the way internally maps down to the same convert_to_long_base call with base = 10, so it will always behave exactly like intval.)

NikiC
  • I can definitely imagine that alright. `0644 == '0644'` is intuitively obvious to me, too, though. EDIT: OK, I see you edited the answer. Well, I would settle on disallowing all non-decimal casting as a consistent, albeit uncomfortable behaviour. But it's inconsistent and I'm trying to make sense out of it... – mingos Nov 24 '11 at 20:07
  • 1
    @mingos Please, for your own sake, don't try to understand it. PHP is an "organically grown" product. There is no reason for most of the quirky behavior, it's just how things evolved. – NikiC Nov 24 '11 at 20:14
  • 1
    OK, JavaScript's `parseInt()` also takes an optional second parameter for the base. And judging by what @Esailija remarked about the ECMA standard, it would appear that both languages are *intended* to work the same way: interpret strings as decimal integers unless specifically instructed otherwise... – mingos Nov 24 '11 at 21:19
3

In JavaScript, only decimal and hex are defined as part of the standard, while octal is implementation-dependent, which would explain why octal parsing is not consistent between the examples you gave.

You can get rid of octal literals in strict mode but in all browsers I tested, parseInt still tried to parse an octal instead of decimal. Which is kind of strange because the spec does not say anything about trying to interpret implied octal for parseInt and explicitly prohibits the octal extension when in strict mode. So no octal literals, nothing in the spec about trying to turn "010" into an octal when parseInt'd, and the behavior persists even in strict mode.

So Number("012") === 12 is correct while parseInt("012") === 10 is not correct according to my interpretation of the spec which you can read here
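
As a rough sketch of the difference (constant names are illustrative only), the snippet below sticks to calls whose results are fixed by the spec: Number() on a string, and parseInt() with an explicit radix. parseInt("012") without a radix is left out on purpose, since that is the implementation-dependent case discussed above:

```javascript
// Number() treats leading zeros as decimal padding and recognises 0x as hex.
const numPadded = Number('012'); // 12
const numHex = Number('0x10');   // 16

// parseInt() with an explicit radix leaves nothing for the engine to guess.
const intDecimal = parseInt('012', 10); // 12
const intOctal = parseInt('012', 8);    // 10
const intHex = parseInt('0x10', 16);    // 16, the 0x prefix is skipped for radix 16

console.log(numPadded, numHex, intDecimal, intOctal, intHex);
```

Passing the radix every time is the simplest way to get the same answer in every browser.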

There is a good reason for hexadecimal though, it makes operations on numbers at bit level much easier. And "0xFF" is not something someone types if he doesn't mean a hex.

Esailija
  • OK, that would explain the JavaScript part. Wonder if it's the same with PHP? – mingos Nov 24 '11 at 20:22
  • @mingos, I have no idea, I decided not to take a swing at the PHP beast :D. Perhaps their specs change and they must leave legacy stuff lying around so legacy programs won't break when they upgrade? That's just a guess though. – Esailija Nov 24 '11 at 20:29
1

Didn't read the other answer, but at least in PHP there is no problem with octal or hexadecimal numbers; you're just doing it wrong.

"0x12" // String with content "0x12"
0x12 // Integer "18"
010 // integer "8"

Casting the string to integer will ... yes, cast it to integer the way PHP always does it: it reads the leading digits and forms the integer out of them until it finds a non-numeric character. In this case that is only the 0.

hexdec() works on strings, but these strings are hexadecimal only, without the 0x prefix.

echo hexdec('A0'); // 160

The prefixes 0 (octal) and 0x (hexadecimal) exist to distinguish the different integer notations from each other, but as long as you write it as a string, PHP will treat it as a string.

I assume that you made a similar mistake with JavaScript.
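
If the strings really do come from a source that uses literal-style prefixes (an ini file, say), the conversion has to be spelled out. Below is a minimal JavaScript sketch of that idea; `parsePrefixedInt` is a hypothetical helper, not a built-in, and it assumes unsigned input only:

```javascript
// Hypothetical helper: interpret a string the way an integer literal with a
// 0x (hex) or leading-0 (octal) prefix would be read. Anything else that is
// not a plain decimal integer yields NaN instead of a silent partial parse.
function parsePrefixedInt(text) {
  const s = text.trim();
  if (/^0x[0-9a-f]+$/i.test(s)) {
    return parseInt(s.slice(2), 16); // hex digits after the 0x prefix
  }
  if (/^0[0-7]+$/.test(s)) {
    return parseInt(s, 8); // leading zero followed by octal digits only
  }
  if (/^(0|[1-9][0-9]*)$/.test(s)) {
    return parseInt(s, 10); // plain decimal
  }
  return NaN;
}

console.log(parsePrefixedInt('0x10')); // 16
console.log(parsePrefixedInt('010'));  // 8
console.log(parsePrefixedInt('10'));   // 10
```

Rejecting inputs such as '08' outright, rather than guessing, is a design choice of this sketch: it keeps the helper from repeating the inconsistency the question is about.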

KingCrunch
  • I think the OP is well aware that he is using strings. The question asked is why hex strings are interpreted in some situations but not in others and why oct is never parsed. – NikiC Nov 24 '11 at 20:30
  • @NikiC: Strings always get "interpreted" (that is, cast), but probably not the way the OP expects. – KingCrunch Nov 24 '11 at 20:31
  • @KingCrunch Always, but in different, inconsistent ways. – NikiC Nov 24 '11 at 20:44