
In the console, try the following:

> sum(sapply(1:99999, function(x) { x != as.character(x) }))
0

For every value from 1 through 99999, the comparisons 1 == "1", 2 == "2", ..., 99999 == "99999" are all TRUE. However,

> 100000 == "100000"
FALSE

Why does R have this quirky behavior, and is it a bug? What would be a workaround to, e.g., check whether every element of an atomic character vector is in fact numeric? I have been checking whether x == as.numeric(x) for each x, but that fails on certain datasets because of the problem above!

Henrik
Robert Krzyzanowski
  • No, that sum is zero, not "TRUE"... – Frank Sep 23 '13 at 16:46
  • @JoshuaUlrich can you explain how those are duplicates? – Señor O Sep 23 '13 at 16:49
  • For the problem described in the last paragraph, you could `match` your input character vector against `1:100000` (which is an integer vector): `match(as.character(1:100000),1:100000)`. – Frank Sep 23 '13 at 16:49
  • I think `!is.na(as.numeric(x))` (or some equivalent using `all()` for the vectorized case) should work for a test ... – Ben Bolker Sep 23 '13 at 17:21
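A minimal sketch of the test suggested in Ben Bolker's comment, vectorized with `all()`. The helper names `is_numeric_string()` and `all_numeric()` are hypothetical; the only assumption is that a string should count as numeric whenever `as.numeric()` can parse it. Because no number is ever converted back into a string, `options("scipen")` plays no role here.

```r
# A string counts as numeric if as.numeric() can parse it (does not give NA).
# is_numeric_string() and all_numeric() are hypothetical helper names.
is_numeric_string <- function(x) {
  # suppressWarnings() silences "NAs introduced by coercion" for bad strings
  !is.na(suppressWarnings(as.numeric(x)))
}

# Vectorized check: TRUE only if every element parses as a number
all_numeric <- function(x) {
  all(is_numeric_string(x))
}

x <- c("1", "99999", "100000", "1e5", "abc")
is_numeric_string(x)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
all_numeric(x[1:4])
# [1] TRUE
all_numeric(x)
# [1] FALSE
```

Note that with this approach the literal string "NA" and missing values are reported as non-numeric, which may or may not be what you want for your data.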

1 Answer


Have a look at as.character(100000): its value is not equal to "100000" (check for yourself), and R is essentially just telling you so.

as.character(100000)
# [1] "1e+05"

Here, from ?Comparison, are R's rules for applying relational operators to values of different types:

If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.

Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests the character strings on the LHS and RHS for equality. In some cases those will be equal, but in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
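To make the precedence rule quoted from ?Comparison concrete, here is a short sketch of which way the coercion goes for a few mixed-type pairs. It only illustrates the documented ordering; the last line also shows that swapping the operands does not change the result, since whichever side is not character gets converted.

```r
# Precedence (highest first): character, complex, numeric, integer, logical, raw
1L == 1.0        # integer promoted to double
# [1] TRUE
TRUE == 1        # logical promoted to double
# [1] TRUE
TRUE == "TRUE"   # logical converted to the string "TRUE"
# [1] TRUE
1 == "1"         # double converted to the string "1"
# [1] TRUE

# The side the character operand sits on does not matter:
identical(100000 == "100000", "100000" == 100000)
# [1] TRUE
```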

So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally R probably uses something other than as.character() to perform the conversion.)

as.character(100000)=="100000"
# [1] FALSE
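One way to sidestep the implicit number-to-string conversion is to do the conversion yourself, in the direction you actually want. A minimal sketch: convert the character operand to numeric so that the comparison happens between numbers, or, if a string comparison really is what you need, format the number with `scientific = FALSE` first. The `trim = TRUE` argument matters for vectors, because `format()` otherwise pads all elements to a common width.

```r
s <- "100000"

# Compare as numbers: the string is parsed, no number is turned into a string
100000 == as.numeric(s)
# [1] TRUE

# Compare as strings, but with fixed (non-scientific) notation forced
format(100000, scientific = FALSE, trim = TRUE) == s
# [1] TRUE

# trim = TRUE stops format() from padding shorter values with spaces
format(c(1, 100000), scientific = FALSE, trim = TRUE)
# [1] "1"      "100000"
```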
Josh O'Brien
  • From `?'=='`: "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw." So is it correct to say that the LHS is getting coerced to character and not the other way around? – Señor O Sep 23 '13 at 16:51
  • @SeñorO In `x == y`, if `y` is character then `x` is converted, so it is *as if* `as.character(100000)=="100000"` were called. It doesn't matter which side of the binary operator this is; as long as one of the pair is character, the other will be coerced to character if not already so. – Gavin Simpson Sep 23 '13 at 17:01
  • @GavinSimpson -- Thanks for the clarifications. I've edited one of them (the bit about "*as if*") into my answer. – Josh O'Brien Sep 23 '13 at 17:06
  • Thanks, I was able to solve the problem using: `old_opts <- options(scipen = 1000)` `on.exit(options(old_opts))` – Robert Krzyzanowski Sep 23 '13 at 21:11
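The `options()`/`on.exit()` pair from the last comment only restores the setting when it runs inside a function, because `on.exit()` fires when the enclosing function returns. A minimal sketch of that pattern; `equal_fixed_notation()` is a hypothetical name, and the sketch assumes, as the comment reports, that the conversion performed by `==` honours `scipen`. That behaviour is worth verifying on your own R version before relying on it; converting explicitly, as in the answer's discussion, avoids the question entirely.

```r
# Hypothetical helper: compare with scientific notation strongly penalised,
# then restore the caller's options no matter how the function exits.
equal_fixed_notation <- function(x, y) {
  old_opts <- options(scipen = 1000)  # options() returns the previous value
  on.exit(options(old_opts))          # restore it when the function returns
  x == y
}

before <- getOption("scipen")
equal_fixed_notation(100000, "100000")
identical(getOption("scipen"), before)  # the global setting is left unchanged
```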