How can R consider a string as negative?

Question

Please consider the following:

string_1 = "??????????"
string_2 = " bob"
string_3 = "_bob_"
string_1 < 0
# [1] TRUE
string_2 < 0
# [1] TRUE
string_3 < 0
# [1] TRUE

but

string_4 = "bob"
string_4 < 0
# [1] FALSE

Why does R consider a string to be a negative value? Is there a particular character that makes a string compare as negative? If so, how can I sanitize a vector of strings so that they are not treated as negative?

What @MatthewLundberg says and what ?'==' says: "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw." — TheComeOnMan, Nov 12 '13 at 06:08
Come on, people... this is not difficult. `"?" < "0" [1] TRUE` and `"_" < "0" [1] TRUE` and `" " < "0" [1] TRUE`. "0" is not _zero_. — IRTFM, Nov 12 '13 at 07:11

Matthew Lundberg · Answer 1 · 2013-11-12T14:49:15.030

1

This is simply an alphabetic sort order.

"b" < 0
## [1] FALSE
"?" < 0
## [1] TRUE

Each result just shows how the string compares to "0": the number 0 is converted to character, and the comparison is then alphabetic. Similarly:

"hello" > "goodbye"
## [1] TRUE
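
A minimal sketch of the coercion described above, using only base R. The variable name `same` is just for illustration. Results for punctuation characters can depend on the locale, so only locale-independent outputs are annotated here.

```r
# Mixing character and numeric in one atomic vector yields character,
# the same precedence rule the comparison operators apply.
class(c("bob", 0))
## [1] "character"

# The number 0 becomes the string "0" ...
as.character(0)
## [1] "0"

# ... so comparing against 0 is the same as comparing against "0".
same <- identical("bob" < 0, "bob" < "0")
same
## [1] TRUE
```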

edited Nov 12 '13 at 14:49

answered Nov 12 '13 at 06:02

Matthew Lundberg

How can I sanitize a string to ensure that it is positive, then? I have a problem with a function in the `igraph` package that won't accept a negative value as an id (even if it is a character). So I guess I have to convert the few "negative" values into "positive" ones... – CptNemo Nov 12 '13 at 06:31
Hi @CptNemo. If you have a follow-up question, you are welcome to open a new question. Asking follow-ups in the comments is cumbersome (for the answerer), hard to follow (for the community), and strongly discouraged. – Ricardo Saporta Nov 12 '13 at 06:40
Well, my follow-up question is part of my main question. Let's say it is a kind reminder... :-) – CptNemo Nov 12 '13 at 06:41
YES, of course it "looks like" it is just checking the first character. Look at any dictionary. – IRTFM Nov 12 '13 at 07:10
why don't you just `string <- paste0(1,string)` if you want to ensure that `string > "0"` ? (1 must come after 0 in any sensible lexicographic ordering, right?) – Ben Bolker Nov 12 '13 at 14:42
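
A sketch of the prefixing idea from the comment above, assuming it is acceptable for the ids to change as long as they stay distinct. The helper name `prefix_ids` is hypothetical, and this does not check whether the `igraph` function then accepts the ids.

```r
# Hypothetical helper: prepend "1" so every string starts with a
# character that sorts after "0".
prefix_ids <- function(x) paste0("1", x)

ids <- c("??????????", " bob", "_bob_", "bob")
safe_ids <- prefix_ids(ids)

# Check that every prefixed id now compares as greater than "0".
all(safe_ids > "0")

# Distinct inputs remain distinct, because the same prefix is added to each.
anyDuplicated(safe_ids) == 0
```

The prefix can be stripped again later with `sub("^1", "", safe_ids)` if the original labels are needed.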

score -1 · Answer 2 · answered Nov 12 '13 at 06:11

Is it possible that your string_1 is not actually composed of question mark characters, but rather of some unprintable characters, the first of which has an ASCII value below 48 ('0')?

I ask because my brief experimentation shows that R promotes 0 to "0" and then does a lexicographic comparison of the two strings.

"4aaaa" < 5
# [1] TRUE

"6bbbb" < 5
# [1] FALSE

The 0 is not special in any way, since strings are not numbers.
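
To check the hypothesis about unprintable characters, one option is to look at the code points directly. A small sketch in base R, with `string_1` retyped here as literal question marks (the original data may differ):

```r
string_1 <- "??????????"

# Distinct code points in the string; a real question mark is 63.
unique(utf8ToInt(string_1))
## [1] 63

# The digit "0" is 48, so in a pure ASCII ordering "?" sorts after "0".
utf8ToInt("0")
## [1] 48

# Raw bytes, useful for spotting non-ASCII or control characters.
charToRaw(string_1)
```

If the first call returns anything other than 63, the string contains something other than plain question marks.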

answered Nov 12 '13 at 06:11

oakad

Why should we be asking "is it possible, that you[r] string_1 is not actually composed of question mark character"? – IRTFM Nov 12 '13 at 07:08
Because "?" is ASCII code 63, while "0" is 48. Thus, a string_1 composed of true question marks will be greater than 0. You can check for yourself if you've got R handy. – oakad Nov 13 '13 at 00:03
This is probably locale-dependent. On my setup and on this internet box (http://www.compileonline.com/execute_r_online.php), "?" collates as ASCII and is thus greater than zero. – oakad Nov 13 '13 at 00:07
See `?"<"`: "The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising." – IRTFM Nov 13 '13 at 00:08
And yet, '?' is a common placeholder for a non-printable character. (OP said nothing about his locale, btw, so one can assume whatever). – oakad Nov 13 '13 at 00:11
If you edit your answer, persons who might have downvoted it will be able to reverse that action. – IRTFM Nov 13 '13 at 00:15
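
The locale point raised in these comments can be checked locally. A sketch, assuming base R only: `sort()` with `method = "radix"` always uses the C-locale (byte) order, while the default method follows the session's collating sequence, so the two can disagree about where "?" and "_" fall relative to "0".

```r
# Which collating sequence is this session using?
Sys.getlocale("LC_COLLATE")

x <- c("?", "0", "_", " ", "b")

# Locale-dependent order (the result can differ between, e.g., en_US and C).
sort(x)

# C-locale order, independent of the session locale.
c_order <- sort(x, method = "radix")
c_order
## [1] " " "0" "?" "_" "b"
```

If `sort(x)` and `c_order` differ, the session is not collating in ASCII order, which would be consistent with `"?" < 0` returning TRUE.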
