
I have found an issue where R seems to interpret "T" as TRUE even when I use every option I know of to prevent it (at least according to this post).

Example data (saved as "test.txt"):

col1    col2
1   T
2   T
3   T
4   T
5   T
6   T
7   T
8   T
9   T

Example code:

read.table("test.txt", as.is=TRUE, header=TRUE, 
   stringsAsFactors=FALSE, colClasses=c(character())) 

Produces:

  col1 col2
1    1 TRUE
2    2 TRUE
3    3 TRUE
4    4 TRUE
5    5 TRUE
6    6 TRUE
7    7 TRUE
8    8 TRUE
9    9 TRUE
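A minimal sketch of where the TRUE values come from, based on the documented behaviour of read.table: columns without a usable colClasses entry are read as character and then passed to type.convert, which picks the first type (logical, integer, numeric, ...) that accepts every value. "T" is one of the accepted logical constants, so a column made up only of "T" becomes logical. The vector below is illustrative, not taken from the original post.

```r
# A column consisting only of "T" values, as read.table sees it before conversion.
x <- type.convert(c("T", "T", "T"), as.is = TRUE)

class(x)   # "logical": "T" is accepted as an abbreviation of TRUE
x          # TRUE TRUE TRUE
```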

The only (non-ideal) workaround I found was to set header=FALSE:

read.table("test.txt", as.is=TRUE, header=FALSE, 
    stringsAsFactors=FALSE,
    colClasses=c(character()))        


     V1   V2
1  col1 col2
2     1    T
3     2    T
4     3    T
5     4    T
6     5    T
7     6    T
8     7    T
9     8    T
10    9    T
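Why header=FALSE appears to help: the header line becomes an ordinary data row, so column V2 holds "col2" alongside the "T" values. "col2" cannot be read as logical, so type.convert falls back to character for the whole column. The same happens to V1, which is why the positions are no longer numeric. A small sketch of that effect, using illustrative vectors rather than the full file:

```r
# With header=FALSE, each column also contains its header text.
v2 <- type.convert(c("col2", "T", "T"), as.is = TRUE)
class(v2)   # "character": "col2" is not a logical constant, so "T" is kept

v1 <- type.convert(c("col1", "1", "2"), as.is = TRUE)
class(v1)   # "character" too: the gene positions are now text
```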

I realize this may seem somewhat contrived, but this edge case is genuine: a human gene is actually named "T" (!), and the values in col1 are positions within that gene.

Thanks in advance for the help

Vince

1 Answer


What makes you think this is "unexpected"?

R guesses for you (and that is generally helpful), but if you know better, use the colClasses=... argument to tell R.

R> res <- read.table(textConnection("col1 col2\n1 T\n2 T\n3 T"), 
+                    header=TRUE, colClasses=c("numeric", "character"))
R> res
    col1 col2
 1    1    T 
 2    2    T 
 3    3    T 
R> sapply(res, class)
        col1        col2  
   "numeric" "character"  
R>

Your post was a little oddly formatted, so at first I didn't see that you did in fact specify colClasses. Despite the recycling rule, I always recommend supplying a vector with as many entries as you have columns.
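Following that recommendation, a sketch that reads the question's layout with one colClasses entry per column. It uses read.table's text= argument instead of the file test.txt so it is self-contained, and the "integer" class for col1 is an assumption based on col1 holding gene positions.

```r
# Inline copy of the question's test.txt, shortened to three rows.
txt <- "col1 col2
1 T
2 T
3 T"

# One class per column: positions as integer, the gene name kept as text.
res <- read.table(text = txt, header = TRUE,
                  colClasses = c("integer", "character"))
sapply(res, class)   # col1 "integer", col2 "character"
res$col2             # "T" "T" "T"

# A single "character" is recycled, so every column becomes character.
res_all <- read.table(text = txt, header = TRUE, colClasses = "character")
sapply(res_all, class)
```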

Dirk Eddelbuettel
  • @Vince: Note that Dirk is using `"character"` instead of `character()`; if you make that change in your code yours also works, though now both columns are character class. – Aaron left Stack Overflow Oct 28 '13 at 20:38
  • @Dirk: Thanks. I (falsely) assumed it was unexpected because setting header=FALSE seemed to fix the issue, and the lack of an error from my mistaken use of character() instead of "character" for colClasses confused me. But yes, you are correct: under most circumstances this behavior should be expected. I just assumed my use of colClasses should have fixed it. – Vince Oct 28 '13 at 20:45
  • @Aaron: thanks! Should R report an error when given a non-character vector as the value of colClasses? – Vince Oct 28 '13 at 20:47
  • Well, perhaps, but that wouldn't have solved this issue because `character()` creates an empty character vector. So it was a character vector, it just had no entries. – Aaron left Stack Overflow Oct 28 '13 at 20:59
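To make that distinction concrete: `character()` is a constructor call returning a zero-length character vector, while `"character"` is a length-one string naming a class. The empty vector passes any "is it character?" test but supplies no classes, which is consistent with the question's output showing read.table falling back to its usual guessing. The guard function below is hypothetical, a name made up for illustration; base R's read.table does not perform this check.

```r
empty <- c(character())
is.character(empty)              # TRUE: it is a character vector...
length(empty)                    # 0: ...but it has no entries
identical(empty, character(0))   # TRUE

wanted <- "character"
length(wanted)                   # 1: one class name, recycled across columns

# Hypothetical caller-side guard (not part of base R): reject an empty colClasses.
check_col_classes <- function(colClasses) {
  if (length(colClasses) == 0L) {
    stop("colClasses is empty; did you mean \"character\"?")
  }
  invisible(colClasses)
}
```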