In a dataframe, I want to be able to separate columns with numeric types from columns with strings/characters.

Here is my data:

test=data.frame(col1=sample(1:20,10),col2=sample(31:50,10),
col3=sample(101:150,10),col4=sample(c('a','b','c'),10,replace=TRUE),
stringsAsFactors=TRUE) # the default before R 4.0.0

Which looks like

   col1 col2 col3 col4
1     2   41  132    c
2    11   47  141    b
3    13   39  135    a
4    12   31  117    b
5    19   42  106    a
6     8   50  118    a
7    14   33  149    a
8     6   48  148    b
9    16   37  150    b
10    9   34  140    a

Now here is the strange thing: if I call typeof on a cell containing a character, R says it is an integer:

> typeof(test[1,4])
[1] "integer"

If I do something like this:

> apply(test,2,typeof)
       col1        col2        col3        col4 
"character" "character" "character" "character" 

R says they are all characters. Also,

> unlist(lapply(test,typeof))
     col1      col2      col3      col4 
"integer" "integer" "integer" "integer" 

What is going on, and is there a good way to distinguish between columns with characters and columns with integers?

digEmAll
Max

4 Answers

apply works on arrays and matrices, not data frames.

To work on a data frame, it first converts it to a matrix.

Your data frame has a factor column, so the conversion to a matrix turns everything into character strings, without bothering to tell you.
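
A minimal sketch of that conversion, assuming the data frame was built with stringsAsFactors = TRUE (the default before R 4.0.0). The seed and the name m are only for illustration:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
test <- data.frame(col1 = sample(1:20, 10),
                   col4 = sample(c("a", "b", "c"), 10, replace = TRUE),
                   stringsAsFactors = TRUE)

# apply() works on a matrix, so the data frame goes through as.matrix() first.
m <- as.matrix(test)

typeof(m)                # "character": one type for the whole matrix
apply(test, 2, typeof)   # hence "character" for every column
sapply(test, typeof)     # column by column: "integer" for both
```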

As you have seen, sapply is the way to go, and class is probably the thing you want to find out. There are also mode, typeof, and storage.mode, depending on what you want to know:

> test$col5=letters[1:10]  # really character, not a factor
> test$col3=test$col3*pi # let's get some decimals in there


> sapply(test, mode)
       col1        col2        col3        col4        col5 
  "numeric"   "numeric"   "numeric"   "numeric" "character" 
> sapply(test, class)
       col1        col2        col3        col4        col5 
  "integer"   "integer"   "numeric"    "factor" "character" 
> sapply(test, typeof)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character" 
> sapply(test, storage.mode)
       col1        col2        col3        col4        col5 
  "integer"   "integer"    "double"   "integer" "character" 
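
To actually split the columns, one option (a sketch, not the only way) is to index the data frame with a logical vector built from is.numeric, which is FALSE for factors as well as for character columns. The names is_num, numeric_part and other_part are made up for this example, and stringsAsFactors = TRUE is assumed so that col4 is a factor:

```r
set.seed(1)  # arbitrary seed, only for reproducibility
test <- data.frame(col1 = sample(1:20, 10),
                   col3 = sample(101:150, 10) * pi,
                   col4 = sample(c("a", "b", "c"), 10, replace = TRUE),
                   stringsAsFactors = TRUE)
test$col5 <- letters[1:10]   # a true character column

# One TRUE/FALSE per column; vapply() guarantees a logical vector.
is_num <- vapply(test, is.numeric, logical(1))

numeric_part <- test[is_num]    # keeps col1 and col3
other_part   <- test[!is_num]   # keeps col4 (factor) and col5 (character)

names(numeric_part)
names(other_part)
```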
Spacedman

Okay, I figured out the answer to my own question, sorry:

sapply(test,class)
David Arenburg
Max

col4 is a factor:

str(test)
#'data.frame':  10 obs. of  4 variables:
#$ col1: int  11 14 8 19 10 12 7 18 3 16
#$ col2: int  46 39 35 38 42 37 34 32 41 31
#$ col3: int  113 147 138 118 132 139 131 119 108 111
#$ col4: Factor w/ 3 levels "a","b","c": 1 3 2 3 2 3 3 3 1 3

A factor internally is an integer (as reported by typeof) with class factor and a levels attribute. apply coerces the data.frame to a matrix. Since a matrix can hold only one data type, everything is coerced to characters before applying typeof.
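
The pieces of a factor can be inspected directly. A small sketch on a standalone factor (the name f is just for illustration):

```r
f <- factor(c("a", "c", "b", "c"))

typeof(f)      # "integer": the underlying storage
class(f)       # "factor"
levels(f)      # "a" "b" "c"
unclass(f)     # the integer codes 1 3 2 3, plus the levels attribute
as.integer(f)  # 1 3 2 3
is.numeric(f)  # FALSE, even though the storage is integer
```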

Use class to distinguish between the data types and lapply (or sapply) to loop over the columns.

Roland

In data.frame(col4=sample(c('a','b','c'),10,replace=T)), col4 is a factor (with stringsAsFactors=TRUE, which was the default before R 4.0.0).

apply(test,2,typeof): because test is a classed object with length(dim(test)) == 2L, apply first converts it with as.matrix(test).
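
The condition can be checked by hand; the relevant test is on the number of dimensions, length(dim(test)). This is only a sketch of the check, not a copy of apply's source, and the small data frame is made up for the example:

```r
test <- data.frame(col1 = 1:3, col4 = factor(c("a", "b", "c")))

dim(test)                  # 3 2
length(dim(test)) == 2L    # TRUE: two-dimensional
is.object(test)            # TRUE: it has a class attribute

# Under those conditions apply() works on as.matrix(test),
# which here is a character matrix.
typeof(as.matrix(test))    # "character"
```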

myincas