I have a tibble ('df') with

> dim(df)
[1]  55 144

of which I extract a vector test <- c(df[,39]). I would expect the following result:

> length(test)
[1] 55

as I basically took column 39 from my tibble. Instead, I get

> length(test)
[1] 1

Now, class(test) yielded list, so I thought the class might be the reason; however, with class set to char, I get the same result.

I'm especially confused since length(df[39,]) yields [1] 155.

Background is I am searching in the vector using grep, which doesn't work with a vector taken from a column. Of course, as I am trying to recode all lines in my tibble, I can recode them by row instead of by column, so I think there is a workaround. However, what causes R to assume that test has length 1? What is the difference in the treatment of rows and columns?

Lukas
    It tells you that the tibble has only 1 column. Tibbles don't coerce to atomic vectors when you extract 1 column. Try `df[[39]]` instead – talat Jun 27 '18 at 12:58

1 Answer

Whenever you apply the [ ] operator to a tibble, it always returns another tibble. This is one of the differences between a tibble and a base R data.frame.

For example:

library(tibble)
a <- 1:5
df <- tibble(a, b = a * 2, c = a^2)
df2 <- as.data.frame(df) # convert to a base data.frame
df[, 2]  # returns a tibble; its dim is 5 1
df2[, 2] # returns a vector; its dim is NULL and its length is 5

As you can see, the data.frame changes the return type: a one-column selection is simplified to a plain vector. The tibble, by contrast, is designed to keep the output type consistent with the input type.
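If you want to override either default, both classes accept a `drop` argument in `[`. The following is only a small sketch reusing the toy data from above; the names `col_tbl`, `col_vec` and `col_df` are made up for illustration.

```r
library(tibble)

a <- 1:5
df <- tibble(a, b = a * 2, c = a^2)
df2 <- as.data.frame(df)

col_tbl <- df[, 2]                 # tibble default: stays a tibble
col_vec <- df[, 2, drop = TRUE]    # ask the tibble to simplify to a vector
col_df  <- df2[, 2, drop = FALSE]  # ask the data.frame NOT to simplify

dim(col_tbl)     # 5 rows, 1 column
length(col_vec)  # one entry per row
class(col_df)    # still a data.frame
```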

There are two ways to extract a column of a tibble as a vector:

  1. pull() from dplyr, e.g. pull(df, 39)
  2. [[ ]], e.g. df[[39]]

Personally, I use pull(), which I find very intuitive.
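Here is a minimal sketch of both options, followed by the grep use case from the question. The tibble `fruits` and its columns are invented for illustration, and dplyr is assumed to be installed.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data standing in for the real 55-row tibble
fruits <- tibble(
  id    = 1:4,
  label = c("apple", "banana", "apricot", "cherry")
)

v_pull    <- pull(fruits, 2)  # column 2 as a plain character vector
v_bracket <- fruits[[2]]      # same result with double brackets

length(v_pull)                # one entry per row
hits <- grep("^ap", v_pull)   # indices of the labels starting with "ap"
hits
```

With a real vector, grep searches every element, so `hits` holds the row positions of the matches instead of treating the whole column as a single list element.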

Why does length(df[39,]) yield 155?

My understanding is that df[39,] gives you a tibble whose dim is 1 155, and its length is equal to the number of columns. Why? Because length() also works on lists, and both tibble and data.frame are built as lists with one element per column. Each column is a vector with its own type, which is why one tibble or data.frame can hold different types. For the same reason, c(df[,39]) is a list with a single element (the whole column), so its length is 1.
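You can check this list behaviour on a small tibble. This is only a sketch on toy data with three columns, not the original 55-row tibble.

```r
library(tibble)

a <- 1:5
df <- tibble(a, b = a * 2, c = a^2)

is.list(df)         # a tibble is a list of columns
length(df)          # number of columns
length(df[2, ])     # one row, but still all the columns
length(df[, 2])     # a one-column tibble

test <- c(df[, 2])  # what the question did: a list holding one column
class(test)
length(test)        # the list has a single element
length(test[[1]])   # the column itself has one entry per row
```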

Tony416