There are two elements to the question in the OP. The first element was addressed in the comments: df["speed"] is an object of type data.frame() whereas df$speed is a numeric vector. We can see this via the str() function.
We'll illustrate this with Ezekiel's 1930 analysis of speed and stopping distance, the cars data set from the datasets package.
> library(datasets)
> data(cars)
>
> str(cars["speed"])
'data.frame': 50 obs. of 1 variable:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
> str(cars$speed)
num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
>
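As a complementary check, class() reports the same distinction more compactly. The sketch below uses the cars data from above; the variable names speed_df and speed_vec are only illustrative.

```r
library(datasets)

# Single-bracket indexing keeps the data frame wrapper around the column.
speed_df <- cars["speed"]
# Double-bracket indexing extracts the column itself, just as $ does.
speed_vec <- cars[["speed"]]

class(speed_df)                    # "data.frame"
class(speed_vec)                   # "numeric"
identical(speed_vec, cars$speed)   # TRUE
```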
The second element, which was not addressed in the comments, is that lapply() behaves differently when passed a vector versus a list().
With a vector, lapply() processes each element in the vector independently, producing unexpected results for a function such as mean().
> unlist(lapply(cars$speed,mean))
[1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15
[26] 15 16 16 17 17 17 18 18 18 18 19 19 19 20 20 20 20 20 22 23 24 24 24 24 25
What happened?
Since each element of cars$speed is processed by mean() independently, lapply() returns a list of 50 means of 1 number each: the original elements in the cars$speed vector.
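The point can be confirmed with a short sketch (per_element is just an illustrative name): the list returned by lapply() has one entry per value, and flattening it gives back the original vector, whereas calling mean() on the whole vector yields the single number that was presumably intended.

```r
library(datasets)

# lapply() visits each of the 50 speeds separately.
per_element <- lapply(cars$speed, mean)
length(per_element)                          # 50

# The mean of a single number is that number, so nothing changes.
identical(unlist(per_element), cars$speed)   # TRUE

# Calling mean() on the whole vector gives one value for the column.
mean(cars$speed)                             # 15.4
```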
Processing a list with lapply()
With a list, each element of the list is processed independently. We can calculate how many items will be processed by lapply() with the length() function.
> length(cars["speed"])
[1] 1
>
Since a data frame is also a list(), and cars["speed"] contains a single element (the numeric vector speed), the length() function returns the value 1. Therefore, when processed by lapply(), a single mean is calculated, not one per row of the speed column.
> lapply(cars["speed"],mean)
$speed
[1] 15.4
>
If we pass the entire cars data frame as the input object for lapply(), we obtain one mean per column in the data frame, since both variables in the data frame are numeric.
> lapply(cars,mean)
$speed
[1] 15.4
$dist
[1] 42.98
>
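If a plain named vector is more convenient than a list, one possible variation (not part of the original question) is vapply(), which iterates over the columns in the same way but also checks that each result is a single number. The name col_means is only illustrative.

```r
library(datasets)

# vapply() applies mean() to each column, as lapply() does, but returns
# a named numeric vector and insists that every result has length 1.
col_means <- vapply(cars, mean, numeric(1))

names(col_means)   # "speed" "dist"
col_means          # the same two means as above, as a numeric vector
```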
A theoretical perspective
The differing behaviors of lapply() are explained by the fact that R is an object-oriented language. In fact, John Chambers, creator of the S language on which R is based, once said:
In R, two slogans are helpful.
-- Everything that exists is an object, and
-- Everything that happens is a function call.
John Chambers, quoted in Advanced R, p. 79.
The fact that lapply() works differently on a data frame than on a vector is an illustration of the object-oriented feature of polymorphism, where the same behavior is implemented in different ways for different types of objects.
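One way to see this in practice: the documentation for lapply() states that classed objects such as data frames are coerced with as.list(). Comparing what as.list() produces for each input shows why the number of iterations differs. This is a minimal sketch, and the variable names are illustrative.

```r
library(datasets)

vec_as_list <- as.list(cars$speed)      # one list element per value
df_as_list  <- as.list(cars["speed"])   # one list element per column

length(vec_as_list)   # 50
length(df_as_list)    # 1
names(df_as_list)     # "speed"
```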