
I am using R's "quantile" function to calculate the percentiles of my dataset, but I am confused by the different mean values returned by the following commands.

mean(quantile(DataSet$V3, prob=c(5,50,95)/100,type=8));

It gives me 101.26

mean(quantile(DataSet$V3,type=8)); 

It gives me 105.27

And

mean(DataSet$V3);

It gives me 109.9

I would be really grateful if someone could explain why the mean values differ in these three cases.

Regards, Zoraze

thelatemail
Zack
  • You are taking the average of the output of the `quantile()` function in the first two cases, which is not the same as taking the average of the entire column. – mtoto Apr 11 '16 at 08:02
  • Thank you so much for replying, I hadn't thought of it this way. I appreciate your help. – Zack Apr 11 '16 at 15:44

1 Answer


The quantiles are obtained by first sorting the data vector and then splitting the sorted data at the proportions given by the probs argument of quantile() (your prob= is a partial match for probs). Suppose the data is

x <- c(9,3,1,10,2)

Then ordering it gives

> sort(x)
[1]  1  2  3  9 10

The median has 50% of the data below it and 50% above. Here the data item 3 is in the "middle" of the list (3rd from either end). For longer lists you can compute it, together with the other quartiles, using

> quantile(x)
  0%  25%  50%  75% 100% 
   1    2    3    9   10 
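To make the "sort, then split" idea concrete, here is a minimal sketch of R's default rule (`type = 7`), which places the quantile for proportion `p` at position `1 + (n - 1) * p` in the sorted data and interpolates linearly between the two neighbouring values. The function name `quantile_type7` is hypothetical and only for illustration; in practice you would just call quantile(). Note that the question uses `type = 8`, which positions the quantiles slightly differently.

```r
# Hypothetical helper, for illustration only: reproduces quantile(..., type = 7)
quantile_type7 <- function(v, probs) {
  s <- sort(v)                         # step 1: order the data
  n <- length(s)
  h <- (n - 1) * probs + 1             # fractional position of each quantile
  lo <- floor(h)
  hi <- pmin(lo + 1, n)                # clamp so probs = 1 stays inside the vector
  s[lo] + (h - lo) * (s[hi] - s[lo])   # linear interpolation between neighbours
}

x <- c(9, 3, 1, 10, 2)
p <- c(0, 0.25, 0.5, 0.75, 1)
quantile_type7(x, p)     # same values as quantile(x), without the names
```

For this particular `x` every position `h` is a whole number, so no interpolation happens and the quantiles are simply data values; with other proportions, such as 0.1, the result falls between two sorted values.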

The mean returned by mean(x) can also be calculated from first principles by adding up all the values and dividing by the number of values:

> (1 + 2 + 3 + 9 + 10)/5
[1] 5

or using the sum function

> sum(x)/5
[1] 5

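So, as the first commenter pointed out, calculating quantiles and calculating means are completely different operations on the data. mean(quantile(...)) averages only the handful of values that quantile() returns (three in your first call, five with the default probs in your second), not the whole column. This is covered in any introductory statistics textbook.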
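A quick check on the small vector `x` above shows the same effect as in the question. With `probs = c(5, 50, 95)/100` and `type = 8`, quantile() returns only three numbers, and mean() then averages those three rather than all five data values. This is a sketch on the answer's toy data, not on the asker's `DataSet`.

```r
x <- c(9, 3, 1, 10, 2)

q <- quantile(x, probs = c(5, 50, 95) / 100, type = 8)
q            # the 5%, 50% and 95% quantiles: 1, 3 and 10 for this x
length(q)    # 3, not length(x)

mean(q)      # the average of those three quantiles only
sum(q) / 3   # the same number, computed by hand
mean(x)      # the average of all five data values: 5
```

Because the three quantiles give equal weight to the low tail, the middle and the high tail, their mean need not match the mean of the full data; here it comes out below 5, much as 101.26 came out below 109.9 in the question.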

slouchy
  • Hi Slouchy, thank you so much for your detailed answer. I knew how the quantile function calculates the percentiles, but I didn't realize that mean(quantile(DataSet$V3, prob=c(5,50,95)/100,type=8)) just returns the mean of the 5%, 50% and 95% quantiles. Thanks once again. – Zack Apr 11 '16 at 15:50
  • In R you can "unwrap" a nested series of functions: quantile( ... ) is evaluated first, then mean( quantile( ... )) applies the mean function to the result of the quantile function. A bit like peeling an onion. This resembles how (reverse) Polish notation calculators work, and composing functions like this is a core idea of "functional programming". The same kind of thinking is behind the ultra-useful functions "apply" and "sapply" in R, so google them to understand the broad concept better. – slouchy Apr 13 '16 at 04:41
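As a rough illustration of the "onion" idea, assuming the same toy vector `x` from the answer: the inner quantile() call is evaluated first and its result is passed to mean(), so storing the intermediate result gives exactly the same answer. sapply() then repeats the composed call once for each element of `1:9`, here one per quantile `type`, which is one way to see how much the choice of `type` matters. The variable names are illustrative only.

```r
x <- c(9, 3, 1, 10, 2)
probs <- c(5, 50, 95) / 100

# Nested form: quantile() runs first, mean() runs on its result
nested <- mean(quantile(x, probs = probs, type = 8))

# "Peeled" form: the same two steps written out separately
q <- quantile(x, probs = probs, type = 8)
peeled <- mean(q)

identical(nested, peeled)   # TRUE: same computation, same result

# sapply() applies the same composed call to each quantile type 1..9
means_by_type <- sapply(1:9, function(t) mean(quantile(x, probs = probs, type = t)))
means_by_type
```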