> q <- quantile(faithful$eruptions)
> q
     0%     25%     50%     75%    100% 
1.60000 2.16275 4.00000 4.45425 5.10000 

I get the result above. The faithful dataset ships with R:

> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

I want a data frame containing the data plus an additional column indicating the quartile to which each observation belongs. For example, the final dataset should look like this:

     eruptions waiting Quartile
1     3.600      79      Q1
2     1.800      54      Q2
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

How can this be done?

zx8754
darkage

3 Answers

Something along these lines? Use the values returned by quantile() as the break points for cut().

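# include.lowest = TRUE keeps the minimum eruption in the first interval instead of NA
faithful$kva <- cut(faithful$eruptions, q, include.lowest = TRUE)
# Rename the interval levels to quartile labels
levels(faithful$kva) <- c("Q1", "Q2", "Q3", "Q4")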
faithful

    eruptions waiting  kva
1       3.600      79   Q2
2       1.800      54   Q1
3       3.333      74   Q2
4       2.283      62   Q2
5       4.533      85   Q4
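The two steps above (cutting, then renaming the levels) can also be folded into a single cut() call. This is only a sketch under a few assumptions: the data frame copy df and the column name Quartile are illustrative names, not part of the original answer.

```r
# Work on a copy so the built-in dataset is left untouched (df is an illustrative name)
df <- faithful
breaks <- quantile(df$eruptions)

# Assign labels in the same call; include.lowest = TRUE keeps the minimum in Q1
df$Quartile <- cut(df$eruptions,
                   breaks = breaks,
                   labels = c("Q1", "Q2", "Q3", "Q4"),
                   include.lowest = TRUE)

# Count observations per quartile, listing NAs only if any exist
table(df$Quartile, useNA = "ifany")
head(df)
```

The result is a factor whose levels are ordered Q1 to Q4, which is usually convenient for grouping or plotting.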
Roman Luštrik

The cut function has the option to create numeric labels for each quantile right away:

faithful$Quartile <- cut(faithful$eruptions,
                         quantile(faithful$eruptions),
                         labels = FALSE)

This will create an NA for the smallest eruption, because cut() intervals are open on the left by default. If you want to assign the lowest eruption to the first quartile, add include.lowest = TRUE when calling cut():

faithful$Quartile <- cut(faithful$eruptions,
                         quantile(faithful$eruptions),
                         labels = FALSE,
                         include.lowest = TRUE)
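To see the effect of include.lowest, the two variants can be compared side by side. This is a small check rather than part of the answer itself; the variable names x, without_lowest and with_lowest are made up for the illustration.

```r
x <- faithful$eruptions
breaks <- quantile(x)

# Default behaviour: intervals are (a, b], so the minimum falls outside the first one
without_lowest <- cut(x, breaks, labels = FALSE)

# With include.lowest = TRUE the first interval becomes [a, b]
with_lowest <- cut(x, breaks, labels = FALSE, include.lowest = TRUE)

sum(is.na(without_lowest))   # number of observations left unassigned
x[is.na(without_lowest)]     # the unassigned values; they equal min(x)
sum(is.na(with_lowest))      # no unassigned observations remain
```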
ira

This can now be done more conveniently via a dplyr pipe and ggplot2::cut_number().

library(dplyr)
library(ggplot2)

faithful %>% 
   mutate(Quartile = cut_number(eruptions, n = 4, labels = c("Q1", "Q2", "Q3", "Q4")))

The lowest observation is included by default, unlike with base R cut().
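If pulling in ggplot2 just for the binning is not wanted, roughly the same behaviour can be sketched in base R. The helper quantile_group() below is hypothetical (it is not part of any package), and it assumes the quantile break points are distinct, which holds for the quartiles of faithful$eruptions.

```r
# Hypothetical helper: split x into n groups at its quantiles, lowest value included
quantile_group <- function(x, n = 4, prefix = "Q") {
  probs <- seq(0, 1, length.out = n + 1)
  breaks <- quantile(x, probs = probs, na.rm = TRUE)
  cut(x,
      breaks = breaks,
      labels = paste0(prefix, seq_len(n)),
      include.lowest = TRUE)
}

# Usage on a copy of the built-in dataset
df <- faithful
df$Quartile <- quantile_group(df$eruptions)
head(df)
```

cut() stops with an error if two break points coincide, so heavily tied data would need extra handling that this sketch leaves out.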

Joe