
I have a dataset with 30 years of daily air temperature for a city. It looks like this:

Year  Julian_date  temperature
1991    1             2.1
1991    2             2.2
...     ...           ...
1991    365           2.3
1992    1             2.1
...     ...           ...
1992    365           2.5
...     ...           ...
2020    366           2.5

I would like to calculate the 90th percentile for each Julian date (across the different years) and return the results, like this:

Julian_date        value(the 90th percentile)
1                  2.4
2                  2.6
...                ...
365                2.5

How should I write the code in R?

Jellz
  • There was an identical [question](https://stackoverflow.com/questions/72157696/create-new-dataframe-of-summary-90th-percentile-statistics-for-multiple-rows-o/72157813#72157813) a few hours ago – benson23 May 08 '22 at 07:25
  • Maybe something like `df %>% group_by(Julian_date) %>% summarize(value = quantile(temperature, probs = 0.9))` – benson23 May 08 '22 at 07:26

2 Answers


You can first group by `Julian_date`, then call `quantile()` inside `summarise()` with `probs = 0.9` to get the 90th percentile for each group.

library(tidyverse)

df %>% 
  group_by(Julian_date) %>% 
  summarise("value (the 90th percentile)" = quantile(temperature, probs=0.9, na.rm=TRUE))

Output

  Julian_date `value (the 90th percentile)`
        <int>                         <dbl>
1           1                           2.1
2           2                           2.2
3         365                           2.5

Data

df <- structure(list(Year = c(1991L, 1991L, 1991L, 1992L, 1992L, 2020L
), Julian_date = c(1L, 2L, 365L, 1L, 365L, 365L), temperature = c(2.1, 
2.2, 2.3, 2.1, 2.5, 2.5)), class = "data.frame", row.names = c(NA, 
-6L))
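
If you would rather not load the tidyverse, the same grouping can be sketched in base R with `aggregate()`. This is only an illustration on the sample data above; the column name `value_90th_percentile` is an arbitrary choice, and `names = FALSE` is passed so that `quantile()` returns a plain number instead of a value labelled `90%`.

```r
# Same sample data as in the answer above
df <- data.frame(
  Year = c(1991L, 1991L, 1991L, 1992L, 1992L, 2020L),
  Julian_date = c(1L, 2L, 365L, 1L, 365L, 365L),
  temperature = c(2.1, 2.2, 2.3, 2.1, 2.5, 2.5)
)

# 90th percentile of temperature for each Julian date, pooled across years
p90 <- aggregate(
  temperature ~ Julian_date,
  data = df,
  FUN = function(x) quantile(x, probs = 0.9, na.rm = TRUE, names = FALSE)
)

# Rename the result column (illustrative name, not required)
names(p90)[2] <- "value_90th_percentile"
p90
```

The result has one row per distinct `Julian_date`, sorted by date, which matches the shape of the dplyr output shown above.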
AndrewGB
  • Dear Andrew, thank you for your answer! I ran your code, but the output shows only one row and one column: `value(the 90th percentile) 2.5`. Do you know what I might be doing wrong? – Jellz May 08 '22 at 07:48
  • @Jellz Are you sure that you have the `group_by(Julian_date)` in the pipe? If not, then it would only return one row and one column. – AndrewGB May 08 '22 at 07:51
  • I restarted R, and it works now... – Jellz May 08 '22 at 07:57
  • @Jellz Okay, good! Probably just one of those weird glitches that happen every once in a while. – AndrewGB May 08 '22 at 07:58

You can use the `quantile()` function. If "(from different years)" in your question means that each year should be calculated separately, then you need to group the data frame by both `Year` and `Julian_date`. If instead it means that the different years are pooled, you need to group the data frame only by `Julian_date`, as @AndrewGB and @benson23 showed.

library(dplyr)

# Separate calculation for each year: group by both Year and Julian_date
yourdf %>%
  group_by(Year, Julian_date) %>%
  summarise(value_90th_percentile = quantile(temperature, 0.9, na.rm = TRUE))
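
To make the difference between the two groupings concrete, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical and only for illustration. It assumes one reading per day, in which case grouping by `Year` and `Julian_date` leaves a single observation in each group, so the "percentile" is just that observation.

```r
library(dplyr)

# Hypothetical toy data: one reading per day, two days, three years
toy <- data.frame(
  Year = rep(c(1991L, 1992L, 1993L), each = 2),
  Julian_date = rep(1:2, times = 3),
  temperature = c(2.1, 2.2, 2.4, 2.0, 2.6, 2.3)
)

# Grouped by Year and Julian_date: one row per (year, day) pair
per_year <- toy %>%
  group_by(Year, Julian_date) %>%
  summarise(value_90th_percentile = quantile(temperature, 0.9, na.rm = TRUE),
            .groups = "drop")

# Grouped by Julian_date only: one row per day, pooled across years
pooled <- toy %>%
  group_by(Julian_date) %>%
  summarise(value_90th_percentile = quantile(temperature, 0.9, na.rm = TRUE))

nrow(per_year)  # 6 rows, one per (year, day) pair
nrow(pooled)    # 2 rows, one per Julian date
```

With daily data like the question's, the pooled version is the one that produces a table with one value per Julian date.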
Abdur Rohman