subset using percentile for gridded data

Question

I have gridded daily maximum temperature data (K) with 24249 observations (grid cells) and 963 variables (x, y, and one column per day). I am looking for a way in R to select all days with maximum temperatures higher than the 90th percentile.

> dim(DailyT)
[1] 24249   963
> DailyT[1:4,1:7]
     x    y  1988-05-01 1988-05-02 1988-05-03 1988-05-04 1988-05-05
1 34.000 33   291.7603   291.8044   291.6158   292.9659   293.7032
2 34.125 33   291.7240   291.7951   291.5439   292.9451   293.7017
3 34.250 33   291.6884   291.7866   291.4721   292.9250   293.7001
4 34.375 33   291.6521   291.7781   291.4010   292.9049   293.6986

I tried this, but it did not work:

df<- DailyT[DailyT[,3:963] <= quantile(DailyT[,3:963],.9, na.rm = T, type = 6) ]

Maybe you find [this](https://stackoverflow.com/questions/12519629/remove-data-greater-than-95th-percentile-in-data-frame) helpful. — A. Suliman, Nov 22 '18 at 09:07

jay.sf · Answer 1 · 2018-11-24T11:24:29.753

First, you need an id column to identify the rows later. Then, calculate the 90% quantile of all temperature values. Finally, subset the data to the rows in which any cell reaches or exceeds q.

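DailyT <- cbind(id = rownames(DailyT), DailyT)  # to identify rows later
# 90% quantile over all temperature cells (columns 1:3 are id, x, y)
q <- quantile(as.matrix(DailyT[, -(1:3)]), .9, na.rm = TRUE, type = 6)  # approx. 293.7015 for the sample data
# keep rows where any daily temperature is at or above q
DailyT.q <- DailyT[which(sapply(1:nrow(DailyT), function(x) any(DailyT[x, -(1:3)] >= q))), ]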

Yields

> DailyT.q
  id      x  y X1988.05.01 X1988.05.02 X1988.05.03 X1988.05.04 X1988.05.05
1  1 34.000 33    291.7603    291.8044    291.6158    292.9659    293.7032
2  2 34.125 33    291.7240    291.7951    291.5439    292.9451    293.7017

Edit: To get the quantile row-wise, use apply():

# -(1:3) drops id, x and y, so only the temperature columns are used
q90 <- apply(DailyT[, -(1:3)], MARGIN = 1, quantile, .9, na.rm = TRUE, type = 6)

> data.frame(DailyT, q90=q90)
  id      x  y X1988.05.01 X1988.05.02 X1988.05.03 X1988.05.04 X1988.05.05      q90
1  1 34.000 33    291.7603    291.8044    291.6158    292.9659    293.7032 293.7032
2  2 34.125 33    291.7240    291.7951    291.5439    292.9451    293.7017 293.7017
3  3 34.250 33    291.6884    291.7866    291.4721    292.9250    293.7001 293.7001
4  4 34.375 33    291.6521    291.7781    291.4010    292.9049    293.6986 293.6986
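To go from the row-wise quantile to the days that actually exceed it, one option is to compare the temperature matrix with q90 and list the matching (cell, day) pairs. The sketch below uses simulated data, not the real grid: with only 5 days, the type 6 quantile equals the row maximum, so no day would be strictly above it. The names temps_sim, above and hot_days, and the day1 ... day100 column names, are hypothetical.

```r
# Simulated stand-in for DailyT: 4 grid cells, 100 days (NOT real data)
set.seed(1)
n_days <- 100
temps_sim <- matrix(rnorm(4 * n_days, mean = 292, sd = 1), nrow = 4,
                    dimnames = list(NULL, paste0("day", seq_len(n_days))))
DailyT <- data.frame(x = c(34, 34.125, 34.25, 34.375), y = 33, temps_sim)

# Assumption: every column other than x and y is a temperature column
temp_cols <- setdiff(names(DailyT), c("x", "y"))
temps <- as.matrix(DailyT[, temp_cols])

# 90th percentile of each row (grid cell)
q90 <- apply(temps, 1, quantile, probs = 0.9, na.rm = TRUE, type = 6)

# q90 is recycled down the columns, so row i is compared with q90[i]
above <- temps > q90

# Long table with one line per (cell, day) above that cell's percentile
idx <- which(above, arr.ind = TRUE)
hot_days <- data.frame(
  x    = DailyT$x[idx[, "row"]],
  y    = DailyT$y[idx[, "row"]],
  day  = temp_cols[idx[, "col"]],
  temp = temps[idx]
)
```

If the wide layout is preferred, temps[!above] <- NA keeps the original shape and blanks out the days that are not above the row's percentile.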

Data

> dput(DailyT)
structure(list(x = c(34, 34.125, 34.25, 34.375), y = c(33L, 33L, 33L, 33L),
    X1988.05.01 = c(291.7603, 291.724, 291.6884, 291.6521),
    X1988.05.02 = c(291.8044, 291.7951, 291.7866, 291.7781),
    X1988.05.03 = c(291.6158, 291.5439, 291.4721, 291.401),
    X1988.05.04 = c(292.9659, 292.9451, 292.925, 292.9049),
    X1988.05.05 = c(293.7032, 293.7017, 293.7001, 293.6986)),
    class = "data.frame", row.names = c(NA, -4L))

Thanks, I need to calculate the 90% quantile of each row not of all data. — Ali, Nov 24 '18 at 11:09
Very good! - Please [mark the question as answered](https://meta.stackexchange.com/a/5235/371738) when you're satisfied with the given answer and win +2 reputation. This stops people spending time on answering a question that has already been answered. — jay.sf, Nov 24 '18 at 13:58
