Using rle function with condition on a column in r

Question

My dataset has 523 rows and 93 columns. A 6x5 sample of it looks like this:

data <- structure(list(`2018-06-21` = c(0.6959635416667, 0.22265625, 
0.50341796875, 0.982942708333301, -0.173828125, -1.229259672619
), `2018-06-22` = c(0.6184895833333, 0.16796875, 0.4978841145833, 
0.0636718750000007, 0.5338541666667, -1.3009207589286), `2018-06-23` = c(1.6165364583333, 
-0.375, 0.570800781250002, 1.603515625, 0.5657552083333, -0.9677734375
), `2018-06-24` = c(1.3776041666667, -0.03125, 0.7815755208333, 
1.5376302083333, 0.5188802083333, -0.552966889880999), `2018-06-25` = c(1.7903645833333, 
0.03125, 0.724609375, 1.390625, 0.4928385416667, -0.723074776785701
)), row.names = c(NA, 6L), class = "data.frame")

Each row is a city, and each column is a day of the year.

After calculating the row average like this:

data$mn <- apply(data, 1, mean)

I want to create another column, data$duration, holding the average length of the periods of consecutive days on which the value is greater than data$mn.
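To make the target quantity concrete, here is a minimal sketch for a single city, using the five values from row 2 of the example data. It compares each day with the row's own mean, lets rle() collapse the TRUE/FALSE vector into runs, and averages the lengths of the TRUE runs.

```r
# Row 2 of the example data (one city, five days)
x <- c(0.22265625, 0.16796875, -0.375, -0.03125, 0.03125)

above <- x > mean(x)   # TRUE TRUE FALSE FALSE TRUE (the row mean is about 0.0031)
runs  <- rle(above)    # lengths 2 2 1, values TRUE FALSE TRUE

above_lengths <- runs$lengths[runs$values]  # keep only the runs above the mean: 2 1
mean(above_lengths)                         # 1.5
```

Doing this once per row, with each row compared against its own mean, gives the desired duration column.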

I tried this code:

data$duration <- apply(data[-6], 1, function(x) with(rle(x > data$mn), mean(lengths[values])))

But it does not seem to work. In particular, it appears that rle(x > data$mn) fails to recognize the end of a row.

What are your suggestions?

Many thanks

EDIT

The example data frame has been reduced to 6x5.

This would be a much better question with a small example data size. Instead of sharing a 523x93 data frame where it's hard to look at a solution and see if it's right, share, say, a 3x5 data frame that will be easy to verify. — Gregor Thomas, Dec 22 '21 at 15:23

score 1 · Answer 1 · answered Dec 22 '21 at 15:48

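The main challenge in your code is getting apply (which works on one row at a time) to use the right mean. Inside the function, x holds a single row of 5 values, while the mean column holds the means of all 6 cities, so the comparison is made against every city's mean, with recycling, rather than against that row's own mean. We can avoid this entirely by keeping the mean out of the data frame and doing the comparison data > mn on the whole data frame at once: mn is recycled down each column, so every row is compared with its own mean. The new columns can be added at the end: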

mn = rowMeans(data)
dur = apply(data > mn, 1, function(x) with(rle(x), mean(lengths[values])))
dur
#   1   2   3   4   5   6 
# 3.0 1.5 2.0 3.0 4.0 2.0 
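To see why comparing the whole data frame works where the per-row comparison did not, here is a small sketch on a toy 3-city x 4-day data frame (the values are made up purely for illustration). It contrasts comparing one row against the full vector of means with comparing the data frame against that vector.

```r
# Toy data: 3 cities (rows) x 4 days (columns); hypothetical values
d <- data.frame(day1 = c(1, 10, 5),
                day2 = c(3, 20, 5),
                day3 = c(2, 30, 7),
                day4 = c(4, 40, 3))
m <- rowMeans(d)   # 2.5 25.0 5.0

# Per-row approach: one row (4 values) against all 3 means.
# m is recycled to length 4, so day2 is compared with city 2's mean and
# day4 with m[1] again; R also warns that the lengths do not match.
x <- unlist(d[1, ])
suppressWarnings(x > m)   # FALSE FALSE FALSE TRUE

# Whole-data-frame approach: a vector of length nrow(d) is recycled down
# each column, so row i is compared with m[i]. For comparison operators
# the result is a logical matrix, which apply() can then walk row by row.
cmp <- d > m
cmp
```

Row 1 compared with its own mean (2.5) should be FALSE TRUE FALSE TRUE, which is what cmp[1, ] contains, while the per-row version returns something different.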

data = cbind(data, mean = mn, duration = dur)
print(data, digits = 2)
#   2018-06-21 2018-06-22 2018-06-23 2018-06-24 2018-06-25    mean duration
# 1       0.70      0.618       1.62      1.378      1.790  1.2198      3.0
# 2       0.22      0.168      -0.38     -0.031      0.031  0.0031      1.5
# 3       0.50      0.498       0.57      0.782      0.725  0.6157      2.0
# 4       0.98      0.064       1.60      1.538      1.391  1.1157      3.0
# 5      -0.17      0.534       0.57      0.519      0.493  0.3875      4.0
# 6      -1.23     -1.301      -0.97     -0.553     -0.723 -0.9548      2.0
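One case the code above does not handle explicitly is a row with no day above its mean (for example, a city whose value is the same on every day). rle() then produces no TRUE runs, and mean() of an empty vector is NaN. Below is a minimal sketch of a wrapper for that case. The name mean_run_length is hypothetical, and returning 0 is an assumption: NA may suit some analyses better.

```r
# Hypothetical helper (not part of the answer above): average length of the
# runs where `above` is TRUE, returning 0 instead of NaN when there are none.
mean_run_length <- function(above) {
  r <- rle(above)
  runs_true <- r$lengths[r$values]   # lengths of the TRUE runs only
  if (length(runs_true) == 0) return(0)
  mean(runs_true)
}

mean(integer(0))                              # NaN, what an all-FALSE row gives
mean_run_length(c(FALSE, FALSE, FALSE))       # 0
mean_run_length(c(TRUE, TRUE, FALSE, TRUE))   # 1.5
```

It could replace the anonymous function in the answer, e.g. dur = apply(data > mn, 1, mean_run_length), computed before the new columns are bound to data.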
