Essentially, I want to create a bar graph that's a sum of all rainfall each year. Readings are recorded about every hour, but sometimes there are multiple readings per hour. Unfortunately, when there are multiple readings in a single hour, PCP01(variable for rainfall in a given hour) displays the sum of the rainfall for that particular hour at that particular minute interval. This results in there being multiple sums of rainfall depending on how late in the hour it is. This messes with the plotting, so I really only want the greatest PCP01 value of each hour. Sometimes there are two PCP01 values in the same hour that have the same value and in this instance, I would want one of them to be removed. The data frame is ~231000 rows. I attempted creating an algorithm to remove the invalid rows, but I couldn't get it working. I fairly new to R, so I feel like there might be a simpler way of accomplishing this that I'm unaware of.
working.data[181121:181136, c(2,3,4,5,6,7,9,10)]
date.year date.month date.day date.hour date.minute PCP01 SLP SPD
181121 2015 5 21 17 54 0.02 NA 0
181122 2015 5 21 18 54 0.02 NA 3
181123 2015 5 21 19 14 0.03 NA 5
181124 2015 5 21 19 21 0.04 NA 6
181125 2015 5 21 19 35 0.05 NA 5
181126 2015 5 21 19 49 0.07 NA 3
181127 2015 5 21 19 54 0.09 NA 3
181128 2015 5 21 20 9 0.02 NA 3
181129 2015 5 21 20 25 0.04 NA 0
181130 2015 5 21 20 34 0.05 NA 0
181131 2015 5 21 20 44 0.06 NA 5
181132 2015 5 21 20 52 NA NA 3
181133 2015 5 21 20 54 0.09 NA 0
181134 2015 5 21 21 15 0.04 NA 0
181135 2015 5 21 21 30 0.05 NA 0
181136 2015 5 21 21 42 0.05 NA 3
This is a small portion of the dataset from lines 181121-181136. In this instance I would only want rows 181121, 181127, 181133, and 181136. If anyone needs more data from the data frame, I can give a few more portions.
so, you group by year, month, day and hour and you get max(pcpi) – Jorge Lopez Dec 14 '19 at 03:35