I'm hoping to clean up a time series dataset so that only the maximum value of each event is retained. I started by filtering the data to keep only values above a certain threshold, but there are still readings that, while separated by only a millisecond or two, act as duplicates and will throw off later analysis.
My actual dataset has >100,000 rows and a few more columns, but here is the top of a smaller version.
head(shortfilter)
  Time (Sec) ECG (Channel 6)
1   5534.023        1.371761
2   5534.024        1.232424
3   5534.152        1.414432
4   5534.153        1.359914
5   5534.272        1.639033
6   5534.396        1.476161
Explained: I don't have a concrete time window within which two readings should count as duplicates, but the rest of the data is similar to this in that duplicate pairs are generally within 0.003 s of each other.
  Time (Sec) ECG (Channel 6)
1   5534.023        1.371761 #<-- Higher value (keep)
2   5534.024        1.232424
3   5534.152        1.414432 #<-- Higher value (keep)
4   5534.153        1.359914
5   5534.272        1.639033 #<-- Only value (keep)
6   5534.396        1.476161 #<-- Only value (keep)
Ideal:
  Time (Sec) ECG (Channel 6)
1   5534.023        1.371761
2   5534.152        1.414432
3   5534.272        1.639033
4   5534.396        1.476161
5   ____.___        _.______
6   ____.___        _.______
Below is my initial attempt at some conditionals to do what I was hoping. Keep in mind I'm new to coding in general, so I know it isn't correct as written, but it should give a sense of what I'm trying to do.
# Walk through the rows, comparing each row's time to the next;
# if two rows fall within 0.01 s of each other, flag the one with
# the smaller ECG value for removal instead of deleting mid-loop.
keep <- rep(TRUE, nrow(shortfilter))
for (i in seq_len(nrow(shortfilter) - 1)) {
  t <- shortfilter$`Time (Sec)`
  e <- shortfilter$`ECG (Channel 6)`
  if (t[i + 1] - t[i] <= 0.01) {
    if (e[i] >= e[i + 1]) keep[i + 1] <- FALSE else keep[i] <- FALSE
  }
}
shortfilter <- shortfilter[keep, ]
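For >100,000 rows, a vectorized approach may be faster than a row-by-row loop. The sketch below uses only base R; the 0.01 s gap threshold is an assumption (the question gives no concrete window, only that duplicates sit within about 0.003 s). It marks a new group whenever the gap to the previous reading exceeds the threshold, then keeps the row with the maximum ECG value from each group:

```r
# Sample data mirroring head(shortfilter); column names match the post
shortfilter <- data.frame(
  `Time (Sec)`      = c(5534.023, 5534.024, 5534.152, 5534.153, 5534.272, 5534.396),
  `ECG (Channel 6)` = c(1.371761, 1.232424, 1.414432, 1.359914, 1.639033, 1.476161),
  check.names = FALSE
)

# A new group starts whenever the gap to the previous reading
# exceeds 0.01 s (assumed threshold)
grp <- cumsum(c(TRUE, diff(shortfilter$`Time (Sec)`) > 0.01))

# Within each group, keep the index of the row with the maximum ECG value
keep <- unlist(lapply(split(seq_len(nrow(shortfilter)), grp),
                      function(idx) idx[which.max(shortfilter$`ECG (Channel 6)`[idx])]))
cleaned <- shortfilter[sort(keep), ]
cleaned
```

On the sample above this keeps rows 1, 3, 5, and 6, matching the "Ideal" table. Because the grouping is a single `cumsum` over `diff`, it scales linearly with the number of rows, and readings that drift slightly (e.g. 0.001 s apart) still land in the same group as long as each consecutive gap stays under the threshold.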