0

I'm trying to use R to find the max value of each day for 1 to n days. My issue is that there are multiple values in each day. Here's my code. When I run it, I get an "incorrect number of dimensions" error.

Any suggestions?

 Days <- unique(theData$Date)    #Gets each unique Day
 numDays <- length(Days)          
 Time <- unique(theData$Time)     #Gets each unique time
 numTime <- length(Time)
 rowCnt <- 1


 for (i in 1:numDays)  #Do something for each individual day. In this case find max
    {

         temp <- which(theData[i]$Date == numDays[i])
         temp <- theData[[i]][temp,]
         High[rowCnt, (i-2)+2] <- max(temp$High)  #indexing for when I print to CSV
         rowCnt <- rowCnt + 1 
     }

Here's what the data looks like, except with 1 to n days and times:

Day       Time      Value
20130310  09:30:00  5
20130310  09:31:00  1
20130310  09:32:00  2
20130310  09:33:00  3
20130311  09:30:00  12
20130311  09:31:00  0
20130311  09:32:00  1
20130311  09:33:00  5

So this should return:

Day       Time      Value
20130310  09:30:00  5
20130311  09:30:00  12

Any help would be greatly appreciated! Thanks!

Junior R

3 Answers

2

Here is a solution using the plyr package:

mydata<-structure(list(Day = structure(c(2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L), .Label = c("", "x", "y"), class = "factor"), Value = c(0L, 
1L, 2L, 3L, 12L, 0L, 1L, 5L), Time = c(5L, 6L, 7L, 8L, 1L, 2L, 
3L, 4L)), .Names = c("Day", "Value", "Time"), row.names = c(NA, 
8L), class = "data.frame")
library(plyr)
ddply(mydata,.(Day),summarize,max.value=max(Value))

  Day max.value
1   x         3
2   y        12

Update 1: If your day looks like, say, 10/02/2012 12:00:00 AM, then you need to convert it to a date first:

mydata$Day<-with(mydata,as.Date(Day, format = "%m/%d/%Y"))
ddply(mydata,.(Day),summarize,max.value=max(Value))

Please see here for the example.
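As a minimal sketch of what that conversion does (the sample strings below are made up for illustration), `as.Date` reads only as much of each string as the format describes and ignores the trailing time, so rows from the same calendar day end up with the same `Day` value:

```r
# Hypothetical date-time strings in the "10/02/2012 12:00:00 AM" style
raw_day <- c("10/02/2012 12:00:00 AM", "10/02/2012 09:31:00 AM",
             "10/03/2012 12:00:00 AM")

# The format covers only month/day/year; trailing characters are ignored
day <- as.Date(raw_day, format = "%m/%d/%Y")

# Two distinct days remain, so grouping by `day` gives one group per date
unique(day)
```

Without the conversion, each distinct date-time string would count as its own group.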

Update 2, as per the new data: If your day looks like the one in your updated question, you don't need to convert anything. You can just use the code as follows:

    mydata1<-structure(list(Day = c(20130310L, 20130310L, 20130310L, 20130310L, 
    20130311L, 20130311L, 20130311L, 20130311L), Time = structure(c(1L, 
    2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("9:30:00", "9:31:00", 
    "9:32:00", "9:33:00"), class = "factor"), Value = c(5L, 1L, 2L, 
    3L, 12L, 0L, 1L, 5L)), .Names = c("Day", "Time", "Value"), class = "data.frame", row.names = c(NA, 
    -8L))



    # Use the new data frame (mydata1) defined above
    ddply(mydata1,.(Day),summarize,Time=Time[which.max(Value)],max.value=max(Value))
       Day    Time max.value
1 20130310 9:30:00         5
2 20130311 9:30:00        12

If you want the time to appear in the output, then just use Time=Time[which.max(Value)] which gives the time at the maximum value.

Community
Metrics
  • I get the same value for each day when I run this. Copied exactly. – Junior R Aug 08 '13 at 22:22
  • Do I have to write a for loop to plug in each day? – Junior R Aug 08 '13 at 22:35
  • No, you don't need to use `for loop` when the same can be handled easily by apply function or friends. – Metrics Aug 08 '13 at 22:36
  • Would you mind explaining that? I'm really new to R. I get results except each day has the same exact result – Junior R Aug 08 '13 at 22:38
  • This method works with the data you provided - so you'll need to figure out what's different between your example data and whatever data you're using that's causing the problem. – Señor O Aug 08 '13 at 22:41
  • Have you installed the plyr package using install.packages("plyr")? The command uses your data frame, which I called mydata, and finds the max value using `max` for each day. – Metrics Aug 08 '13 at 22:42
  • As @Señor O said, your date variable needs to be handled properly before using `ddply` (in your example, this is not the case since you are just saying x, y). – Metrics Aug 08 '13 at 22:44
  • My date is in YYYYMMDD and my time is military time – Junior R Aug 08 '13 at 22:45
  • Check your `theData` vs `mydata` – Henrik Aug 08 '13 at 22:45
  • The question you asked has been answered, so please review your data, find out what the problem is, and post a new question. – Señor O Aug 08 '13 at 22:49
  • Changed it to my specific data set. It's over 1000 days. Everything works except I get the same number for every day. It's the max value of the whole column, which has the values for every day. – Junior R Aug 08 '13 at 22:52
  • Make sure you post example data that is representative for your real data. @Metrics answer worked for *the data you provided* – Henrik Aug 08 '13 at 22:56
1

This is a base function approach:

> do.call( rbind, lapply(split(dfrm, dfrm$Day), 
                         function (df) df[ which.max(df$Value), ] ) )
              Day     Time Value
20130310 20130310 09:30:00     5
20130311 20130311 09:30:00    12

To explain what's happening it's good to learn to read R functions from the inside out (since they are often built around each other.) You wanted lines from a dataframe, so you would either need to build a numeric or logical vector that spanned the number of rows, .... or you can take the route I did and break the problem up by Day. That's what split does with dataframes. Then within each dataframe I applied a function, which.max to just a single day's subset of the data. Since I only got the results back from lapply as a list of dataframes, I needed to squash them back together, and the typical method for doing so is do.call(rbind, ...).
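The same pipeline can be unrolled one step at a time. This is only a sketch: the data frame below is rebuilt from the question's sample (an assumption about what `dfrm` holds), and the intermediate names `by_day`, `top_rows`, and `result` are made up for illustration.

```r
# Sample data matching the question (dfrm is the name used in this answer)
dfrm <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Step 1: split() yields a named list with one data frame per day
by_day <- split(dfrm, dfrm$Day)

# Step 2: keep only the row holding the maximum Value within each day
top_rows <- lapply(by_day, function(df) df[which.max(df$Value), ])

# Step 3: stack the one-row data frames back into a single data frame
result <- do.call(rbind, top_rows)
result
```

Inspecting `by_day` and `top_rows` at the console is a quick way to see what each layer of the nested call contributes.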

If I took the other route of making a vector for selection that applied to the whole dataframe I would use ave:

> dfrm[ with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) ) , ]
         Day     Time Value
1   20130310 09:30:00     5
1.1 20130310 09:30:00     5

Huh? That's not right... What's the problem?

with(dfrm, ave(Value, Day, FUN=function(v) v==max(v) ) )
[1] 1 0 0 0 1 0 0 0

So despite asking for a logical vector with the "==" function, I got conversion to a numeric vector, something I still don't understand. But converting to logical outside that result I succeed again:

> dfrm[ as.logical( with(dfrm, ave(Value, Day, 
                                   FUN=function(v) v==max(v) ) ) ), ]
       Day     Time Value
1 20130310 09:30:00     5
5 20130311 09:30:00    12
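A likely explanation for the numeric result, judging from how `ave` is written in base R: it fills each group's result back into a copy of its first argument, so the output keeps the type of `Value` and the logical values are coerced to 1/0. The sketch below rebuilds the sample data (an assumption about what `dfrm` holds) and shows the same coercion with a plain vector; `x` and `flags` are illustrative names.

```r
dfrm <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Assigning logicals into part of an integer vector coerces them to integers,
# which mirrors what ave() does when it fills in a copy of Value group by group
x <- dfrm$Value
x[1:4] <- (x[1:4] == max(x[1:4]))
typeof(x)

# ave() therefore returns 1/0 rather than TRUE/FALSE
flags <- with(dfrm, ave(Value, Day, FUN = function(v) v == max(v)))

# Used directly as an index, 1/0 means "row 1" and "no row", hence the
# duplicated first row; comparing to 1 gives a proper logical selector
dfrm[flags == 1, ]
```

Comparing the flags to 1 is equivalent to the `as.logical` wrapper used above.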

Also note that the ave function (unlike tapply or aggregate) requires that you supply the function as a named argument with FUN=function(.). That is a common error I make. If you see the error message "unique() applies only to vectors", it seems to come out of the blue, but it means that ave tried to group by an argument that it expected to be discrete and you gave it a function.

IRTFM
-2

Unlike other programming languages, in R it is considered good practice to avoid using for loops. Instead try something like:

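# Group the row numbers by day, then pick the row with the largest Value in each group
index <- sapply(split(seq_len(nrow(theData)), theData$Day), function(rows) {
    rows[which.max(theData$Value[rows])]
})
theData[index, c("Day", "Time", "Value")]

This means: for each day, find the row holding the maximum value of Value and return its row index in theData. Then you can select the rows and columns of interest.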

I recommend reading the help documentation for apply(), lapply(), sapply(), tapply(), mapply() (I'm probably forgetting one of them…) and the plyr package.
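As a rough sketch of how `tapply` fits this problem, assuming a data frame with `Day`, `Time`, and `Value` columns like the question's sample (the names `day_max` and `rows` are made up here), it can return either each day's maximum or the row number where that maximum occurs:

```r
theData <- data.frame(
  Day   = rep(c(20130310L, 20130311L), each = 4),
  Time  = rep(c("09:30:00", "09:31:00", "09:32:00", "09:33:00"), times = 2),
  Value = c(5L, 1L, 2L, 3L, 12L, 0L, 1L, 5L)
)

# Maximum Value per day, named by day
day_max <- tapply(theData$Value, theData$Day, max)

# Row number (in the full data frame) of each day's maximum
rows <- tapply(seq_len(nrow(theData)), theData$Day,
               function(r) r[which.max(theData$Value[r])])

# as.vector() drops the array attributes so rows is a plain index
theData[as.vector(rows), c("Day", "Time", "Value")]
```

Grouping the row numbers rather than the values matters here, because `which.max` on a group's values alone would give a position within that group, not within the whole data frame.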

Ellis Valentiner
  • For some reason I am getting "undefined columns selected", and when I do index it shows: function(x, ...) { UseMethod("index") }. Any suggestions? Good to know about that for loop. Thank you – Junior R Aug 08 '13 at 22:30
  • It'd be better to use `tapply` here: `tapply(Value, Days, which.max)` – Señor O Aug 08 '13 at 22:38
  • @DennisMo it looks like `index` is defined in the environment as a function from the zoo package. So when you try using it as a variable, R is thinking you are trying to use the function. Just call it `i` or something else. – Ellis Valentiner Aug 08 '13 at 23:18
  • @SeñorO you are correct, `tapply` is the better function to use. – Ellis Valentiner Aug 08 '13 at 23:19
  • You ought to edit your answer .... at the moment it deserves the -1 vote it got from someone else. The advice to read the docs for the various *apply functions is good but I would add that the `ave` function is also a core function. – IRTFM Aug 08 '13 at 23:55