I've got panel data and have been playing around with k-means clustering. So now I've got a panel of factor values that are mostly stable but I'd like to smooth that out a bit more so that (for example) the data says "Wyoming was in group 1 in earlier years, moved into group 2, then moved into group 5" rather than "Wyoming was in group 1,1,1,2,3,2,2,5,5,5".

So the approach I'm taking is to use rollapply() to calculate the modal value. Below is code that works to calculate the mode ("Mode()"), and a wrapper for that ("ModeR()") that (perhaps clumsily) resolves the problem of multi-modal windows by randomly picking a mode. All that is fine, but when I put it into rollapply() I'm getting problems.

library(dplyr) # arrange(), desc(), filter()
library(zoo)   # rollapply()
Mode <- function(vect){ # take a vector as input
  temp <- as.data.frame(table(vect)) 
  temp <- arrange(temp,desc(Freq)) # from dplyr
  max.f <- temp[1,2]
  temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
  return(temp[,1])
}
ModeR <- function(vect){
  out <- Mode(vect)
  return(out[round(runif(1,min=0.5000001,max=length(out)+0.499999999))])
}
temp <- round(runif(20,min=1,max=10)) # A vector to test this out on.
cbind(temp,rollapply(data=temp,width=5,FUN=ModeR,fill=NA,align="right"))

which returned:

      temp   
 [1,]    5 NA
 [2,]    6 NA
 [3,]    5 NA
 [4,]    5 NA
 [5,]    7  1
 [6,]    6  1
 [7,]    5  1
 [8,]    5  1
 [9,]    3  2
[10,]    1  3
[11,]    5  3
[12,]    7  3
[13,]    5  3
[14,]    4  3
[15,]    3  3
[16,]    4  2
[17,]    8  2
[18,]    5  2
[19,]    6  3
[20,]    6  3

Compare that with:

> ModeR(temp[1:5])
[1] 5
Levels: 5 6 7
> ModeR(temp[2:6])
[1] 6
Levels: 5 6 7

So it seems like the problem is in how ModeR is being applied in rollapply(). Any ideas?

Thanks! Rick

Rick_Weber

1 Answer

Thanks to /u/murgs! His comment pointed me in the right direction (in addition to helping me streamline ModeR() using sample()).

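ModeR() as written above returns a factor (as does Mode()), and the rollapply() output shows that factor's internal integer codes rather than its labels. That is why the first window (5, 6, 5, 5, 7) came back as 1: 5 is the first level. I need ModeR() to return a number. I can fix this by updating my code as follows: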

Mode <- function(vect){ # take a vector as input
  temp <- as.data.frame(table(vect)) 
  temp <- arrange(temp,desc(Freq))
  max.f <- temp[1,2]
  temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
  return(as.numeric(as.character(temp[,1]))) #HERE'S THE BIG CHANGE
}
ModeR <- function(vect){
  out <- Mode(vect)
  return(out[sample(1:length(out),1)]) #HERE'S SOME IMPROVED CODE!
}

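Now rollapply() does what I expected it to do! The as.character() step looks odd but it is needed: calling as.numeric() directly on a factor returns the internal integer level codes (1, 2, 3, ...) rather than the values shown in the labels, so the result looks like a smaller, wrong number. Converting to character first recovers the original values.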
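
To make the as.character() point concrete, here is a minimal sketch in base R. The vector x is made-up example data (it happens to match the first window above), and the variable names are only for illustration.

```r
# Hypothetical example values, chosen to match the first window 5, 6, 5, 5, 7.
x <- c(5, 6, 5, 5, 7)
f <- factor(x)                           # levels are "5", "6", "7"

# as.numeric() on a factor gives the internal level codes, not the labels.
codes <- as.numeric(f)

# Going through as.character() first recovers the original values.
values <- as.numeric(as.character(f))

# An equivalent idiom: convert the levels once, then index by the factor.
values2 <- as.numeric(levels(f))[f]

print(codes)   # 1 2 1 1 3
print(values)  # 5 6 5 5 7
```

The second idiom converts only the distinct levels rather than every element, which may matter on long vectors; for a five-element window either form should be fine.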

Rick_Weber