
I have multiple observations of daily rainfall for the same station covering around 14 years. The data frame looks something like this:

df (starting from 01/01/2000)

v1  v2 v3 v4 v5 v6 ........ v20
1   1  2 4   8  9.............. 
1.4 4  3.8..................
1.5 3  1.6....................
1.6 8  .....................
.
.
.
.

until 31/12/2013, i.e. 5114 daily observations in total

where v1, v2, ..., v20 are the rainfall simulations for the same point. I want to draw monthly box plots that show the collective range (quantiles and median) when all the simulations are taken together.

I can draw a box plot for a single set of monthly values (a `df` with 12 rows) using:

df$month<-factor(month.name,levels=month.name)
library(reshape2)
library(ggplot2)
df.long<-melt(df,id.vars="month")
ggplot(df.long,aes(month,value))+geom_boxplot()

but in this problem the data is daily and there are multiple simulations, so I have no idea where to start.

Sample data:

df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100))

In case you want to work with a zoo object, the date index is:

date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min") 

You can also convert the data frame to a zoo object:

library(zoo)
x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"))
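
A quick check, assuming the start and end dates from the `seq()` call above, that the daily index really has one entry per row of the sample data. Here `by = "day"` is used as an equivalent spelling of `"1440 min"` in GMT, and the variable name `dates` is only illustrative.

```r
# Daily index from 2000-01-01 to 2013-12-31 in GMT (no daylight saving shifts)
dates <- seq(as.POSIXct("2000-01-01 00:00:00", tz = "GMT"),
             as.POSIXct("2013-12-31 00:00:00", tz = "GMT"), by = "day")

length(dates)               # number of daily time steps; should equal nrow(df)
table(format(dates, "%Y"))  # days per year, to spot leap years
```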
jazzurro
saurabh
  • Can you `dput()` a minimal example of `df` so people who want to answer don't have to generate it? – ilir Sep 30 '14 at 10:47
  • Hey @ilir, the sample dataset is: df = data.frame(matrix(rnorm(20), nrow=5114,ncol=100)) where each column represents a simulation and each row represents one day of data. Thanks for the suggestion. – saurabh Sep 30 '14 at 11:06
  • I think this is a good question, and I am sure many people who are doing simulations will find it useful. Could someone kindly try to find a solution? Thank you so much. – saurabh Sep 30 '14 at 11:24
  • Would you consider to provide a better sample data? For example, what does each column contain? You would probably say rain amount. But, which year? Which month? You are expecting just `melt()`, but there may be more things to do to arrange your data set. – jazzurro Sep 30 '14 at 11:39
  • @user197393 Make some effort and edit question properly. If I literally take your example then after creating `df` i got error `replacement has 12 rows, data has 5114`. – Marek Sep 30 '14 at 11:40
  • The column for the date would be date<-seq(as.POSIXct("2000-01-01 00:00:00","GMT"),as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min"). If you want, you can also convert it to a zoo object with x <- zoo(df, order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")) – saurabh Sep 30 '14 at 11:42
  • @Marek The example you tried is not relevant for the data.frame I posted; it only suits monthly data with a single observation. The challenge with the data frame I provided is that it has 100 observations and daily records. – saurabh Sep 30 '14 at 11:46
  • @jazzurro Please see the data.frame after changing it to a "zoo" format time series. Thanks. – saurabh Sep 30 '14 at 11:49
  • I understand you are doing your best. But, would it be possible for you to arrange a data frame using the date and matrix? Or would it be possible for you to provide the zoo format data? I also wonder if you really need this much data here. – jazzurro Sep 30 '14 at 11:54
  • @jazzurro The zoo data will be df <- zoo(data.frame(matrix(rnorm(20), nrow=5114,ncol=100)), order.by=seq(as.POSIXct("2000-01-01 00:00:00","GMT"), as.POSIXct("2013-12-31 00:00:00","GMT"), by="1440 min")) – saurabh Sep 30 '14 at 12:01
  • @jazzurro I need this much data because a monthly box plot of all the observations taken collectively is the key thing I want from R. More precisely, I want to plot the range that all the observations taken together cover, using box plots and quantiles, and compare it with the observed value. – saurabh Sep 30 '14 at 12:04
  • @jazzurro Please also find an alternate solution. – saurabh Oct 03 '14 at 10:44
  • Good to see that you found your own solution. – jazzurro Oct 06 '14 at 03:49

2 Answers


I am not familiar with zoo, so I converted your sample to a data frame. Your idea of using melt() is the right way to go. After that, you need to aggregate the rain amount by month. It is worth looking up aggregate() and other options. Here, I used dplyr and tidyr to arrange the sample data. I hope this lets you move forward.

### zoo to data frame by @ Joshua Ulrich
### http://stackoverflow.com/questions/14064097/r-convert-between-zoo-object-and-data-frame-results-inconsistent-for-different

zoo.to.data.frame <- function(x, index.name="Date") {
   stopifnot(is.zoo(x))
   xn <- if(is.null(dim(x))) deparse(substitute(x)) else colnames(x)
   setNames(data.frame(index(x), x, row.names=NULL), c(index.name,xn))
}

### to data frame
foo <- zoo.to.data.frame(df)
str(foo)

library(reshape2)  # melt()
library(dplyr)
library(tidyr)
library(ggplot2)   # ggplot(), geom_boxplot()

### wide to long data frame, aggregate rain amount by Date
ana <- foo %>%
    melt(., id.vars = "Date") %>%
    group_by(Date) %>%
    summarize(rain = sum(value))

### Aggregate rain amount by year and month
bob <- ana %>%
    separate(Date, c("year", "month", "date")) %>%
    group_by(year, month) %>%
    summarize(rain = sum(rain))

### Drawing a ggplot figure
ggplot(data = bob, aes(x = month, y = rain)) +
    geom_boxplot()
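
Since aggregate() was mentioned as something to look up, here is a minimal base-R sketch of the same idea. It is a variant, not the pipeline above: it keeps each simulation separate and pools them only in the plot. The data is made up (2 years, 5 simulations), and the names `sims`, `ym`, `monthly` and `long` are only illustrative.

```r
# Hypothetical small example: 2 years of daily values for 5 simulations
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
sims <- as.data.frame(matrix(rexp(length(dates) * 5), ncol = 5))
names(sims) <- paste0("v", 1:5)

# Year-month key for each day, e.g. "2000-01"
ym <- format(dates, "%Y-%m")

# Monthly total per simulation: one row per year-month, one column per simulation
monthly <- aggregate(sims, by = list(ym = ym), FUN = sum)

# Stack the simulation columns: one row per (year-month, simulation)
n_sims <- ncol(monthly) - 1
month_of_row <- month.abb[as.integer(substr(monthly$ym, 6, 7))]
long <- data.frame(
  month = factor(rep(month_of_row, times = n_sims), levels = month.abb),
  rain  = unlist(monthly[, -1], use.names = FALSE)
)

# One box per calendar month, pooling all years and all simulations
boxplot(rain ~ month, data = long, xlab = "Month", ylab = "Monthly rainfall")
```

Each box then summarises (number of years) x (number of simulations) monthly totals, which is one way to read "all observations taken together".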
jazzurro

I just found an easier way to do it; however, your answer really helped, jazzurro.

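    # install.packages(c("zoo", "reshape2", "ggplot2"))  # run once if needed
    library(zoo)
    library(reshape2)
    library(ggplot2)

    # Sample data: 5114 daily values (rows) for 100 simulations (columns)
    df <- data.frame(matrix(rnorm(20), nrow = 5114, ncol = 100))
    x <- zoo(df, order.by = seq(as.POSIXct("2000-01-01 00:00:00", "GMT"),
                                as.POSIXct("2013-12-31 00:00:00", "GMT"), by = "1440 min"))

    # Monthly mean of each simulation: one row per year-month (14 * 12 = 168 rows)
    v <- aggregate(x, as.yearmon, mean)
    months <- rep(1:12, 14)  # calendar month of each year-month row
    lol <- data.frame(v, months)

    # Wide to long: one row per (year-month, simulation)
    df.m <- melt(lol, id.vars = "months")
    head(df.m)

    # One box per calendar month, pooling all years and all simulations
    p <- ggplot(df.m, aes(factor(months), value))
    p + geom_boxplot(aes(fill = factor(months)))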
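
The question also asks for the quantiles and median behind each box. As a rough sketch, assuming long-format data shaped like `df.m` above (a `months` column and a `value` column), the pooled quantiles per calendar month can be tabulated with base R. The data below is a made-up stand-in with 3 simulations, and the choice of probabilities in `probs` is only an example.

```r
# Hypothetical stand-in for df.m: one row per (year-month, simulation)
set.seed(42)
df.m <- data.frame(
  months = rep(rep(1:12, 14), times = 3),  # 14 years, 3 simulations
  value  = rexp(12 * 14 * 3)
)

# Quantiles of the pooled values for each calendar month
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q <- t(sapply(split(df.m$value, df.m$months), quantile, probs = probs))

# Rows are calendar months 1..12, columns are the requested quantiles
round(q, 2)
```

A table like `q` can then be set beside the observed monthly values for the comparison mentioned in the comments.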
saurabh