1

I have monthly observed and modeled data organized as vectors:

obs <- structure(c(68.72228685137, 68.4565130874024, 68.3237563140977, 
66.1789683147099, 63.7162100107148, 59.9698454002755), .Names = c("X1901.01.01", 
"X1901.02.01", "X1901.03.01", "X1901.04.01", "X1901.05.01", "X1901.06.01"
))

mod <- structure(c(71.5796750030741, 71.5925210418478, 70.8672045288309, 
67.9705857323206, 68.462614970737, 67.7095309202574), .Names = c("X1899.11.01", 
"X1899.12.01", "X1901.01.01", "X1901.02.01", "X1901.03.01", "X1901.04.01"
))

where X1901.01.01 corresponds to 1901-01-01 and so on. Please note that dates in observed and modeled data don't overlap completely.

This is just a sample - my real data contains thousands of observations.

What is the most efficient (i.e. fastest) way to combine these vectors in a data frame assigning NA to non-matching dates and getting rid of the infamous "X" in the front of the original dates?

This would be the resulting data frame:

   date         obs             mod
1899.11.01      NA              71.57968
1899.12.01      NA              71.59252
1901.01.01      68.72229        70.86720    
1901.02.01      68.45651        67.97059
1901.03.01      68.32376        68.46261    
1901.04.01      66.17897        67.70953    
1901.05.01      63.71621            NA
1901.06.01      59.96985            NA
Alex A.
  • 5,466
  • 4
  • 26
  • 56
thiagoveloso
  • 2,537
  • 3
  • 28
  • 57

2 Answers2

3

While @Alex A.'s answer works, since it's date/time data it might be beneficial to treat it this way from the beginning. You can easily merge these using the merge() function with the all=TRUE flag set, which will merge on any identical column names:

obs <- as.data.frame(obs)
mod <- as.data.frame(mod)
obs[["date"]] <- as.Date(row.names(obs), "X%Y.%m.%d")
mod[["date"]] <- as.Date(row.names(mod), "X%Y.%m.%d")

d <- merge(obs, mod, all=TRUE)

Since the date columns are date/time class you could then easily convert the data.frame to an xts time series or something else for subsetting, summarizing, etc.

Forrest R. Stevens
  • 3,435
  • 13
  • 21
1

First transpose your data frames from "wide" to "long" format.

library(reshape2)

m.obs <- melt(obs, id.vars = NULL)
m.mod <- melt(mod, id.vars = NULL)

After transposing, the dates are now the row names. You can then merge on the row names.

both <- merge(m.obs, m.mod, by = "row.names", all = TRUE)

Set column names however you want.

colnames(both) <- c("date", "obs", "mod")

Now you can remove the "X" using gsub.

both <- transform(both, date = gsub("X", "", date))

Any values present in one data frame and not the other will be assigned NA when the data frames are merged with all = TRUE.

Alex A.
  • 5,466
  • 4
  • 26
  • 56