
I have time series data in the following structure:

dat <- data.frame("Year" = rep(2005, 31),
                  "Day" = 1:31,
                  "JANUARY" = sample(1:100, 31, TRUE),
                  "FEBRUARY" = c(sample(1:100, 28), NA, NA, NA),
                  "MARCH" = sample(1:100, 31),
                  "APRIL" = c(sample(1:100, 30), NA),
                  "MAY" = sample(1:100, 31),
                  "JUNE" = c(sample(1:100, 30), NA),
                  "JULY" = sample(1:100, 31),
                  "AUGUST" = sample(1:100, 31),
                  "SEPTEMBER" = c(sample(1:100, 30), NA),
                  "OCTOBER" = sample(1:100, 31),
                  "NOVEMBER" = c(sample(1:100, 30), NA),
                  "DECEMBER" = sample(1:100, 31))

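The closest I've come is to melt the data by day and year:

library(reshape2)  # assuming melt() comes from reshape2
dat <- melt(dat, id.vars = c("Day", "Year"))

and then coerce the result to a date, dropping the impossible days (e.g. 30 February) that come back as NA:

dat$Date <- paste(dat$Day, dat$variable, dat$Year, sep = "-")
dat$Date <- as.Date(dat$Date, "%d-%B-%Y")
dat <- dat[!is.na(dat$Date), ]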

Is there a more efficient and non-stupid way of doing any of this?

slap-a-da-bias

1 Answer


I took the Hadley approach using gather from tidyr, mutate from dplyr, str_c from stringr, and as_date from lubridate. It makes things flow pretty smoothly.

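library('dplyr')
library('tidyr')
library('stringr')
library('lubridate')

Dates <- dat %>% 
  # stack the twelve month columns into Month/Value pairs
  gather(Month, Value, JANUARY:DECEMBER) %>% 
  # build a "day-MONTH-year" string, then parse it
  mutate(Date_1 = str_c(Day, Month, Year, sep = "-"),
         Date_2 = as_date(Date_1, format = "%d-%B-%Y")) %>% 
  # impossible days (e.g. 30 February) parse to NA; drop them
  filter(!is.na(Date_2))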
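
The same reshape-then-parse idea can also be done without any packages. What follows is only a rough base R sketch under a few assumptions: the stand-in `dat` fills every month column for all 31 days (instead of padding short months with NA), and the names `month_cols` and `long` are made up for illustration. Looking up the month number with `match()` against `month.name` avoids depending on the locale's month names.

```r
# Stand-in for the question's data: Year, Day, and one column per month.
# Assumption: every cell is filled; impossible days are removed later by date.
set.seed(1)
dat <- data.frame(Year = rep(2005, 31), Day = 1:31)
for (m in toupper(month.name)) {
  dat[[m]] <- sample(1:100, 31, replace = TRUE)
}

# Every column other than Year and Day is treated as a month column.
month_cols <- setdiff(names(dat), c("Year", "Day"))

# Wide to long: repeat Year/Day once per month and stack the month columns.
long <- data.frame(
  Year  = rep(dat$Year, times = length(month_cols)),
  Day   = rep(dat$Day,  times = length(month_cols)),
  Month = rep(month_cols, each = nrow(dat)),
  Value = unlist(dat[month_cols], use.names = FALSE)
)

# Turn the month name into a number, then build an ISO date string.
month_num <- match(long$Month, toupper(month.name))
long$Date <- as.Date(
  sprintf("%d-%02d-%02d", long$Year, month_num, long$Day),
  format = "%Y-%m-%d"
)

# Impossible days (e.g. 30 February) parse to NA; drop them.
long <- long[!is.na(long$Date), ]
```

Since 2005 is not a leap year, `long` should end up with one row per calendar day. Whether this is faster than the tidyr route will depend on the size of the data, so it is worth timing both.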
rawr
kputschko