I am trying to complete a dataframe which has gaps. The data is structured hourly and there are some jumps between hours. This is a sample of the dataframe:
sms <- read.csv("hoursms2.csv", sep = ";", header = FALSE)
sms$date <- strptime(sms$date,"%d/%m/%Y %H:%M")
date count
2015-12-17 22:00:00 5
2015-12-18 09:00:00 7
2015-12-18 10:00:00 3
2015-12-18 12:00:00 6
What I've tried is to create another dataframe which ranges from the beginning to the end of the first dataframe, but with all the hours and with a unique identifier to keep track of it:
hh <- seq(min(sms$date), max(sms$date), by="hour")
hh <- as.data.frame(hh)
hh <- cbind(hh,seq(1:3292)) #length of the hh data frame
Which gives the following:
date id
2015-12-04 12:00:00 1
2015-12-04 13:00:00 2
2015-12-04 14:00:00 3
2015-12-04 15:00:00 4
2015-12-04 16:00:00 5
So far I have tried these three things I found on StackOverflow, here ,but they do not work in this case:
res <- merge(sms,hh,by = "date", all = TRUE)
res2 <- res[duplicated(res$date), ]
res3 <- res[!duplicated(res), ]
The first commmand, res, what does is to duplicate all the cells, it doesn't merge it by date, so it gives the following:
2015-12-23 19:00:00 5
2015-12-23 19:00:00 NA
The other screw up everything. I think it has something to do with the date format which is not correctly associated, but not sure. Any suggestion of how to complete by dates my data frame please?