0

I have a dataframe where one column is the date yyyy-mm-dd, however it occasionally skips over dates, leaving days, months or even years missing. How can I create rows where there are missing dates and fill the empty data slots with NA? I have tried the solution offered here but I can't seem to get it to work. My dataframe looks like:

Data <- data.frame( X = c(1:10), 
                    Y = seq.Date(as.Date("1870-01-01"),as.Date("1870-01-10"),"day"),
                    Z = rnorm(10,10,5), A = c(10:1), B = c(109:100))
Datamissing <- Data[-c(2,4,7:10),]
Datamissing

#  X          Y         Z  A   B
#1 1 1870-01-01  3.581199 10 109
#3 3 1870-01-03  6.158980  8 107
#5 5 1870-01-05 10.259756  6 105
#6 6 1870-01-06  3.077985  5 104

And the output I want is:

Data[c(2,4,7:10),c(3:5)] <- NA
Datafilled <- Data
Datafilled

#    X          Y         Z  A   B
#1   1 1870-01-01  3.581199 10 109
#2   2 1870-01-02        NA NA  NA
#3   3 1870-01-03  6.158980  8 107
#4   4 1870-01-04        NA NA  NA
#5   5 1870-01-05 10.259756  6 105
#6   6 1870-01-06  3.077985  5 104
#7   7 1870-01-07        NA NA  NA
#8   8 1870-01-08        NA NA  NA
#9   9 1870-01-09        NA NA  NA
#10 10 1870-01-10        NA NA  NA

Additionally I have a bunch of these dataframes in a list. So far I have a data.frame with a list of the start and end dates, I apply it to the new dataframe:

    days <- seq.Date(as.Date("1870-01-01"),as.Date("2016-01-01"),"day")
    QCdata <- as.data.frame(days)

Then I do something like:

    QCdata[,3] <- data_list[[1]][,3][pmatch(QCdata[,1], data_list[[1]][,3])]

Where data_list is just the list that contains all my dataframes.

Community
  • 1
  • 1

1 Answers1

0

You can do a self-join with data.table

library(data.table)
setDT(Datamissing, key = "Y")

Datamissing[.(seq.Date(as.Date("1870-01-01"),
                       as.Date("1870-01-10"),"day"))]
#     X          Y          Z  A   B
# 1:  1 1870-01-01 12.9712691 10 109
# 2: NA 1870-01-02         NA NA  NA
# 3:  3 1870-01-03 14.0270033  8 107
# 4: NA 1870-01-04         NA NA  NA
# 5:  5 1870-01-05  1.3654924  6 105
# 6:  6 1870-01-06  0.6544578  5 104
# 7: NA 1870-01-07         NA NA  NA
# 8: NA 1870-01-08         NA NA  NA
# 9: NA 1870-01-09         NA NA  NA
#10: NA 1870-01-10         NA NA  NA

(I'm leaving X as NA as well because you've not specified that in fact X is tied one-for-one with Y)

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • I don't understand how this solves my problem, or how I can adapt the code to fit my situation I'm sorry. – Jominic Dones Feb 09 '16 at 02:27
  • Here is my MRE: Data <- data.frame( X = c(1:10), Y = seq.Date(as.Date("1870-01-01"),as.Date("1870-01-10"),"day"), Z = rnorm(10,10,5), A = c(10:1), B = c(109:100)) Datamissing <- Data[-c(2,4,7:10),] #Datamissing is what is what I have. What I want is: Datafilled <- Data Datafilled[c(2,4,7:10),c(3:5)] <- NA #Datafilled. There are extra columns that I want to keep but otherwise don't do anything. Does that make sense? Thanks. – Jominic Dones Feb 09 '16 at 02:59
  • @JominicDones - i've added your MRE to your question, which is where it should be. – thelatemail Feb 09 '16 at 03:18
  • @JominicDones I've updated my answer to fit your example. – MichaelChirico Feb 09 '16 at 03:32