1

I'm learning R and currently trying to fill in a data frame with missing dates and NA values.

Data sample:

Date <- c("23-01-19", "24-01-19", "25-01-19",  "30-01-19", "31-01-19" )
Open <- c("69.849998", "69.440002", "69.540001", "70.32", "69.559998")
High <- c("69.849998", "69.440002", "69.540001", "70.32", "69.559998")
Low <- c("69.849998", "69.440002", "69.540001", "70.32", "69.559998")
Close <- c("69.849998", "69.440002", "69.540001", "70.32", "69.559998")
Adj_Close <- c("69.849998", "69.440002", "69.540001", "70.32", "69.559998")
Volume <- c("0", "0", "0", "0","0")

InvescoDf <- data.frame(Date, Open, High, Low, Close, Adj_Close, Volume)

I'm trying that:

library(tidyverse)
library(zoo)

df <- InvescoDf
df$Date <- as.Date(df$Date, "%d-%m-%y")                    # assign as Date
df$Date<-as.POSIXlt(df$Date,format="%Y-%m-%d")             # assign as POSIXlt
df1.zoo<-zoo(df[,-1],df[,1])                               # assign Date as index
df2.zoo<-zoo(,seq(start(df1.zoo),end(df1.zoo),by="day"))   # create data sequence
df2 <- merge(df1.zoo,df2.zoo, all=TRUE)                    # merge 

Error : Warning message: In merge.zoo(df1.zoo, df2.zoo, all = TRUE) : Index vectors are of different classes: POSIXlt POSIXct

Apparently seq() creates a POSIXct, but I only need the days not the hours. I don't really understand the zoo object, probably there is the mistake. Please help and tell me what further information you need.

EDIT:

Now I'm trying to iterate through multiple dfs, can someone help?

OssiamDf <- InvescoDf
new_list <- list(InvescoDf, OssiamDf)
new_list <- lapply(new_list, function(dat) {
# change all to date
  dat[[1]] <- as.Date(dat3[[1]], "%d-%m-%y")
# change the other variables to num 
  dat[-1] <- lapply(dat[-1],  function(x) as.numeric(as.character(x)))
# complete the dates?
  dat[[1]] <- lapply(dat[[1]], complete(dat[[1]], 
Date = seq(min(dat[[1]]), max(dat[[1]]), by = "day")))
  dat
})

I don't know how to put the complete state into the lapply Pls help.

ToniR
  • 37
  • 5

1 Answers1

2

You can convert Date to date object and then use complete from tidyr to fill missing dates.

library(dplyr)
library(tidyr)

InvescoDf %>%
  mutate(Date = as.Date(Date, "%d-%m-%y")) %>%
  complete(Date = seq(min(Date), max(Date), by = "day"))

#  Date        Open  High   Low Close Adj_Close Volume
#  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>  <int>
#1 2019-01-23  69.8  69.8  69.8  69.8      69.8      0
#2 2019-01-24  69.4  69.4  69.4  69.4      69.4      0
#3 2019-01-25  69.5  69.5  69.5  69.5      69.5      0
#4 2019-01-26  NA    NA    NA    NA        NA       NA
#5 2019-01-27  NA    NA    NA    NA        NA       NA
#6 2019-01-28  NA    NA    NA    NA        NA       NA
#7 2019-01-29  NA    NA    NA    NA        NA       NA
#8 2019-01-30  70.3  70.3  70.3  70.3      70.3      0
#9 2019-01-31  69.6  69.6  69.6  69.6      69.6      0

To do this for multiple dataframes in a list, we can do

new_list <- lapply(new_list, function(dat) {
    dat[[1]] <- as.Date(dat[[1]], "%d-%m-%y")
    # change the other variables to num 
    dat[-1] <- lapply(dat[-1],  function(x) as.numeric(as.character(x)))
    # complete the dates?
    dat <- complete(dat, Date = seq(min(Date), max(Date), by = "day"))
    #OR
    #dat <- complete(dat, Date = seq(min(dat[[1]]), max(dat[[1]]), by = "day"))
    dat
 })

data

InvescoDf <- type.convert(InvescoDf)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you very much! Can I iterate it though multiple data frames in a list with lapply? – ToniR Jan 26 '20 at 14:40
  • @ToniR you mean, you have multiple such dataframes and you want to use the same code for all of them? Yes, you can use `lapply` for that. – Ronak Shah Jan 26 '20 at 14:44
  • It appears unclear to me, that you are using mutate to change the variable, but I thought mutate creates a new variable. Can you help me understand? – ToniR Jan 26 '20 at 14:55
  • 1
    `mutate` creates a new variable if it is not present in the data If the variable is already present it changes the variable. – Ronak Shah Jan 26 '20 at 15:00
  • I edited my iterattion problem. Maybe you can help again. Thx. – ToniR Jan 26 '20 at 15:28
  • 1
    @ToniR Updated the answer to make this work on a list. Please check. – Ronak Shah Jan 26 '20 at 15:34