Original Request
The following is one option. It uses full_join and then the fill function to impute the missing value.
library(tidyverse)
DB_final <- DB %>%
full_join(Hist, by = "Date") %>%
arrange(Date) %>%
fill(Index, .direction = "up") %>%
filter(!is.na(Value))
DB_final
# Value Date Index
# 1 20 2017-10-19 13.517,98
# 2 19 2017-10-23 13.404,58
# 3 19 2017-11-03 13.378,96
# 4 20 2017-11-10 13.206,35
However, the user needs to know the fill direction (up or down) in advance. This approach may not be useful if the user does not know which direction is appropriate.
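To see why the direction matters, here is a minimal sketch that runs the same pipeline with both directions. It rebuilds DB and Hist inline (same values as in the DATA section) so it runs on its own; fill_index is a made-up helper name used only for this illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the DATA section, rebuilt inline so this block is self-contained
DB <- data.frame(
  Value = c(20L, 19L, 19L, 20L),
  Date  = as.Date(c("2017-10-19", "2017-10-23", "2017-11-03", "2017-11-10"))
)
Hist <- data.frame(
  Date  = as.Date(c("2017-11-10", "2017-11-03", "2017-10-25", "2017-10-19")),
  Index = c("13.206,35", "13.378,96", "13.404,58", "13.517,98"),
  stringsAsFactors = FALSE
)

# fill_index is a hypothetical helper, not part of any package
fill_index <- function(direction) {
  DB %>%
    full_join(Hist, by = "Date") %>%
    arrange(Date) %>%
    fill(Index, .direction = direction) %>%
    filter(!is.na(Value))
}

up   <- fill_index("up")
down <- fill_index("down")

# Only 2017-10-23 has no match in Hist, so only that row depends on the direction
up$Index[up$Date == as.Date("2017-10-23")]     # "13.404,58" (taken from 2017-10-25)
down$Index[down$Date == as.Date("2017-10-23")] # "13.517,98" (taken from 2017-10-19)
```

With this data, "up" happens to pick the nearer date (2 days away rather than 4), but that is a coincidence of the data, not something fill guarantees.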
Impute Missing Value based on the Nearest Date
Here is another option, which I think is more robust. It imputes the missing value using the Index from the nearest date, so no fill direction has to be chosen.
Step 1: Find the Nearest Date
# Collect all dates
Date_vec <- sort(unique(c(DB$Date, Hist$Date)))
# Create a distance matrix based on dates, then convert it to a data frame
dt <- Date_vec %>%
dist() %>%
as.matrix() %>%
as.data.frame() %>%
rowid_to_column(var = "ID") %>%
gather(ID2, Value, -ID) %>%
mutate(ID2 = as.integer(ID2)) %>%
filter(ID != ID2) %>%
arrange(ID, Value) %>%
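group_by(ID) %>%
slice(1) %>%
# Drop the grouping so the later steps work on a plain tibble
ungroup() %>%
select(-Value)
# Map the row indices back to the actual dates
dt$ID <- Date_vec[dt$ID]
dt$ID2 <- Date_vec[dt$ID2]
names(dt) <- c("Date1", "Date2")
dt
# # A tibble: 5 x 2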
# Date1 Date2
# <date> <date>
# 1 2017-10-19 2017-10-23
# 2 2017-10-23 2017-10-25
# 3 2017-10-25 2017-10-23
# 4 2017-11-03 2017-11-10
# 5 2017-11-10 2017-11-03
dt shows, for each date, the nearest other date.
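The same lookup can also be sketched in base R without building the full distance matrix. This is only an alternative illustration, not a replacement for the code above; nearest_other and dt_alt are hypothetical names, and the dates are written out so the block runs on its own.

```r
# Same dates as Date_vec above, written out so the block runs on its own
Date_vec <- as.Date(c("2017-10-19", "2017-10-23", "2017-10-25",
                      "2017-11-03", "2017-11-10"))

# nearest_other is a hypothetical helper: for one date, return the closest
# *other* date in the vector. which.min returns the first minimum, so with a
# sorted vector a tie goes to the earlier date.
nearest_other <- function(d, all_dates) {
  candidates <- all_dates[all_dates != d]
  gaps <- abs(as.numeric(candidates - d))  # gap in days
  candidates[which.min(gaps)]
}

# lapply + do.call(c, ...) keeps the Date class, which sapply would drop
nearest <- do.call(c, lapply(seq_along(Date_vec), function(i) {
  nearest_other(Date_vec[i], Date_vec)
}))

dt_alt <- data.frame(Date1 = Date_vec, Date2 = nearest)
dt_alt
```

For the five dates here, dt_alt pairs each Date1 with the same Date2 as dt.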
Step 2: Perform multiple joins
Join DB and dt, and then join Hist twice: once on the original date (Date) and once on the nearest date (Date2).
DB2 <- DB %>% left_join(dt, by = c("Date" = "Date1"))
DB3 <- DB2 %>%
left_join(Hist, by = "Date") %>%
left_join(Hist, by = c("Date2" = "Date"))
DB3
# Value Date Date2 Index.x Index.y
# 1 20 2017-10-19 2017-10-23 13.517,98 <NA>
# 2 19 2017-10-23 2017-10-25 <NA> 13.404,58
# 3 19 2017-11-03 2017-11-10 13.378,96 13.206,35
# 4 20 2017-11-10 2017-11-03 13.206,35 13.378,96
Step 3: Finalize the Index
If there is a value in Index.x, use it; otherwise, use the value in Index.y.
DB4 <- DB3 %>%
mutate(Index = ifelse(is.na(Index.x), Index.y, Index.x)) %>%
select(Value, Date, Index)
DB4
# Value Date Index
# 1 20 2017-10-19 13.517,98
# 2 19 2017-10-23 13.404,58
# 3 19 2017-11-03 13.378,96
# 4 20 2017-11-10 13.206,35
DB4 is the final output.
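As a side note, the ifelse in Step 3 can also be written with dplyr's coalesce, which returns the first non-missing value across its arguments. The sketch below rebuilds DB3 from the output printed in Step 2 so it runs on its own; DB4_alt is a name made up for this example.

```r
library(dplyr)

# DB3 as printed in Step 2, rebuilt inline so the block is self-contained
DB3 <- data.frame(
  Value   = c(20L, 19L, 19L, 20L),
  Date    = as.Date(c("2017-10-19", "2017-10-23", "2017-11-03", "2017-11-10")),
  Date2   = as.Date(c("2017-10-23", "2017-10-25", "2017-11-10", "2017-11-03")),
  Index.x = c("13.517,98", NA, "13.378,96", "13.206,35"),
  Index.y = c(NA, "13.404,58", "13.206,35", "13.378,96"),
  stringsAsFactors = FALSE
)

# Prefer the exact-date match (Index.x); fall back to the nearest date (Index.y)
DB4_alt <- DB3 %>%
  mutate(Index = coalesce(Index.x, Index.y)) %>%
  select(Value, Date, Index)

DB4_alt
```

The result should match DB4; coalesce simply states the "first available value" intent more directly.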
DATA
DB <- structure(list(Value = c(20L, 19L, 19L, 20L), Date = structure(c(17458,
17462, 17473, 17480), class = "Date")), class = "data.frame", .Names = c("Value",
"Date"), row.names = c(NA, -4L))
Hist <- structure(list(Date = structure(c(17480, 17473, 17464, 17458), class = "Date"),
Index = c("13.206,35", "13.378,96", "13.404,58", "13.517,98"
)), class = "data.frame", .Names = c("Date", "Index"), row.names = c(NA,
-4L))