R: How to fill yearly data within monthly data?

Question

I'm trying to load data from Quandl with collapse = "monthly". Some of the values are only available in a yearly or halve-yearly fashion. Some other values are only available within certain periods of time. This leaves me with a lot of inhomogeneous data. How can I fill the yearly and halve-yearly data in a "Last observation carried forward" fashion and the remaining NAs with 0?

Here is my idea of the data I got and the one I want to have at the end:

library(tibble)

set.seed(4711)

# How do I get from:
#
df.start <- data_frame(
  Date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-01"), "1 month"),
  B = rep(NA, 120),
  C = c(rep(NA, 50), rnorm(120 - 50)),
  D = rep(c(rnorm(1), rep(NA, 11)), 10),
  E = c(rep(NA, 24), rep(c(rnorm(1), rep(NA, 11)), 8)),
  F = c(rep(NA, 45), rnorm(50), rep(NA, 25)),
  G = c(rep(NA, 24), rep(c(rnorm(1), rep(NA, 11)), 6), rep(NA, 24)),
  H = c(rep(NA, 10), rnorm(20), rep(NA, 16), rnorm(37), rep(NA, 37)),
  I = rep(c(rnorm(1), rep(NA, 5)), 20)
)
#
# To:
#
df.end <- data_frame(
  Date = seq.Date(as.Date("1990-01-01"), as.Date("1999-12-01"), "1 month"),
  B = rep(0, 120),
  C = c(rep(0, 50), rnorm(120 - 50)),
  D = rep(rnorm(10), each = 12),
  E = c(rep(0, 24), rep(rnorm(8), each = 12)),
  F = c(rep(0, 45), rnorm(50), rep(0, 25)),
  G = c(rep(0, 24), rep(rnorm(6), each = 12), rep(0, 24)),
  H = c(rep(0, 10), rnorm(20), rep(0, 16), rnorm(37), rep(0, 37)),
  I = rep(rnorm(20), each = 6)
)
#
# Automatically?
#

Yeah sure, but it only works for some of the columns, while I'm looking for a solution that works for every column. Something like `df.end <- mutate_all(df.start ,function(x){#TODO})`. — Someone2, Dec 06 '18 at 11:14
So since whether to fill down with the last non-NA entry depends on `Date`, you should clearly explain this dependence so that we don't need to analyze your example. — Julius Vainora, Dec 06 '18 at 11:55
It depends on the type of data, if it is monthly (use `na.replace`) or yearly (use `na.locf`), not on the `Date`. — Someone2, Dec 06 '18 at 12:03

score 0 · Accepted Answer · edited Feb 01 '19 at 17:19

0

You can use fill to fill the NAs with the last non-empty value (except for the Date column), and then replace the remaining NAs by 0. We do these operations grouped by year.

library(tidyverse)
library(lubridate)

df.end <- df.start %>%
  mutate(year = year(Date)) %>%
  group_by(year) %>%
  fill(., colnames(df.start[-1])) %>%
  replace(., is.na(.), 0) %>%
  ungroup() %>%
  select(-year)

edited Feb 01 '19 at 17:19

marc_s

732,580
175
1,330
1,459

answered Dec 06 '18 at 12:49

NRLP

568
3
16

Thank you for your answer, but what about the last values of column H and I? They are also filled. – Someone2 Dec 06 '18 at 12:56
@Someone2 They get filled out as expected: with the last non-empty value from the respective columns. Unless I misunderstood your question: did you expect them to be filled differently? – NRLP Dec 06 '18 at 12:59
Ideally they should contain a zero, like in `df.end`. Since this is the hard part about it. Maybe using a gateway to fill the inner part differently would work. – Someone2 Dec 06 '18 at 13:02
@Someone2 Hmm.. I am not sure now I understand the question. I checked your `df.end`, and I think I do not understand the filling pattern. For instance, on column `D` the first value in `df.start` is `-1.712529`, but in`df.end` it is filled out by `0.02222674`. Where did `0.02222674` come from? – NRLP Dec 06 '18 at 13:08
Yeah this is an error of mine due to the randomness, it should be the same value. I just difference between a value, a `NA` and a zero. Where the values of `df.start` and `df.end` should be the same. – Someone2 Dec 06 '18 at 13:11
1

@Someone2 I think I figured it out. The values should be grouped by year. Please see my updated answer. – NRLP Dec 06 '18 at 13:30

R: How to fill yearly data within monthly data?

1 Answers1