2

This is a sample of my data set:

   day city count
1   1    A    50
2   2    A   100
3   2    B   110
4   2    C    90

Here is the code for reproducing it:

  df <- data.frame(
    day = c(1,2,2,2),
    city = c("A","A","B","C"),
    count = c(50,100,110,90)
    )

As you could see, the count data is missing for city B and C on the day 1. What I want to do is to use city A's count as an estimate for the other two cities. So the desired output would be:

   day city count
1   1    A    50
2   1    B    50
3   1    C    50
4   2    A   100
5   2    B   110
6   2    C    90

I could come up with a for loop to do it, but I feel like there should be an easier way of doing it. My idea is to count the number of observations for each day, and then for the days that the number of observations is less than the number of cities in the data set, I would replicate the row to complete the data for that day. Any better ideas? or a more efficient for-loop? Thanks.

Milad
  • 63
  • 1
  • 7

1 Answers1

2

With dplyr and tidyr, we can do:

library(dplyr)
library(tidyr)

df %>% 
  expand(day, city) %>% 
  left_join(df) %>% 
  group_by(day) %>% 
  fill(count, .direction = "up") %>% 
  fill(count, .direction = "down")

Alternatively, we can avoid the left_join using thelatemail's solution:

df %>% 
  complete(day, city) %>% 
  group_by(day) %>% 
  fill(count, .direction = "up") %>% 
  fill(count, .direction = "down")

Both return:

# A tibble: 6 x 3
    day city  count
  <dbl> <fct> <dbl>
1    1. A       50.
2    1. B       50.
3    1. C       50.
4    2. A      100.
5    2. B      110.
6    2. C       90.

Data (slightly modified to show .direction filling both directions):

df <- data.frame(
  day = c(1,2,2,2),
  city = c("B","A","B","C"),
  count = c(50,100,110,90)
)
tyluRp
  • 4,678
  • 2
  • 17
  • 36
  • There's also `complete` so you could do - `complete(df, day, city) %>% fill(count)` – thelatemail Mar 09 '18 at 00:57
  • Nice, I like that one – tyluRp Mar 09 '18 at 00:58
  • 1
    Feel free to steal it if you like - no point having 2 nearly identical answers. – thelatemail Mar 09 '18 at 00:59
  • Thanks! Do I have to sort the data frame based on day and city before running this? – Milad Mar 09 '18 at 01:05
  • Hm, not sure. You might have to but I cannot test as I'm just about to leave for work. You might try `df %>% group_by(day) %>% arrange(city) %>% complete(day, city) %>% fill(count)` if sorting is a requirement. – tyluRp Mar 09 '18 at 01:09
  • @tyluRp I tested your solution and I think it has one issue. It looks like that `complete` output has an alphabetical order and if it's city A which is missing, `fill` cannot find a value for it as it comes on the top of the list. – Milad Mar 09 '18 at 15:40
  • We can solve this issue by using the additional argument `.direction` in the `fill` call. See my updated answer and let me know if that works for you. – tyluRp Mar 10 '18 at 00:36