3

My data I acquired were missing some parts.

DAY<-c("2011-01-01","2011-01-02","2011-01-04","2011-01-06")
ITEM<-c("apple","apple","apple","banana")
sale<-c("yes","no","yes","yes")
value<-c(100,200,100,500)

df <- data.frame(day=DAY,item=ITEM,sale=sale,value=value)


         day   item sale value
1 2011-01-01  apple  yes   100
2 2011-01-02  apple   no   200
3 2011-01-04  apple  yes   100
4 2011-01-06 banana  yes   500

↑ this is my original data. but I want following expand dataframe


bind_row=NULL
bind=NULL
for(h in 1:length(unique(df$day))){
  bind_day=as.character(unique(df$day)[h])

  for(i in 1:length(unique(df$item))){
    bind_item=as.character(unique(df$item)[i])
  for(j in 1:length(unique(df$sale))){
    bind_sale=as.character(unique(df$sale)[j])
    bind=c(bind_day,bind_item,bind_sale)
    bind_row=rbind(bind_row,bind)
  }
}
}

bind_row <- cbind(bind_row,c(100,0,0,0,0,200,0,0,100,0,0,0,0,0,500,0))


bind "2011-01-01" "apple"  "yes" "100"
bind "2011-01-01" "apple"  "no"  "0"  
bind "2011-01-01" "banana" "yes" "0"  
bind "2011-01-01" "banana" "no"  "0"  
bind "2011-01-02" "apple"  "yes" "0"  
bind "2011-01-02" "apple"  "no"  "200"
bind "2011-01-02" "banana" "yes" "0"  
bind "2011-01-02" "banana" "no"  "0"  
bind "2011-01-04" "apple"  "yes" "100"
bind "2011-01-04" "apple"  "no"  "0"  
bind "2011-01-04" "banana" "yes" "0"  
bind "2011-01-04" "banana" "no"  "0"  
bind "2011-01-06" "apple"  "yes" "0"  
bind "2011-01-06" "apple"  "no"  "0"  
bind "2011-01-06" "banana" "yes" "500"
bind "2011-01-06" "banana" "no"  "0"  

how can I make(transform from original data to expand data) this dataframe in R ? (or python ?)

this script soooo slow. could you help me ?

thank you.

h-y-jp
  • 199
  • 1
  • 8

2 Answers2

2

We can use complete from tidyr in R.

tidyr::complete(df, day, item, sale, fill = list(value = 0))

#    day        item   sale  value
#   <fct>      <fct>  <fct> <dbl>
# 1 2011-01-01 apple  no        0
# 2 2011-01-01 apple  yes     100
# 3 2011-01-01 banana no        0
# 4 2011-01-01 banana yes       0
# 5 2011-01-02 apple  no      200
# 6 2011-01-02 apple  yes       0
# 7 2011-01-02 banana no        0
# 8 2011-01-02 banana yes       0
# 9 2011-01-04 apple  no        0
#10 2011-01-04 apple  yes     100
#11 2011-01-04 banana no        0
#12 2011-01-04 banana yes       0
#13 2011-01-06 apple  no        0
#14 2011-01-06 apple  yes       0
#15 2011-01-06 banana no        0
#16 2011-01-06 banana yes     500
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

The proposed solution fills in the rows where there are existing days in the data set, but not where there are days with no data. For that you need the tsibble package.

library(tidyverse)
library(tsibble)

DAY <- c("2011-01-01", "2011-01-02", "2011-01-04", "2011-01-06")
ITEM <- c("apple", "apple", "apple", "banana")
sale <- c("yes", "no", "yes", "yes")
value <- c(100, 200, 100, 500)

df <- data.frame(day = as.Date(DAY), item = ITEM, sale = sale, value = value)

df %>%
  complete(day, item, sale, fill=list(value=0)) %>%
  as_tsibble(index=day, key=c(item,sale)) %>%
  fill_gaps(value=0)
#> # A tsibble: 24 x 4 [1D]
#> # Key:       item, sale [4]
#>    day        item  sale  value
#>    <date>     <fct> <fct> <dbl>
#>  1 2011-01-01 apple no        0
#>  2 2011-01-02 apple no      200
#>  3 2011-01-03 apple no        0
#>  4 2011-01-04 apple no        0
#>  5 2011-01-05 apple no        0
#>  6 2011-01-06 apple no        0
#>  7 2011-01-01 apple yes     100
#>  8 2011-01-02 apple yes       0
#>  9 2011-01-03 apple yes       0
#> 10 2011-01-04 apple yes     100
#> # … with 14 more rows

Created on 2020-04-01 by the reprex package (v0.3.0)

Rob Hyndman
  • 30,301
  • 7
  • 73
  • 85