The format of my excel data file is:

 day                 value
 01-01-2000 00:00:00    4
 01-01-2000 00:01:00    3
 01-01-2000 00:02:00    1
 01-01-2000 00:04:00    1

I open my file with this:

ts = read.csv(file=pathfile, header=TRUE, sep=",")

How can I add rows with a zero in the “value” column for the timestamps that are missing from the data frame? Example output:

 day                  value
 01-01-2000 00:00:00    4
 01-01-2000 00:01:00    3
 01-01-2000 00:02:00    1
 01-01-2000 00:03:00    0
 01-01-2000 00:04:00    1
Thomas

4 Answers

This is now completely automated in the padr package. It takes only one line of code.

original <- data.frame(
  day = as.POSIXct(c("01-01-2000 00:00:00",
                     "01-01-2000 00:01:00",
                     "01-01-2000 00:02:00",
                     "01-01-2000 00:04:00"), format="%m-%d-%Y %H:%M:%S"),
  value = c(4, 3, 1, 1))

library(padr)
library(dplyr) # for the pipe operator
original %>% pad %>% fill_by_value(value)

See vignette("padr") or this blog post for details on how it works.

Edwin
  • It only works when there's a variable of class `Date`, `POSIXct`, or `POSIXlt` in the data. If the time dimension is `int`, can it work too? – Rafs Jul 31 '20 at 14:33
  • There is the function `padr::pad_int` for that. – Edwin Aug 03 '20 at 14:31
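
To illustrate the comment above, here is a minimal sketch of `padr::pad_int` on an integer time index. The data frame `int_df` and its column names `x` and `val` are made up for this example, and the sketch assumes `pad_int` takes the name of the integer column as a string in its `by` argument.

```r
library(padr)

# Hypothetical data: an integer "time" column with gaps at 2006 and 2009
int_df <- data.frame(x = c(2005, 2007, 2008, 2010),
                     val = c(3, 2, 1, 3))

# Insert the missing integers; the new rows get NA in `val`
padded <- pad_int(int_df, by = "x")

# Replace the NAs in `val` with 0 (the default fill value)
padded <- fill_by_value(padded, val)
padded
```

As with `pad`, the padding and the filling are separate steps, so the NAs can be kept if zeros are not appropriate.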

I think this is a more general solution, which relies on creating a sequence of all timestamps, using that as the basis for a new data frame, and then filling in your original values in that df where applicable.

# convert original `day` to POSIX
ts$day <- as.POSIXct(ts$day, format="%m-%d-%Y %H:%M:%S", tz="GMT")

# generate a sequence of all minutes in a day
minAsNumeric <- 946684800 + seq(0, 60*60*24 - 60, by=60) # all 1440 minutes of your first day, starting at 2000-01-01 00:00:00 GMT
minAsPOSIX <- as.POSIXct(minAsNumeric, origin="1970-01-01", tz="GMT") # convert those minutes to POSIX

# build complete dataframe
newdata <- as.data.frame(minAsPOSIX)
newdata$value <- ts$value[match(newdata$minAsPOSIX, ts$day)] # fill in original `value`s where present (exact match on timestamp)
newdata$value[is.na(newdata$value)] <- 0 # replace NAs with 0
Thomas

Try:

ts = read.csv(file=pathfile, header=TRUE, sep=",", stringsAsFactors=FALSE)
ts.tmp = rbind(ts,list("01-01-2000 00:03:00",0))
ts.out = ts.tmp[order(ts.tmp$day),]

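Note that the strings in the first column must be read as character rather than as factors (hence stringsAsFactors=FALSE); otherwise rbind cannot add a timestamp that is not already a factor level. Because day is still a character column at this point, order sorts the strings alphabetically, which matches chronological order only as long as all rows fall in the same year. To turn the day column into a factor afterwards, just do: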

ts.out$day = as.factor(ts.out$day)
Valentin Ruano

tidyr offers the complete function, which generates rows for implicitly missing data. In a second step, I use replace_na to turn the resulting NA values into 0.

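library(dplyr) # for %>% and mutate
ts %>%
  mutate(day = as.POSIXct(day, format="%m-%d-%Y %H:%M:%S", tz="GMT")) %>% # seq.POSIXt needs date-times, not strings
  tidyr::complete(day = seq.POSIXt(min(day), max(day), by="min")) %>%
  mutate(value = tidyr::replace_na(value, 0))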

Notice that I set the granularity of the dates to minutes since your dataset expects a row every minute.
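
For comparison, the same idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: the sample data is rebuilt inline from the question, the GMT time zone is an assumption, and the names `all_minutes` and `filled` are made up here.

```r
# Sample data from the question, with `day` parsed as POSIXct
# (the GMT time zone is an assumption for this sketch)
ts <- data.frame(
  day = as.POSIXct(c("01-01-2000 00:00:00", "01-01-2000 00:01:00",
                     "01-01-2000 00:02:00", "01-01-2000 00:04:00"),
                   format = "%m-%d-%Y %H:%M:%S", tz = "GMT"),
  value = c(4, 3, 1, 1))

# One row per minute between the first and the last observation;
# changing `by` (e.g. to "hour") changes the granularity
all_minutes <- data.frame(day = seq(min(ts$day), max(ts$day), by = "min"))

# Left join: minutes without an observation get NA in `value`
filled <- merge(all_minutes, ts, by = "day", all.x = TRUE)

# Replace the NAs with 0
filled$value[is.na(filled$value)] <- 0
filled
```

The `by` argument of `seq` plays the same role as the granularity chosen in the `complete` call.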

schmitzi89