
I have a large dataset of irregular multivariate time series that I want to convert with read.zoo.

Some of the last rows are populated with NAs. When I run read.zoo with those rows included, I get the following error message: "index has bad entries at data rows: 43 44 ...".

Checking with is.na() confirms that those cells are indeed NA (it returns TRUE). And I tried the na.fill solution from here, but it doesn't work.

Below is an extract of the dataset, with two variables Var1 and Var2 and their respective date columns date1 and date2:

date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA
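
For reference, the failing call looks roughly like the following (my exact invocation is not shown here, so this is an assumed minimal reproduction; "mydata.txt" is a placeholder file name):

library(zoo)

# assumed reproduction: reading the file with the trailing NA rows included
# raises "index has bad entries at data rows: 43 44 ..."
z <- read.zoo("mydata.txt", header = TRUE)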

Bertrand G
  • read.zoo is a wrapper for read.table - you may be better off with a multistep workflow: read in the data, remove the NA rows, then parse with zoo. Mildly annoying, but should be relatively smooth to set up. – Paul Stafford Allen Apr 13 '23 at 11:02
  • In the original csv, are there NAs at that position? If not, what kind of data do you have there (e.g. date, numeric, character, etc.)? If you really have NAs there, try adding `na.strings = "NA"`. – TarJae Apr 13 '23 at 11:03
  • @PaulStaffordAllen Actually, when I use zoo instead of read.zoo, I do not get the error message. But then, in the rest of my work, I get other issues with the dataset generated by zoo that I do not have when using read.zoo. This is why I would be interested in using read.zoo. – Bertrand G Apr 13 '23 at 12:32
  • @TarJae Same error with na.strings = "NA". – Bertrand G Apr 13 '23 at 12:34

3 Answers


The solution was provided by @G. Grothendieck in another post here:

Replace as.data.frame(x) with na.omit(as.data.frame(x))
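
Applied to this dataset, the fix looks roughly like this (a sketch: DF stands for the data frame produced by a plain read step, and "mydata.txt" is a placeholder file name):

library(zoo)

# read the raw file first, then drop the NA rows before read.zoo sees them
DF <- read.table("mydata.txt", header = TRUE)
z <- read.zoo(na.omit(as.data.frame(DF)))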

Bertrand G

First, let me create a data frame from your data:

lines <- "date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA"


library(tidyverse)  # also attaches dplyr and purrr (used for set_names below)

DF <- read.table(text = lines, header = TRUE)

Then, let me convert the date columns from character to a proper date class:

library(zoo)

# convert the character dates to POSIXct (as.Date would also work for daily data)
DF$date1 <- as.POSIXct(DF$date1)
DF$date2 <- as.POSIXct(DF$date2)

Given this layout, one way is to split the data into two separate datasets, one per series:

df1 <- DF %>% select(date1, Var1) %>% na.omit() %>% set_names(c("Date", "Var"))
df2 <- DF %>% select(date2, Var2) %>% na.omit() %>% set_names(c("Date", "Var"))

Then create separate zoo objects from these:

zoo1 <- zoo(df1$Var, order.by = df1$Date)
zoo2 <- zoo(df2$Var, order.by = df2$Date)
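
If you want a single multivariate object, the two zoo series can also be merged directly; merge.zoo performs an outer join on the index by default, so dates present in only one series are kept (padded with NA):

# zoo-native merge: outer join on the index, one column per series
zooBoth <- merge(Var1 = zoo1, Var2 = zoo2)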

Alternatively, you can merge the two data frames first and build a single zoo object from the result:

# merge the two data frames on Date (an inner join: only dates present in both are kept)
mergedDf <- merge(df1, df2, by = "Date")

# create a two-column zoo object from the merged data frame
zooObject <- zoo(mergedDf[, c("Var.x", "Var.y")], order.by = mergedDf$Date)

Let me know if this helps.

Manoj Kumar

In the question, NA always appears at the beginning of the offending rows, so, using Lines from the Note at the end, define N as a comment character; read.table then skips every line that starts with it. (This also skips the two rows where only date1 and Var1 are NA, dropping the valid Var2 values on those rows as well, just as na.omit would do.)

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, comment.char = "N")
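
An alternative sketch, if you would rather drop the NA rows explicitly than rely on a comment character: read.zoo's read argument lets you wrap the underlying read.table call, so the NA rows never reach the indexing step.

z2 <- read.zoo(text = Lines, header = TRUE,
               read = function(...) na.omit(read.table(...)))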

Note

Lines <- "date1 Var1 date2 Var2
2023-01-13 100.325 2023-01-11 99.748
2023-01-16 100.378 2023-01-12 99.832
2023-01-17 100.826 2023-01-13 99.878
2023-01-18 100.933 2023-01-16 99.762
2023-01-19 100.641 2023-01-17 99.484
2023-01-20 100.148 2023-01-18 99.743
2023-01-23 99.972 2023-01-19 99.419
2023-01-24 100.256 2023-01-20 99.364
2023-01-25 100.348 2023-01-23 99.533
2023-01-26 100.146 2023-01-24 99.711
2023-01-27 100.063 2023-01-25 99.798
2023-01-30 99.649 2023-01-26 100.481
2023-01-31 99.822 2023-01-27 100.708
2023-02-01 99.885 2023-01-30 100.57
2023-02-02 101.121 2023-01-31 100.773
2023-02-03 100.854 2023-02-01 100.999
2023-02-06 100.5 2023-02-02 102.037
2023-02-07 100.272 2023-02-03 102.104
2023-02-08 100.372 2023-02-06 101.85
2023-02-09 100.659 2023-02-07 101.765
2023-02-10 100.421 2023-02-08 101.806
2023-02-13 100.418 2023-02-09 101.905
2023-02-14 100.202 2023-02-10 101.675
2023-02-15 99.913 2023-02-13 101.491
2023-02-16 99.832 2023-02-14 101.304
2023-02-17 99.911 2023-02-15 101.242
2023-02-20 99.791 2023-02-16 101.621
2023-02-21 99.451 2023-02-17 101.581
2023-02-22 99.467 2023-02-20 101.545
2023-02-23 99.642 2023-02-21 101.334
2023-02-24 99.278 2023-02-22 101.246
2023-02-27 99.114 2023-02-23 101.857
2023-02-28 98.784 2023-02-24 101.71
2023-03-01 98.486 2023-02-27 101.759
2023-03-02 98.396 2023-02-28 101.649
2023-03-03 98.467 2023-03-01 101.583
2023-03-06 98.276 2023-03-02 101.426
2023-03-07 98.495 2023-03-03 101.666
2023-03-08 98.572 2023-03-06 101.919
2023-03-09 98.747 2023-03-07 102.048
2023-03-10 99.489 2023-03-08 101.915
NA NA 2023-03-09 101.927
NA NA 2023-03-10 101.775
NA NA NA NA
NA NA NA NA
NA NA NA NA"
G. Grothendieck