
I have a .csv file containing 4 columns of data against a column of dates/times at one-minute intervals. Some timestamps are missing, so I'm trying to generate the missing dates/times and assign them NA values in the Y columns. I have previously done this with other .csv files with exactly the same formatting, with no issues. The code is:

# read the csv file
har10 = read.csv(fpath, header=TRUE);

# set date
har10$HAR.TS<-as.POSIXct(har10$HAR.TS,format="%y/%m/%d %H:%M")

# convert to zoo
df1.zoo<-zoo(har10[,-1],har10[,1]) #set date to Index

# merge and generate NAs
df2 <- merge(df1.zoo,zoo(,seq(start(df1.zoo),end(df1.zoo),by="min")), all=TRUE)

# write zoo object to .csv file in Home directory
write.zoo(df2, file = "har10fixed.csv", sep = ",")
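The merge-against-a-regular-grid step above can be sketched on a tiny synthetic series (made-up values, not the HAR file) to show how the missing minutes become NAs:

```r
library(zoo)

# toy series with a one-minute gap at 00:02 (synthetic data)
idx <- as.POSIXct(c("2010-01-01 00:00", "2010-01-01 00:01", "2010-01-01 00:03"))
z <- zoo(c(1.5, 2.5, 3.5), idx)

# build a complete one-minute grid as an empty zoo object and merge;
# timestamps present only in the grid get NA values
grid <- zoo(, seq(start(z), end(z), by = "min"))
z2 <- merge(z, grid, all = TRUE)

length(z2)      # 4 rows: the gap at 00:02 is now present
sum(is.na(z2))  # 1 NA filled in
```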

My data looks like this (for an entire year, more or less) after conversion to POSIXct, which seems to go fine:

                    HAR.TS        C1       C2         C3        C4
1      2010-01-01 00:00:00 -4390.659 5042.423 -2241.6344 -2368.762
2      2010-01-01 00:01:00 -4391.711 5042.056 -2241.1796 -2366.725
3      2010-01-01 00:02:00 -4390.354 5043.003 -2242.5493 -2368.786
4      2010-01-01 00:03:00 -4390.337 5038.570 -2242.7653 -2371.289

When I run the "convert to zoo" step I get the following error:

 Warning message:
 In zoo(har10[, -1], har10[, 1]) :
   some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

I have checked for duplicated entries but get no results:

> anyDuplicated(har10)
[1] 0

Any ideas? I have no idea why I'm getting this error on this file, but it has worked for previous ones. Thanks!


EDIT: Reproducible form:

EDIT 2: Have to remove the data/code, sorry!

John Paul
James A
  • I tried setting it to GMT but no dice. The data was recorded in JST (Japan Standard Time) so there is no DST. Also I have successfully run the same code on three other .csv files, with the exact same formatting, including the same date/time range in one case. That's what's throwing me - I don't know why this should be different. – James A Jul 03 '13 at 12:31
  • I've edited my original post to include my file (Dropbox link) and full code. I'll have to remove it after solving the problem, though. – James A Jul 03 '13 at 12:43
  • Try it with the first half of the data and the second half of the data separately and keep doing that until you can reproduce the problem with just a few rows of data. Then post just that. – G. Grothendieck Jul 03 '13 at 12:52
  • I have over 400,000 rows of data. That may take some time... – James A Jul 03 '13 at 12:54
  • Ok, it gets stranger. I've done January to April so far, with each month in a separate file, with no errors. Yet when I try to run January to March in one file, I get the error. – James A Jul 03 '13 at 13:17
  • 1
    The answer by @JoshuaUlrich is probably what you are looking for. Just one short annotation: have a look at `read.zoo`. You might save some lines of code by using it. It would look like this: `df1.zoo <- read.zoo(fpath, index.column = 1, FUN = function(x) as.POSIXct(strptime(x, format = "%y/%m/%d %H:%M")), sep = ",", header = TRUE)` – fdetsch Jul 03 '13 at 14:01

2 Answers


anyDuplicated(har10) tells you if any complete rows are duplicated. zoo is warning about the index, so you should run anyDuplicated(har10$HAR.TS). sum(duplicated(har10$HAR.TS)) will show there are almost 9,000 duplicate datetimes. The first duplicate is around row 311811, where 10/08/19 13:10 appears twice.
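The distinction can be seen on a two-row toy data frame (made-up values, not the HAR file): the rows differ in a data column, so the full-row check passes, but the timestamp column still contains a duplicate.

```r
# duplicated rows vs. duplicated index values (toy example)
df <- data.frame(ts = c("10/08/19 13:10", "10/08/19 13:10"),
                 C1 = c(1.1, 2.2))

anyDuplicated(df)     # 0 -- rows differ in C1, so no full-row duplicate
anyDuplicated(df$ts)  # 2 -- but the timestamp itself repeats (first dup at row 2)

# count and locate duplicated timestamps
sum(duplicated(df$ts))    # 1
which(duplicated(df$ts))  # 2
```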

Joshua Ulrich
  • Thank you! I'm new to R still but I can't believe it was this simple. I'm packing up for the day now but will check this tomorrow - I think you may be right. – James A Jul 03 '13 at 13:34
  • Yeah, this works. Once again my data is riddled with errors and duplicates - fantastic! :( Thank you! – James A Jul 04 '13 at 06:20

And to handle duplicated indices (see ?zoo and ?aggregate.zoo)

## zoo series with duplicated indexes
z3 <- zoo(1:8, c(1, 2, 2, 2, 3, 4, 5, 5))
plot(z3)

## remove duplicated indexes by averaging
lines(aggregate(z3, index, mean), col = 2, type = "o")

## or by using the last observation
lines(aggregate(z3, index, tail, 1), col = 4)
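Without the plot, the same aggregation can be checked numerically: averaging collapses each run of duplicated indexes to a single observation.

```r
library(zoo)

# same toy series as above: indexes 2 and 5 are duplicated
z3 <- zoo(1:8, c(1, 2, 2, 2, 3, 4, 5, 5))

# collapse duplicated indexes by averaging
z3mean <- aggregate(z3, index(z3), mean)
coredata(z3mean)  # 1.0 3.0 5.0 6.0 7.5  (2,3,4 average to 3; 7,8 to 7.5)

# or keep only the last observation per index
z3last <- aggregate(z3, index(z3), tail, 1)
coredata(z3last)  # 1 4 5 6 8
```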
G. Grothendieck
Amos Folarin