I am working on time series data whose key column is a timestamp, Time. Each row also has many "value" columns.

I am about to shift a whole range of my data by several hours (due to a daylight saving time issue). To do that, I will update the key of several rows, which might result in some duplicate keys. I would like the duplicate keys at the edges of the date range to be ignored: the shifted range should override the old one.

I plan to do something like:

UPDATE IGNORE time_series_table 
SET time=time-<some_shift> 
WHERE <time in a date-range>

Here is the output of DESCRIBE <table> for the time key:

Field     Type      Null Key     Default Extra
TimeMeas  datetime  NO   PRI     NULL

My question is: will it shift all the keys at once, or will it try to shift each row one by one, resulting in massive duplicate keys within the shifted range itself?

Do you have a better way of doing this in mind? Thanks in advance.

Raphael Jolivet
  • When you say timestamp you mean the TIMESTAMP field type or an actual integer? Secondly, is this a UNIQUE key? The output of DESCRIBE or SHOW CREATE TABLE will be helpful. – georgepsarakis Sep 29 '11 at 18:47
  • I have added the output of DESCRIBE to the question. Anyway, I got a good answer yesterday: it was about using a temporary table to duplicate this one and using "REPLACE" / "IGNORE". I was about to vote for it, but it is gone today without notification. This might be a bug in SO, or the author deleted it for some reason. – Raphael Jolivet Sep 30 '11 at 08:40
  • What engine is used by the table? Is it InnoDB? – Romain Sep 30 '11 at 08:43

1 Answer

Will it shift all the keys at once, or will it try to shift each row one by one

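Not necessarily. MySQL checks the primary key row by row as it updates, so whether rows inside the range collide with each other depends on the order in which they are updated.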

resulting in massive duplicate keys within the shifted range itself?

A plain UPDATE fails as soon as any primary key is duplicated.
With UPDATE IGNORE, the conflicting rows are silently skipped, i.e. left unshifted.

This is my approach to fix this:

/* create a temporary table to store the matching records, already shifted */
create table tmp_table select time-<some_shift> as time, etc_cols....
from time_series_table 
where <time in a date-range>;

then

/* delete the rows being shifted from the original table */
delete from time_series_table where <time in a date-range>;
/* delete the old rows that occupy the destination of the shift */
delete from time_series_table where <time in a date-range - some_shift>;

finally

/* at this point, there won't be any duplicate data, */
/* so insert the shifted rows back into the original table */
insert into time_series_table select * from tmp_table;
optimize table time_series_table;
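
A possible alternative without the temporary table, as a rough sketch only: the MySQL manual notes that an UPDATE on a uniquely indexed column can fail with a duplicate-key error depending on the order in which rows are updated, and that an ORDER BY clause on the UPDATE controls that order. When shifting backwards in time, updating the earliest rows first should mean each row moves into a slot that has already been vacated. The one-hour shift and the date range below are assumptions made up for illustration, not values from the question, and this is worth trying on a copy of the table first.

```sql
-- Assumed for illustration: shift = 1 hour, range = [02:00, 06:00) on 2011-10-30.
START TRANSACTION;  -- only atomic on a transactional engine such as InnoDB

-- 1. Remove the old rows just before the range: the shifted rows will land
--    on them, and they are not being shifted themselves.
--    The lower bound is the range start minus the shift.
DELETE FROM time_series_table
WHERE TimeMeas >= '2011-10-30 01:00:00'
  AND TimeMeas <  '2011-10-30 02:00:00';

-- 2. Shift the range backwards, earliest rows first, so that a row's
--    destination has already been freed by the time the row is updated.
UPDATE time_series_table
SET TimeMeas = TimeMeas - INTERVAL 1 HOUR
WHERE TimeMeas >= '2011-10-30 02:00:00'
  AND TimeMeas <  '2011-10-30 06:00:00'
ORDER BY TimeMeas ASC;

COMMIT;
```

For a shift forwards in time the same idea would need ORDER BY TimeMeas DESC, and the rows to delete beforehand would be the ones just after the end of the range.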
ajreal