
I am having issues updating the data. Here is the situation: 1) I loaded MySQL dumps of daily climate data. I have 18 tables with daily data (one table per parameter). My goal is to merge them into one single table.

Table 1

station | dayOfYear | temp
      1 |  20010101 |    2
      2 |  20010101 |    3
      3 |  20010101 |    4
      1 |  20010102 |    3
      2 |  20010102 |    4
      3 |  20010102 |    4

Table 2

station | dayOfYear | humidity
      1 |  20010101 |       20
      2 |  20010101 |       35
      3 |  20010101 |      100
      1 |  20010102 |       14
      2 |  20010102 |       40
      3 |  20010102 |       80

2) As you can see in the example (Table 1 and Table 2), all of these tables share the "station" and "dayOfYear" columns.

3) My initial plan was that, once every dump was uploaded, I would update my "TargetTable" (which I had previously created; it contains one column per parameter plus the shared columns "station" and "dayOfYear") with each one of the dumps, so that at the end I would have one table with the data per day per parameter. That plan did not work: I have more than 2 million rows, the query ran for hours, and eventually the connection was lost (a sketch of the kind of update I was running follows the example table below). So I wanted to get something like this:

TargetTable

station | dayOfYear | temp | humidity
      1 |  20010101 |    2 |       20
      2 |  20010101 |    3 |       35
      3 |  20010101 |    4 |      100
      1 |  20010102 |    3 |       14
      2 |  20010102 |    4 |       40
      3 |  20010102 |    4 |       80
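For reference, the kind of per-parameter update I was attempting looked roughly like this (a sketch only; t_humidity stands in for the actual name of one of my 18 parameter tables):

UPDATE TargetTable tt
JOIN t_humidity h
  ON h.station = tt.stn
 AND h.dayOfYear = tt.dayOfYear
SET tt.humidity = h.humidity;

Without an index on the join columns, each of these updates has to scan the 2+ million rows, which, I suspect, is why it ran for hours.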

Since my plan (building the TargetTable with a join) did not work, I am wondering whether it is possible in MySQL to fill the columns of the TargetTable from the beginning, at the time of uploading the CSVs.

Something like:

LOAD DATA LOCAL
INFILE 'C:/Daily/temp.csv'
INTO TABLE TargetTable
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(stn, dayOfYear, temp);

LOAD DATA LOCAL
INFILE 'C:/Daily/hum.csv'
INTO TABLE TargetTable
SET
  humidity = ...?? **what would come here??**

Do you know how to do it? Do you have any other suggestions for handling this problem? FYI, I have already increased the allowed query run time.

John Perez

2 Answers


Use MySQL's ON DUPLICATE KEY UPDATE feature. Here is how you can implement it:

1-Preparation

a] create a temporary table

CREATE TEMPORARY TABLE tmpTarget LIKE TargetTable;

b] alter the TargetTable keys

ALTER TABLE TargetTable
DROP PRIMARY KEY,
ADD PRIMARY KEY (stn, dayOfYear);
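For this to work, (stn, dayOfYear) must uniquely identify a row, because ON DUPLICATE KEY UPDATE fires on exactly this key. (If TargetTable has no primary key yet, run only the ADD PRIMARY KEY clause.) If you still need to create TargetTable, a minimal sketch could look like this (the column types here are assumptions; adjust them to your data):

CREATE TABLE IF NOT EXISTS TargetTable (
  stn       INT NOT NULL,       -- station id
  dayOfYear INT NOT NULL,       -- date encoded as YYYYMMDD
  temp      DECIMAL(5,2) NULL,  -- one column per climate parameter
  humidity  DECIMAL(5,2) NULL,  -- ...and so on for the other 16 parameters
  PRIMARY KEY (stn, dayOfYear)  -- the composite key the upsert relies on
);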

2-Routines

a] insert temperature data into real target table (as you already did)

LOAD DATA LOCAL
INFILE 'C:/Daily/temp.csv'
INTO TABLE TargetTable
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(stn, dayOfYear, temp);

b] insert humidity data into tmp table first

LOAD DATA LOCAL
INFILE 'C:/Daily/hum.csv'
INTO TABLE tmpTarget
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(stn, dayOfYear, humidity)
SET temp = NULL;

c] update or insert data into the real target table from the tmp table

INSERT INTO TargetTable
SELECT * FROM tmpTarget
ON DUPLICATE KEY UPDATE humidity = VALUES(humidity);

d] Truncate the tmp table

TRUNCATE tmpTarget;
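Steps b] through d] then repeat for each of the remaining parameter CSVs. For example, for a hypothetical pressure.csv (the file and column names are assumptions; the pattern is identical for every parameter):

LOAD DATA LOCAL
INFILE 'C:/Daily/pressure.csv'
INTO TABLE tmpTarget
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(stn, dayOfYear, pressure);

INSERT INTO TargetTable (stn, dayOfYear, pressure)
SELECT stn, dayOfYear, pressure FROM tmpTarget
ON DUPLICATE KEY UPDATE pressure = VALUES(pressure);

TRUNCATE tmpTarget;

Each upsert is an indexed lookup on the new (stn, dayOfYear) primary key, which is what avoids the hours-long unindexed join from the question.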
4givN
  • You are a GENIUS!! Just as an addition: in step 2b it is necessary to map only the humidity column and set temp to NULL (as shown above); otherwise the humidity ends up in the temp column. Thanks – John Perez Feb 06 '18 at 16:33
  • @Monitotier: Glad to help ;) I re-edited [2b] according to your testing. A little `up-vote` or `accepting my answer` will be very welcome if you are satisfied with it. – 4givN Feb 06 '18 at 17:29
  • I don't have enough reputation to give you a vote!! So sad :( As soon as I get enough I will do it :) ... I accepted your perfect answer – John Perez Feb 06 '18 at 20:10

I saw another post with a question like yours. One method to improve speed is to insert the data row by row with a custom script. On the other hand, FWIW, the following steps caused a huge speedup of LOAD DATA INFILE:

SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
SET SESSION tx_isolation='READ-UNCOMMITTED';
SET sql_log_bin = 0;
#LOAD DATA LOCAL INFILE....
SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
SET SESSION tx_isolation='REPEATABLE-READ';
SET sql_log_bin = 1;

More info at Best practices for importing large CSV files
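As a usage sketch, the settings wrap each import like this (shown here with the hum.csv load from the other answer; note that from MySQL 5.7.20 on, tx_isolation is deprecated in favour of transaction_isolation):

SET FOREIGN_KEY_CHECKS = 0;
SET UNIQUE_CHECKS = 0;
SET SESSION tx_isolation='READ-UNCOMMITTED';
SET sql_log_bin = 0;

LOAD DATA LOCAL
INFILE 'C:/Daily/hum.csv'
INTO TABLE tmpTarget
CHARACTER SET UTF8
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 ROWS
(stn, dayOfYear, humidity)
SET temp = NULL;

SET UNIQUE_CHECKS = 1;
SET FOREIGN_KEY_CHECKS = 1;
SET SESSION tx_isolation='REPEATABLE-READ';
SET sql_log_bin = 1;

Keep in mind these settings trade safety for speed, so they only belong in a one-off bulk import session.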

sycoi001
  • Thanks! I will go through what you have posted. I was googling a lot, but since I did not know how to phrase the question, I was not even close to finding what you have suggested. Thanks, I will go check – John Perez Feb 06 '18 at 10:22
  • I have checked what you kindly suggested, but those lines improve the loading, and my problem was not the loading; it was performing the join between tables. That's why I was asking if there was a way, during the load, to match the table I want to add against the pre-loaded table on two of its columns _(station and day of the year)_ and then add the third/remaining columns of the CSV to the pre-loaded TargetTable. _It seems I did not explain it correctly._ – John Perez Feb 06 '18 at 11:55