I'm using LOAD DATA INFILE to upload a .csv into a table.

This is the table I have created in my db:

CREATE TABLE expenses (entry_id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(entry_id), 
ss_id INT, user_id INT, cost FLOAT, context VARCHAR(100), date_created DATE);

This is some of the sample data I'm trying to upload (some of the rows have data for every column, some are missing the date column):

1,1,20,Sandwiches after hike,
1,1,45,Dinner at Yama,
1,2,40,Dinner at Murphys,
1,1,40.81,Dinner at Yama,
1,2,1294.76,Flight to Taiwan,1/17/2011
1,2,118.78,Grand Hyatt @ Seoul,1/22/2011
1,1,268.12,Seoul cash withdrawal,1/8/2011

Here is the LOAD DATA command which I can't get to work:

LOAD DATA INFILE '/tmp/expense_upload.csv'
INTO TABLE expenses (ss_id, user_id, cost, context, date)
;

This command completes and loads the correct number of rows into the table, but every field is NULL. Any time I try to add FIELDS ENCLOSED BY ',' or LINES TERMINATED BY '\r\n' I get a syntax error.

Other things to note: the csv was created in MS Excel.

If anyone has tips or can point me in the right direction it would be much appreciated!

peterm
john k

2 Answers

First of all, I'd change FLOAT to DECIMAL for cost, since FLOAT is an approximate type and can introduce rounding errors in monetary values:

CREATE TABLE expenses 
(
  entry_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, 
  ss_id INT, 
  user_id INT, 
  cost DECIMAL(19,2), -- use DECIMAL instead of FLOAT
  context VARCHAR(100), 
  date_created DATE
);

Now try this:

LOAD DATA INFILE '/tmp/sampledata.csv' 
INTO TABLE expenses  
    FIELDS TERMINATED BY ',' 
           OPTIONALLY ENCLOSED BY '"'
    LINES  TERMINATED BY '\n' -- or \r\n
(ss_id, user_id, cost, context, @date_created)
SET date_created = IF(CHAR_LENGTH(TRIM(@date_created)) > 0, 
                      STR_TO_DATE(TRIM(@date_created), '%m/%d/%Y'), 
                      NULL);

What it does:

  1. It uses the correct syntax for specifying field and line terminators. The FIELDS and LINES clauses must come before the column list.
  2. Since the date values in your file are not in the format MySQL expects (YYYY-MM-DD), it first reads each value into a user (session) variable. If the value is not empty, it converts it to a date with STR_TO_DATE; otherwise it assigns NULL. The NULL assignment prevents you from getting zero dates (0000-00-00).
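
To see how the SET expression treats each kind of value, the conversion can be tried on its own before running the load. This is only a minimal sketch: the variable @d and the sample values are for illustration, and it assumes blank dates arrive as empty or space-only strings.

```sql
-- A populated date field: parsed as month/day/year.
SET @d = '1/17/2011';
SELECT IF(CHAR_LENGTH(TRIM(@d)) > 0,
          STR_TO_DATE(TRIM(@d), '%m/%d/%Y'),
          NULL) AS date_created;   -- 2011-01-17

-- An empty field: the IF falls through to NULL instead of a zero date.
SET @d = '';
SELECT IF(CHAR_LENGTH(TRIM(@d)) > 0,
          STR_TO_DATE(TRIM(@d), '%m/%d/%Y'),
          NULL) AS date_created;   -- NULL

-- After running LOAD DATA, list any conversion problems it reported.
SHOW WARNINGS;
```

Note that TRIM removes only spaces by default. If the file has Windows line endings and is loaded with LINES TERMINATED BY '\n', a carriage return stays attached to the last field and the empty check no longer matches, so '\r\n' is the terminator to use in that case.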
peterm

Here is my advice. Load the data into a staging table where all the columns are strings and then insert into the final table. This allows you to better check the results along the way:

CREATE TABLE expenses_staging (entry_id INT NOT NULL AUTO_INCREMENT,
                               PRIMARY KEY(entry_id), 
                               ss_id varchar(255),
                               user_id varchar(255),
                               cost varchar(255),
                               context VARCHAR(100),
                               date_created varchar(255)
                              );

LOAD DATA INFILE '/tmp/expense_upload.csv'
    INTO TABLE expenses_staging
    FIELDS TERMINATED BY ','
    (ss_id, user_id, cost, context, date_created);

This will let you see what is really being loaded. You can then load this data into the final table, doing whatever data transformations are necessary.
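
As a rough sketch of that second step, something like the following could copy the staged rows into the final table. It assumes the dates use the month/day/year format shown in the question's sample data and that blank dates were staged as empty strings; adjust the casts to whatever the inspection query actually shows.

```sql
-- Look at the raw strings first to confirm each value landed in the right column.
SELECT * FROM expenses_staging LIMIT 10;

-- Copy into the final table, converting each string column to its target type.
INSERT INTO expenses (ss_id, user_id, cost, context, date_created)
SELECT CAST(ss_id AS SIGNED),
       CAST(user_id AS SIGNED),
       CAST(cost AS DECIMAL(19,2)),
       context,
       -- NULLIF turns an empty string into NULL, and STR_TO_DATE(NULL, ...) is NULL.
       STR_TO_DATE(NULLIF(TRIM(date_created), ''), '%m/%d/%Y')
FROM expenses_staging;
```

Keeping the conversion in a separate INSERT ... SELECT means a bad value can be found with an ordinary query against the staging table rather than by re-running the load.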

Gordon Linoff
  • It's helpful to see it all. Unfortunately, only about 1 in every 50 lines gets imported and the rest are NULL. Any idea why this would be? – john k Jul 16 '13 at 01:59