
I'm pulling my hair out.

I have a csv full of customer data, and am getting a number of errors when trying to import it into MySQL using Sequel Pro's Import function.

==========

I'm importing with all the default options, with the encoding set to latin1.

The table structure is:

CREATE TABLE `sp_personal_info13` (
  `title` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `hours` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `description` varchar(8192) CHARACTER SET utf8 DEFAULT NULL,
  `services` varchar(1024) CHARACTER SET utf8 DEFAULT NULL,
  `in_business_since` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `rating` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `address1` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `address2` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `city` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `state` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `zipcode` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `latitude` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `longitude` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `phone` varchar(45) CHARACTER SET utf8 DEFAULT NULL,
  `fax` varchar(45) CHARACTER SET utf8 DEFAULT NULL,
  `first_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `last_name` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `email_address` varchar(254) CHARACTER SET utf8 DEFAULT NULL,
  `website` varchar(254) CHARACTER SET utf8 DEFAULT NULL,
  `num_of_reviews` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `sp_categories` varchar(1024) CHARACTER SET utf8 DEFAULT NULL,
  `has_been_initially_contacted` varchar(16) CHARACTER SET utf8 DEFAULT NULL,
  `user_id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=33298 DEFAULT CHARSET=latin1;

==========

The csv has about 47,000 customer entries. Here are a couple of example entries (I had to change the data since it's private customer data):

Example Customer One    Mon 08.30 AM - 05.00 PM, Tue 08.30 AM - 05.00 PM, Wed 08.30 AM - 05.00 PM, Thu 08.30 AM - 05.00 PM, Fri 08.30 AM - 05.00 PM, Sat Closed, Sun Closed This is just a really long description and in this case it was only one thousand characters long, way shorter than the 8192 this field will allow.  this, is a comma, separated, list of all, different things, about this customer 2006    A   123 Main St #4  VCA 123 Smallville  California  12345   30.12345    -81.889951  4154151515  4157891010  Jan Smith   myemail@gmail.com   www.website.com 67  another,comma,separated,list
Example Customer One    Mon 08.30 AM - 05.00 PM, Tue 08.30 AM - 05.00 PM, Wed 08.30 AM - 05.00 PM, Thu 08.30 AM - 05.00 PM, Fri 08.30 AM - 05.00 PM, Sat Closed, Sun Closed This is just a really long description and in this case it was only one thousand characters long, way shorter than the 8192 this field will allow.  this, is a comma, separated, list of all, different things, about this customer 2006    A   123 Main St #4  VCA 123 Smallville  California  12345   30.12345    -92.937672  4154151515  4157891010  Jan Smith   myemail@gmail.com   www.website.com 67  another,comma,separated,list

And this is the error I get when importing:

"An error occurred when reading the file, as it could not be read using the encoding you selected (Autodetect - Western (Windows Latin 1)).

Only 4900 rows were imported."
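
My best guess is that some byte later in the file isn't valid in that encoding, which would explain why the import stops partway through. Here's a rough sketch of the kind of check that might locate the first offending line (assuming Sequel Pro's "Western (Windows Latin 1)" corresponds to Python's cp1252 codec; customers.csv is just a placeholder for the real filename):

# rough sketch: find the first line that won't decode as Windows Latin 1 (cp1252)
with open("customers.csv", "rb") as f:          # read raw bytes, no decoding yet
    for line_no, raw in enumerate(f, start=1):
        try:
            raw.decode("cp1252")                # the "Western (Windows Latin 1)" codec
        except UnicodeDecodeError as e:
            print("line", line_no, "- bad byte", hex(e.object[e.start]), "at offset", e.start)
            break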

==========

So I decided to take just the first 3,000 lines, save them out as a new csv, and try to debug that since it's smaller. But frustratingly, when I import that version, I get entirely different errors. :-/ Here's that error output:

[ERROR in row 13] Data too long for column 'in_business_since' at row 1
[ERROR in row 15] Data too long for column 'fax' at row 1
[ERROR in row 20] Data too long for column 'zipcode' at row 1
[ERROR in row 23] Data too long for column 'phone' at row 1
[ERROR in row 269] Data too long for column 'fax' at row 1
[ERROR in row 399] Data too long for column 'phone' at row 1
[ERROR in row 421] Data too long for column 'sp_categories' at row 1
[ERROR in row 846] Data too long for column 'phone' at row 1
[ERROR in row 999] Data too long for column 'sp_categories' at row 1
[ERROR in row 1557] Data too long for column 'sp_categories' at row 1
[ERROR in row 1869] Data too long for column 'zipcode' at row 1
[ERROR in row 1959] Data too long for column 'services' at row 1
[ERROR in row 2359] Data too long for column 'zipcode' at row 1
[ERROR in row 2622] Data too long for column 'sp_categories' at row 1
[ERROR in row 2842] Data too long for column 'sp_categories' at row 1

But I have checked, and none of these values are actually too long, despite what the error message says.
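
One possibility I wondered about: if a stray quote or an embedded line break merges or splits a row, the importer would end up stuffing a long value into a short column even though the data looks fine in Excel. Here's a rough sketch of a field-count check along those lines (assuming the file is tab-delimited and should have 22 fields per row, one for every column except the auto-increment user_id; same placeholder filename as above):

# rough sketch: flag rows whose field count doesn't match the table
import csv

EXPECTED_FIELDS = 22  # assumption: one CSV field per column except user_id

with open("customers.csv", newline="", encoding="cp1252") as f:
    for row_no, row in enumerate(csv.reader(f, delimiter="\t"), start=1):
        if len(row) != EXPECTED_FIELDS:
            print("row", row_no, "has", len(row), "fields, expected", EXPECTED_FIELDS)

Any row that shows up there would be a good candidate for one of the errors above, since everything after the break point gets shifted into the wrong columns.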

I used to code years ago (C, C++, Java), but I'm new to databases, so maybe I'm simply being careless. I've spent days on this, though, and am out of ideas.

Zander F
  • Are there double quotes in those rows? – Danny_ds Nov 09 '17 at 11:36
  • @Danny_ds, none show up if I am viewing the csv through Excel, but I see them if viewing through a hex editor. Here's an example: _,50,"Moving,Storage Facilities,Piano Moving" Allen_ – Zander F Nov 09 '17 at 15:57
  • If there is no comma before _Allen_ (i.e. belongs to the same field), that could be your problem, depending on how the parser treats those quotes. Otoh, I wonder what the delimiter is - TAB (as in your question) or comma? – Danny_ds Nov 09 '17 at 20:13
  • Danny, thanks so much - getting into the exact characters in the hex editor solved it. There were indeed some quotes and CRLFs that were creating the problem. – Zander F Nov 11 '17 at 18:13
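
Update, in case it helps anyone else: a rough way to spot those rows without going through a hex editor might look like the sketch below. It reads the file as raw bytes, strips the normal line terminator, and prints any line that still contains a double quote or a stray carriage return (same placeholder filename as above):

# rough sketch: list lines containing a double quote or a stray CR,
# since those are what turned out to break the import
with open("customers.csv", "rb") as f:
    for line_no, raw in enumerate(f, start=1):
        body = raw.rstrip(b"\r\n")            # drop the normal line ending
        if b'"' in body or b"\r" in body:     # stray quotes or a CR mid-line
            print("line", line_no, ":", body[:120])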
