0

so I'm trying to import a huge .sql file into my DB with Ansible. I have configured MySQL to run set global default-character-set = utf8mb4 before the DB is created. I've confirmed the DB charset is set to utf8mb4 before importing the data below into it.

Yet i still receive 58229: Incorrect string value: '\\xBA' for column 'ISBN' at row 1\n"}.

The line in question looks like INSERT INTOBX-Book-RatingsVALUES (11676,'8475560806º',6);

º seems to be the issue here. Below is my tbl structure:

CREATE TABLE `BX-Book-Ratings` (
  `User-ID` int(11) NOT NULL default '0',
  `ISBN` varchar(13) NOT NULL default '',
  `Book-Rating` int(11) NOT NULL default '0'
) ENGINE=MyISAM;

Any tips would be much appreciated. Thanks.

tadman
  • 208,517
  • 23
  • 234
  • 262
b0uncyfr0
  • 177
  • 2
  • 2
  • 10
  • First, don't use MyISAM, use InnoDB as a default engine. Secondly, why are you inserting wonky values like º in your ISBN field? Looks like you have really dirty data, plus if that's coming through as `\xBA` it's not UTF-8. Characters like that would be two bytes if so. Check your source and ensure that's actually UTF-8. Character 0xBA is the degree symbol in [Windows-1252 encoding](https://en.wikipedia.org/wiki/Windows-1252). – tadman Oct 29 '18 at 21:59
  • Yes the data is a mix of several enconding but i still have to import it somehow - perhaps remove all non utf8 characters? – b0uncyfr0 Oct 30 '18 at 05:15
  • Stripping non-UTF-8 characters is a good plan, but is not always easily done. – tadman Oct 30 '18 at 18:25

1 Answers1

0

Assuming you are expecting º (MASCULINE ORDINAL INDICATOR), I will suggest...

Hex BA, in latin1 is that character. For utf8, it is hex C2BA.

I assume you don't want to convert your 'huge' .sql file. Also, I assume there is not a mixture of encodings; this would be a disaster.

You have not shown the code that you are using to do the "load". If it is LOAD DATA, then add this clause: CHARACTER SET latin1 in the syntactically correct place.

To discuss further, provide these:

SHOW VARIABLES LIKE 'char%';
SHOW CREATE TABLE `BX-Book-Ratings`;

I strongly recommend using InnoDB, not MyISAM.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Im not using load - im simply running insert into statements. – b0uncyfr0 Oct 30 '18 at 05:15
  • | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | – b0uncyfr0 Oct 30 '18 at 05:17
  • | BX-Book-Ratings | CREATE TABLE `BX-Book-Ratings` ( `User-ID` int(11) NOT NULL DEFAULT '0', `ISBN` varchar(13) NOT NULL DEFAULT '', `Book-Rating` int(11) NOT NULL DEFAULT '0' ) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 – b0uncyfr0 Oct 30 '18 at 05:19
  • It seems i have different encoding in the file itself as tadman mentioned. Perhaps it would be easier to remove all characters that are not utf8 encoded? – b0uncyfr0 Oct 30 '18 at 05:22
  • @b0uncyfr0 - If some are utf8 but some are latin1, the only solution is to remove the offending characters. If all are garbage (this particular example seems like such), then cleansing the input is probably the right answer. Else (all are latin1-encoded, and reasonable values), `SET NAMES latin1`. – Rick James Oct 30 '18 at 17:33