I have a MySQL dump, which is not valid UTF-8. Two questions:
- Could this be caused by some of the database using utf8mb3 aka MySQL's 'utf8'? It definitely does use this encoding.
- If so, how can I fix it, without having access to MySQL to import, change table types, and re-export? Can I use any encoding conversion tool?
Edit to add specific data which isn't valid UTF-8:
uconv -f utf8 a.sql -o /dev/null
Conversion to Unicode from codepage failed at input byte position XXX. Bytes: ed Error: Illegal character found
Here is a hex sample.
xxd -s {XXX-16} -l 30 a.sql
YYY: 6e2e 203c 2f70 3e20 cfa1 ecaf a6eb 9ea0 n. </p> ........
aabb aabb aabb aabb aaaa bbbb bbaa aaaa
YYZ: edb6 b0e1 aea5 ee9e a027 2c27 3230 .........','20
^^^^ ^^
Edit 2: Added more context above. Also looks like the problem sequence meets UTF-8 format, it just maps to U+1DDB0 which doesn't exist.