1

I have a SQL dump via DBeaver that ends up looking like this, but with 200k+ rows in 10 row segments:

INSERT INTO addresses (address_uuid,parent_address_uuid,`address.building`,`address.house_num`,`address.predir`,`address.qual`,`address.pretype`,`address.name`,`address.suftype`,`address.sufdir`,`address.ruralroute`,`address.extra`,`address.city`,`address.state`,`address.country`,`address.postcode`,`address.box`,`address.unit`,location,tenancy,units,created_at,updated_at) VALUES (...)

I'm attempting to utilize this file in a seeder and having all kinds of string formatting issues, but I've narrowed it down to only having a small diamond with a question mark: � (thought it might be a utf-8 encoding isssue, but that is seeming less and less likely) before INSERT INTO addresses that is throwing a PDOException::("SQLSTATE[42601]: Syntax error: 7 ERROR: syntax error at or near "INSERT". Here's my code thus far:

// Get the file and remove line breaks and back ticks...
$sql = preg_replace("/\r|\n|`/", "", file_get_contents('some-file.sql'));

// Create array of INSERTS with 10 rows at a time...
$statements = array_filter(array_map('trim', explode(';', $sql)));

// Iterate and execute per segment
foreach ($statements as $stmt) {
    \DB::statement($stmt);
}

As you can see I'm already removing line breaks and back ticks, but not sure what is the best way to remove the �. If I call utf8_encode() it adds some other garbage before the INSERT: 

I'm very open to "better" solutions to this whole process i.e., flysystem/filesystem, fopen, etc. The resources on it are sparse and mixed, despite it seeming to be a fairly straightforward and common need.

Also, using postgres and have the charset to utf8 in my config:

'pgsql' => [
    'driver' => 'pgsql',
    'host' => env('DB_HOST', '127.0.0.1'),
    'port' => env('DB_PORT', '5432'),
    'database' => env('DB_DATABASE', 'forge'),
    'username' => env('DB_USERNAME', 'forge'),
    'password' => env('DB_PASSWORD', ''),
    'charset' => 'utf8',
    'prefix' => '',
    'prefix_indexes' => true,
    'schema' => 'public',
    'sslmode' => 'prefer',
],

And SHOW client_encoding; verifies that is is UTF8. The encoding of my file is "UTF-8" when I dump $enc = mb_detect_encoding($file, mb_list_encodings(), true);. Not sure if this is relevant, but \Illuminate\Support\Facades\Input::get('encoding'); dumps null;

Oh and DBeaver's bloody Encoding is set to UTF-8 as well:

enter image description here

Matt Larsuma
  • 1,456
  • 4
  • 20
  • 52

1 Answers1

0

Ended up being that Insert BOM checkbox. Aka Byte Order Mark. Didn't think to uncheck that until I pasted that screenshot

Matt Larsuma
  • 1,456
  • 4
  • 20
  • 52