
I'm trying to write a bulk importer that loads a .txt file into Postgres. The code currently crashes because the string being inserted into Postgres isn't valid UTF-8:

pq: invalid byte sequence for encoding UTF8: 0x00

In my code I already check whether the strings are valid UTF-8 before inserting them.

What am I missing?

Code:

for {
    line, more := <-lineChannel
    if !more {
        break // lineChannel was closed: no more input
    }

    splitLine := strings.SplitN(line, ":", 2)

    if len(splitLine) == 2 {
        if utf8.Valid([]byte(splitLine[0])) && utf8.Valid([]byte(splitLine[1])) {
            lineCount++

            // Buffer one row into the current COPY. Check the error right away
            // so it isn't overwritten by the batch calls below.
            _, err = stmt.Exec(splitLine[0], splitLine[1])
            if err != nil {
                log.Println("error:", splitLine[0], splitLine[1])
                log.Fatal(err)
            }

            // Every copySize rows: flush the COPY, commit, and start a new batch.
            if lineCount%int64(copySize) == 0 {
                _, err = stmt.Exec() // Exec with no arguments flushes the buffered COPY data
                if err != nil {
                    log.Fatal("Failed at stmt.Exec", err)
                }

                err = stmt.Close()
                if err != nil {
                    log.Fatal("Failed at stmt.Close", err)
                }

                err = txn.Commit()
                if err != nil {
                    log.Fatal("failed at txn.Commit", err)
                }

                txn, err = db.Begin()
                if err != nil {
                    log.Fatal("failed at db.Begin", err)
                }

                stmt, err = txn.Prepare(pq.CopyIn("pwned", "username", "password"))
                if err != nil {
                    log.Fatal("failed at txn.Prepare", err)
                }

                if lineCount%(int64(copySize)*10) == 0 {
                    log.Printf("Inserted %v lines", lineCount)
                }
            }
        }
    }
}

EDIT: The line that causes the error:

Byte[]: [116 109 97 105 108 46 99 111 109 58 104 117 115 104 112 117 112 112 105 101 115 108 111 118 101]

line: username@hotmail.whatever:hushpuppieslove

splitLine[0] + splitLine[1]: username@hotmail.whatever hushpuppieslove

In0cenT

1 Answer


0x00 is the NUL character, and Postgres does not allow it in strings. From the docs:

The NULL (0) character is not allowed because text data types cannot store such bytes.

You'll need to strip out the null characters.
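A minimal sketch of stripping NUL bytes before the values reach `stmt.Exec`, assuming a hypothetical helper `cleanField` that is not part of the question's code. It removes the bytes with `strings.ReplaceAll` (Go 1.12+; on older versions `strings.Replace(s, "\x00", "", -1)` does the same) and then re-checks UTF-8 validity, since removing NULs does not repair other invalid bytes:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// cleanField (hypothetical helper) removes NUL bytes, which Postgres text
// columns cannot store, and reports whether the remainder is valid UTF-8.
// strings.ReplaceAll needs Go 1.12+; older versions can use
// strings.Replace(s, "\x00", "", -1) instead.
func cleanField(s string) (string, bool) {
	s = strings.ReplaceAll(s, "\x00", "")
	return s, utf8.ValidString(s)
}

func main() {
	// Illustrative input line with an embedded NUL byte.
	line := "user@example.com:hush\x00puppies"

	parts := strings.SplitN(line, ":", 2)
	if len(parts) != 2 {
		return // not a user:password line
	}

	// Clean each field before it would be passed to stmt.Exec.
	user, okUser := cleanField(parts[0])
	pass, okPass := cleanField(parts[1])
	if !okUser || !okPass {
		fmt.Println("skipping line with invalid UTF-8")
		return
	}
	fmt.Printf("%q %q\n", user, pass) // "user@example.com" "hushpuppies"
}
```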

Jeremy
  • Thanks for your reply. I've read that the NULL character is not allowed, but it doesn't seem to be the source of my issue. Please check my edit with the line that causes the error. – In0cenT May 31 '19 at 12:50
  • Did you actually try removing 0x00 from your input? For example: `strings.Replace(line, "\u0000", "", -1)` – Sylvain May 31 '19 at 22:19
  • That seems to have worked. Could you explain why `\u0000` wasn't visible in the error message but still caused the problem? – In0cenT Jun 01 '19 at 09:59