I need to write a UTF-16 encoded CSV file and I can't figure out how. I found a lot of questions and answers about reading UTF-16 CSVs, but nothing about writing them.

This is what I've tried so far:

package main

import (
    "encoding/csv"
    "fmt"
    "os"

    "golang.org/x/text/encoding/unicode"

)

func main() {
    csvFile, err := os.Create("test.csv")
    if err != nil {
        panic(err)
    }
    defer csvFile.Close()

    message := "weird characters: дгодг"

    message, err = convertUtf8ToUtf16LE(message)
    if err != nil {
        panic(err)
    }
    fmt.Println(message)

    csvWriter := csv.NewWriter(csvFile)
    defer csvWriter.Flush()

    csvWriter.Write([]string{message})
    csvWriter.Write([]string{message})
}

func convertUtf8ToUtf16LE(message string) (string, error) {
    encoder := unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewEncoder()
    return encoder.String(message)
}

But I get the following output in the csv:

weird characters: дгодг*矾攀椀爀搀 挀栀愀爀愀挀琀攀爀猀㨀 㐀㌄㸄㐄㌄਄

What am I doing wrong?

  • Why do you want this? Do you want to import the CSV into Excel? Excel can work with UTF8 files just fine – Panagiotis Kanavos Aug 26 '22 at 13:40
  • @PanagiotisKanavos I'm trying to bulk insert into SQL Server for linux which doesn't support CODEPAGE – None Aug 26 '22 at 13:42
  • SQL Server has no such problem. The *database driver* will map strings correctly from Go's UTF8 to SQL Server's `nvarchar` fields. – Panagiotis Kanavos Aug 26 '22 at 13:44
  • How are you trying to insert the data into the database? – Panagiotis Kanavos Aug 26 '22 at 13:45
  • @PanagiotisKanavos Reference: https://github.com/Microsoft/mssql-docker/issues/289#issuecomment-814624524 – None Aug 26 '22 at 13:45
  • @PanagiotisKanavos the data is in a massive SQLite db file. I'm trying to split the data into CSVs to bulk insert. I've tried this with Python (easy to output UTF-16 CSV) and it worked great. – None Aug 26 '22 at 13:49
  • @PanagiotisKanavos the bulk insert is on the sql side, this question is about generating utf16 csvs with golang, which is what I'm trying to do. The go program is NOT inserting the data and it doesn't have to. – None Aug 26 '22 at 13:49
  • Assume you're talking to someone that worked with SQL Server for 20 years on a non-Latin locale. Again, how are you inserting the data? Using your own code? `bcp`? `BULK INSERT` ? Using what commands? Which SQL Server version? *Supported* versions work with UTF8. The codepage for UTF8 is 65001. – Panagiotis Kanavos Aug 26 '22 at 14:00
  • On the other hand, if you have an ODBC driver for SQLite you can [add the SQLite file as a Linked Server](https://www.mssqltips.com/sqlservertip/3087/creating-a-sql-server-linked-server-to-sqlite-to-import-data/) and copy the data. – Panagiotis Kanavos Aug 26 '22 at 14:02
  • @PanagiotisKanavos hahahah working for 20 years in a project doesn't really mean anything... it could just be that no one wanted to hire you somewhere else. Anyway... I'm using BULK INSERT on the 2019-latest image of MSSQL. I think adding the SQLite as a linked server might be a better idea, so I will try. – None Aug 26 '22 at 14:07

2 Answers

I'm new to transforming, but I think this does the same thing you wrote for yourself, and it uses the same text package.

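From your question, it looks like you were just missing the transform.NewWriter() component: instead of encoding each field yourself, wrap the file in an encoding writer and let csv.Writer write to that.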

I took this answer, and just turned it around to make it encode UTF16LE (oh. my. stars. Go makes these things so easy):

package main

import (
    "encoding/csv"
    "os"

    "golang.org/x/text/encoding/unicode"
    "golang.org/x/text/transform"
)

func main() {
    records := [][]string{
        {"first_name", "last_name", "username"},
        {"Rob", "Pike", "rob"},
        {"Ken", "Thompson", "ken"},
        {"Robert", "Griesemer", "gri"},
    }

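    f, err := os.Create("utf16le.csv")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Wrap the file so everything written to t is transcoded from UTF-8 to
    // UTF-16LE; UseBOM makes the encoder emit the FF FE byte order mark first.
    t := transform.NewWriter(f, unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewEncoder())
    defer t.Close() // flushes the transformer; it does not close f
    w := csv.NewWriter(t)

    for _, record := range records {
        if err := w.Write(record); err != nil {
            panic(err)
        }
    }

    // Flush the CSV buffer into the transformer and surface any write error.
    w.Flush()
    if err := w.Error(); err != nil {
        panic(err)
    }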
}

And viewing that CSV file:

hexdump -C utf16le.csv
00000000  ff fe 66 00 69 00 72 00  73 00 74 00 5f 00 6e 00  |��f.i.r.s.t._.n.|
00000010  61 00 6d 00 65 00 2c 00  6c 00 61 00 73 00 74 00  |a.m.e.,.l.a.s.t.|
00000020  5f 00 6e 00 61 00 6d 00  65 00 2c 00 75 00 73 00  |_.n.a.m.e.,.u.s.|
00000030  65 00 72 00 6e 00 61 00  6d 00 65 00 0a 00 52 00  |e.r.n.a.m.e...R.|
00000040  6f 00 62 00 2c 00 50 00  69 00 6b 00 65 00 2c 00  |o.b.,.P.i.k.e.,.|
00000050  72 00 6f 00 62 00 0a 00  4b 00 65 00 6e 00 2c 00  |r.o.b...K.e.n.,.|
00000060  54 00 68 00 6f 00 6d 00  70 00 73 00 6f 00 6e 00  |T.h.o.m.p.s.o.n.|
00000070  2c 00 6b 00 65 00 6e 00  0a 00 52 00 6f 00 62 00  |,.k.e.n...R.o.b.|
00000080  65 00 72 00 74 00 2c 00  47 00 72 00 69 00 65 00  |e.r.t.,.G.r.i.e.|
00000090  73 00 65 00 6d 00 65 00  72 00 2c 00 67 00 72 00  |s.e.m.e.r.,.g.r.|
000000a0  69 00 0a 00                                       |i...|
000000a4
Zach Young

What I ended up doing is I created a struct that implements io.Writer for a file but converts the input to UTF-16LE before writing:

type UTF16LEWriter struct {
    file    *os.File
    encoder *encoding.Encoder
}

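func NewUTF16LEWriter(file *os.File) (*UTF16LEWriter, error) {
    // Write the UTF-16LE byte order mark once, at the start of the file.
    if _, err := file.Write([]byte{0xFF, 0xFE}); err != nil {
        return nil, err
    }

    return &UTF16LEWriter{
        file:    file,
        encoder: unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewEncoder(),
    }, nil
}

// Write encodes p as UTF-16LE and writes the result to the file. Each call
// encodes its chunk independently, so p is assumed to hold complete UTF-8
// sequences.
func (w *UTF16LEWriter) Write(p []byte) (int, error) {
    encoded, err := w.encoder.Bytes(p)
    if err != nil {
        return 0, err
    }
    if _, err := w.file.Write(encoded); err != nil {
        return 0, err
    }
    // io.Writer expects the number of input bytes consumed, not the encoded length.
    return len(p), nil
}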

Then I only need to replace the io.Writer provided by os.Create with mine and pass that to the csvWriter:

package main

import (
    "encoding/csv"
    "os"

    // Used by UTF16LEWriter above, assumed to live in the same file.
    "golang.org/x/text/encoding"
    "golang.org/x/text/encoding/unicode"
)

func main() {
    csvFile, err := os.Create("test.csv")
    if err != nil {
        panic(err)
    }
    defer csvFile.Close()

    utf16Writer, err := NewUTF16LEWriter(csvFile)
    if err != nil {
        panic(err)
    }

    csvWriter := csv.NewWriter(utf16Writer)
    defer csvWriter.Flush()

    message := "weird characters: дгодг"
    csvWriter.Write([]string{message})
    csvWriter.Write([]string{message})
}