
Here is the sample string I am using, encoded in UCS-2:

abvgdđežzijklmnjoprstćuvhcčdžš1234567890*+;'

When I convert from UCS-2 to ISO-8859-1//TRANSLIT with the iconv binary (file to file), I get:

abvgd?ezzijklmnjoprstcuvhccdzs1234567890*+;'

Now I want to use libiconv in a Go project. I am using github.com/qiniu/iconv as bindings for libiconv, but when converting through the bindings I get:

abvgd?e?zijklmnjoprst?uvhc?d??1234567890*+;'

It looks as if different transliteration rules apply when the library is used from Go.

I examined the Go bindings library and everything seems to be in order: only bytes are passed around, so no "loss of information" can happen there.

Is there anything else that I should be aware of when using libiconv? Is there some environment context that could trigger different transliteration behaviour?


EDIT (additional explanation about invocation):

I have two files, "ucs-2.txt" and "latin1.txt". ucs-2.txt contains the UCS-2 encoded string, and latin1.txt contains the output of running:

iconv -f UCS2 -t ISO-8859-1//TRANSLIT --verbose data/encoding/ucs-2.txt > data/encoding/latin1.txt

In Go I use these lines to read the contents of these files:

var err error
ucs2, err = ioutil.ReadFile("data/encoding/ucs-2.txt")
if err != nil {
    log.Fatal(err)
}
latin1, err = ioutil.ReadFile("data/encoding/latin1.txt")
if err != nil {
    log.Fatal(err)
}

This function does the conversion:

func convertEnc(content []byte) ([]byte, error) {
    cd, err := iconv.Open("ISO-8859-1//TRANSLIT", "UCS2")
    if err != nil {
        return nil, err
    }
    defer cd.Close()
    var outbuf [255]byte
    res, _, err := cd.Conv(content, outbuf[:])
    log.Printf("result: %+q", res)
    return res, err
}

And I compare the result with reflect.DeepEqual in the test:

reflect.DeepEqual(res, latin1)
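For `[]byte` values, `bytes.Equal` is the idiomatic equality check. `reflect.DeepEqual` gives the same yes/no answer here, but neither says where the two outputs diverge. A minimal sketch of a helper that reports the first mismatching byte, so the `?` positions are easy to spot (the names `firstDiff` and `describeDiff` are hypothetical, not part of any library):

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDiff returns the index of the first byte at which got and want
// differ, or -1 if they are equal. If one slice is a prefix of the other,
// the index is the length of the shorter one.
func firstDiff(got, want []byte) int {
	if bytes.Equal(got, want) {
		return -1
	}
	n := len(got)
	if len(want) < n {
		n = len(want)
	}
	for i := 0; i < n; i++ {
		if got[i] != want[i] {
			return i
		}
	}
	return n
}

// describeDiff formats a short report starting at the first mismatch,
// quoting both tails with %q so non-ASCII bytes and '?' stand out.
func describeDiff(got, want []byte) string {
	i := firstDiff(got, want)
	if i < 0 {
		return "equal"
	}
	return fmt.Sprintf("first difference at byte %d: got %q, want %q", i, got[i:], want[i:])
}
```

In the test, `t.Log(describeDiff(res, latin1))` (or `log.Print`) would show which character first came out differently.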
Aleksandar Janković

2 Answers


The first output includes transliteration, i.e. certain characters (e.g. ž) are transliterated into their not-quite-right "plain" counterpart (z) in order to be representable in an encoding that does not support the original character (here, ž in Latin-1).

The second output did not transliterate anything: it replaced every character not representable in the target encoding (ž, ć, ... in Latin-1) with ?.

Thus, I suspect you called the binary with different options than the library. I am not familiar with libiconv, but it seems that the //TRANSLIT part was omitted or is not supported by the function you used...?
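To make the difference between the two outputs concrete, here is a pure-Go illustration. It is not how iconv works internally: it assumes big-endian UCS-2 input and uses a tiny, hypothetical transliteration table (real tables, such as glibc's locale data, are much larger). Characters above U+00FF are looked up in the table when transliteration is on; anything without an entry becomes `?`. Toggling the table reproduces the first few characters of both outputs in the question, including `đ` becoming `?` in both because it has no entry.

```go
package main

import (
	"encoding/binary"
	"unicode/utf16"
)

// translitTable is a small, hypothetical subset of a transliteration table.
// Note that 'đ' is deliberately absent, so it falls back to '?'.
var translitTable = map[rune]string{
	'ž': "z", 'Ž': "Z",
	'č': "c", 'Č': "C",
	'ć': "c", 'Ć': "C",
	'š': "s", 'Š': "S",
}

// encodeUCS2BE encodes s as big-endian UCS-2 (BMP characters only).
func encodeUCS2BE(s string) []byte {
	units := utf16.Encode([]rune(s))
	b := make([]byte, 2*len(units))
	for i, u := range units {
		binary.BigEndian.PutUint16(b[2*i:], u)
	}
	return b
}

// decodeUCS2BE decodes big-endian UCS-2 bytes into runes.
func decodeUCS2BE(b []byte) []rune {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, binary.BigEndian.Uint16(b[i:]))
	}
	return utf16.Decode(units)
}

// toLatin1 converts runes to ISO-8859-1 bytes. Runes up to U+00FF map
// directly; others use translitTable when translit is true, and become
// '?' when translit is false or no table entry exists.
func toLatin1(rs []rune, translit bool) []byte {
	out := make([]byte, 0, len(rs))
	for _, r := range rs {
		switch {
		case r <= 0xFF:
			out = append(out, byte(r))
		case translit && translitTable[r] != "":
			out = append(out, translitTable[r]...)
		default:
			out = append(out, '?')
		}
	}
	return out
}
```

With the input `abvgdđežz`, `toLatin1(rs, true)` yields `abvgd?ezz` (like the binary) and `toLatin1(rs, false)` yields `abvgd?e?z` (like the bindings), which suggests the bindings are running with an empty or different table rather than losing bytes.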

DevSolar
  • Thank you, I understand how transliteration works. "//TRANSLIT" is the parameter that triggers transliteration; without it, the first unrepresentable character ends the conversion. Because of that, I assume that some transliteration is triggered when using the library, but with a different mapping than with the binary. – Aleksandar Janković Aug 04 '15 at 13:49
  • @AleksandarJanković: Those characters in your input string are not representable in Latin-1. The first run did transliterate them into "plain" characters, the second didn't. So you invoked different functionality. What exactly the difference was is hard to tell, because you didn't show the actual invocations in your question. (Hint, hint.) – DevSolar Aug 04 '15 at 13:54

Transliteration is locale-dependent. Maybe the process using libiconv has no locale set (or the wrong one), or the locale you are using has no transliteration configured.

Please check this bug report as it has a few examples and a discussion on this topic.

Diego