
Given text in Shift-JIS encoding, how can I decode it into Elixir's native UTF-8 encoding, and vice versa?

Adam Millerchip

1 Answer


The Codepagex library supports this. You just need to figure out what it calls SHIFT_JIS.

Codepagex uses the mappings available from unicode.org. There is one for Shift-JIS, but it's marked as OBSOLETE, so it is not available in Codepagex. However, Microsoft's CP932 mapping is available, and it is effectively SHIFT_JIS, so you can use that.

Config

It's not enabled by default, so you need to enable it in config (and recompile with `mix deps.compile codepagex --force` if necessary):

config :codepagex, :encodings, [
  "VENDORS/MICSFT/WINDOWS/CP932"
]
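
To confirm that the encoding was actually compiled in, one option is to check `Codepagex.encoding_list/0`, which, as far as I understand the library docs, lists the encodings configured at compile time. A minimal sketch, assuming `:codepagex` is already a dependency of your project and the config above has been picked up:

```elixir
# Assumes :codepagex is a project dependency and the config above was
# in place when the dependency was compiled.
cp932 = "VENDORS/MICSFT/WINDOWS/CP932"

if cp932 in Codepagex.encoding_list() do
  IO.puts("CP932 is enabled")
else
  # The encoding was not compiled in; re-check config, then force a recompile.
  IO.puts("CP932 is missing; run: mix deps.compile codepagex --force")
end
```

If the encoding is missing here, the config was most likely added after the dependency had already been compiled.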

Encode/Decode

iex(1)> shift_jis = "VENDORS/MICSFT/WINDOWS/CP932"
"VENDORS/MICSFT/WINDOWS/CP932"
iex(2)> test = Codepagex.from_string!("テスト", shift_jis)
<<131, 101, 131, 88, 131, 103>>
iex(3)> Codepagex.to_string!(test, shift_jis)
"テスト"
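
The bang functions above raise on invalid input. If you would rather handle failures yourself, Codepagex also has non-raising `to_string/2` and `from_string/2`, which I believe return `{:ok, result}` on success and an error tuple otherwise. A rough sketch, where `ShiftJisExample` is a hypothetical wrapper module of my own, not part of the library:

```elixir
defmodule ShiftJisExample do
  # Hypothetical helper module, not part of Codepagex.
  # Assumes CP932 has been enabled in config as shown above.
  @cp932 "VENDORS/MICSFT/WINDOWS/CP932"

  # Decode CP932 (Shift-JIS) bytes into a UTF-8 string.
  def decode(bytes) when is_binary(bytes) do
    Codepagex.to_string(bytes, @cp932)
  end

  # Encode a UTF-8 string as CP932 (Shift-JIS) bytes.
  def encode(string) when is_binary(string) do
    Codepagex.from_string(string, @cp932)
  end
end

case ShiftJisExample.decode(<<131, 101, 131, 88, 131, 103>>) do
  {:ok, text} -> IO.puts(text)
  # Catch-all, since the exact shape of the error tuple is an assumption here.
  error -> IO.puts("could not decode: #{inspect(error)}")
end
```

The catch-all clause is deliberate: it avoids depending on the exact shape of the error value, which is worth checking against the docs for the version you use.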

Example repo

I made an example repo where you can see it in action.

Adam Millerchip
  • Is this only for Windows? or does running on Mac or Linux require a different module in the `shift_jis` variable? – DogEatDog Jan 03 '22 at 03:38
  • @DogEatDog The `Codepagex` library is pure Elixir and platform-independent, AFAIK. `VENDORS/MICSFT/WINDOWS/CP932` is not a module, but the path to the mapping file it downloads from unicode.org during compilation. – Adam Millerchip Jan 03 '22 at 04:17
  • Small correction: the mapping files are not downloaded during compilation, they are included in the distribution: https://github.com/tallakt/codepagex/tree/master/unicode – Adam Millerchip Jan 03 '22 at 14:39
  • this is good to know. thank you – DogEatDog Jan 03 '22 at 16:26