Given text in Shift_JIS encoding, how can I decode it into Elixir's native UTF-8 encoding, and vice versa?
1 Answer
The Codepagex library supports this. You just need to figure out what it calls SHIFT_JIS.
Codepagex uses the mappings available from unicode.org. There is one for Shift_JIS, but it's marked as OBSOLETE, so it is not available in Codepagex. However, Microsoft's CP932 is available. It is Microsoft's variant of Shift_JIS and is effectively the same encoding, so you can use that.
Config
It's not enabled by default, so you need to enable it in config (and recompile with `mix deps.compile codepagex --force` if necessary):
config :codepagex, :encodings, [
"VENDORS/MICSFT/WINDOWS/CP932"
]
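To check that the encoding was actually compiled in after changing the config, one option is to look it up in the list of configured encodings. This is a minimal sketch, assuming `Codepagex.encoding_list/0` returns the currently enabled encoding names as strings in the same form used in the config:

```elixir
# Run in `iex -S mix` after recompiling the dependency.
# Assumption: encoding names are returned as plain strings,
# in the same "VENDORS/..." form that was used in the config.
cp932 = "VENDORS/MICSFT/WINDOWS/CP932"

if cp932 in Codepagex.encoding_list() do
  IO.puts("CP932 is enabled")
else
  IO.puts("CP932 is missing; check the config and recompile codepagex")
end
```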
Encode/Decode
iex(1)> shift_jis = "VENDORS/MICSFT/WINDOWS/CP932"
"VENDORS/MICSFT/WINDOWS/CP932"
iex(2)> test = Codepagex.from_string!("テスト", shift_jis)
<<131, 101, 131, 88, 131, 103>>
iex(3)> Codepagex.to_string!(test, shift_jis)
"テスト"
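The bang functions above raise on input that cannot be converted. If the bytes come from an untrusted source, the non-raising `Codepagex.to_string/2` and `Codepagex.from_string/2` may be a better fit, since they return tagged tuples instead. A small sketch, where `ShiftJisExample` is a hypothetical wrapper module and not part of Codepagex:

```elixir
defmodule ShiftJisExample do
  # Hypothetical helper module for illustration; not part of Codepagex.
  # Assumes CP932 has been enabled in the :codepagex config.
  @cp932 "VENDORS/MICSFT/WINDOWS/CP932"

  # Shift_JIS (CP932) bytes -> UTF-8 string.
  # Returns {:ok, string} or {:error, reason}.
  def decode(binary) when is_binary(binary) do
    Codepagex.to_string(binary, @cp932)
  end

  # UTF-8 string -> Shift_JIS (CP932) bytes.
  # Returns {:ok, binary} or {:error, reason}.
  def encode(string) when is_binary(string) do
    Codepagex.from_string(string, @cp932)
  end
end

# Usage: handle both outcomes explicitly instead of rescuing an exception.
case ShiftJisExample.decode(<<131, 101, 131, 88, 131, 103>>) do
  {:ok, text} -> IO.puts(text)
  {:error, reason} -> IO.inspect(reason, label: "could not decode")
end
```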
Example repo
I made an example repo where you can see it in action.

Adam Millerchip
-
Is this only for Windows? Or does running on Mac or Linux require a different value in the `shift_jis` variable? – DogEatDog Jan 03 '22 at 03:38
-
@DogEatDog The `Codepagex` library is pure Elixir and platform-independent, AFAIK. `VENDORS/MICSFT/WINDOWS/CP932` is not a module, but the path to the mapping file it downloads from unicode.org during compilation. – Adam Millerchip Jan 03 '22 at 04:17
-
Small correction: the mapping files are not downloaded during compilation; they are included in the distribution: https://github.com/tallakt/codepagex/tree/master/unicode – Adam Millerchip Jan 03 '22 at 14:39
-
This is good to know. Thank you. – DogEatDog Jan 03 '22 at 16:26