I've got a list of random hex values of 3 digits each:

 List<hex> hexes = "A19", "8EB", "5EF"

I'd like to compress them into a list of single characters that can be copied and pasted, then decompressed later on. For aesthetic reasons, it would be nice if they were all CJK characters.

 HexToCharacters(hexes) --> "寏雳䠰"
 CharactersToHex("寏雳䠰") --> "A19", "8EB", "5EF"

Which particular CJK characters this generates isn't important, as long as they can safely make the round trip from hex to CJK back to hex.

So far, I haven't found a way to generate characters that are guaranteed to fall in the CJK range.

(I'm using C# in my own project, but the language isn't important -- I'm just looking for a method that works.)

Joe
  • I think that you have 64k (0xFFFF) of hexadecimal values, but there aren't that many CJK characters (around 0xBFFF). And encoding most of those CJK characters takes ~4 bytes depending on the encoding, so this isn't a compression at all, it's the same amount of bytes as the raw data. – Mooing Duck Mar 12 '15 at 19:14
  • @MooingDuck, good point -- I'd be fine with using 3 hex digits instead of 4. – Joe Mar 12 '15 at 19:16
  • Oh, in that case simply map them to U+5000 through U+5FFF, which take two bytes each in UTF16. Pretty much all CJK are 3 bytes in UTF8 though, so it's still not fair to call it a "compression". http://en.wikibooks.org/wiki/Unicode/Character_reference/5000-5FFF – Mooing Duck Mar 12 '15 at 19:17
  • @MooingDuck, true, this isn't actually compressing the number of bytes needed, but it is reducing the number of characters in a string representation. – Joe Mar 12 '15 at 19:19
  • If you're desiring to reduce the number of characters, also eyeball http://en.wikipedia.org/wiki/Diacritic + Korean characters maybe. – Mooing Duck Mar 12 '15 at 19:20
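
The size trade-off discussed in these comments can be checked directly. The snippet below is only a sketch, using Python because the question says the language isn't important; the variable names are illustrative and the U+5000 offset is the one suggested in the comment above.

```python
# Compare the storage cost of three hex digits with one CJK character.
hex_text = "A19"

# Map the 12-bit value into U+5000..U+5FFF, as suggested in the comment.
cjk_char = chr(0x5000 + int(hex_text, 16))  # code point U+5A19

utf8_hex_bytes = len(hex_text.encode("utf-8"))       # hex digits are ASCII
utf8_cjk_bytes = len(cjk_char.encode("utf-8"))       # BMP CJK in UTF-8
utf16_cjk_bytes = len(cjk_char.encode("utf-16-le"))  # BMP CJK in UTF-16

print(utf8_hex_bytes, utf8_cjk_bytes, utf16_cjk_bytes)  # prints "3 3 2"
```

So the character count drops from three to one, while the UTF-8 byte count stays the same.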

1 Answer

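You're in luck: CJK Unified Ideographs Extension A starts at U+3400 and contains well over 4096 contiguous ideographs. Simply add 0x3400 to the value and take that Unicode character; to decode, subtract 0x3400 from the character's code point.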
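
A minimal sketch of that offset mapping, written in Python since the question says the language isn't important. The function names `hex_to_characters` and `characters_to_hex` are hypothetical and simply mirror the names in the question; the range checks are an added assumption so that bad input fails loudly instead of producing non-CJK characters.

```python
BASE = 0x3400       # first code point of the contiguous ideograph block
MAX_VALUE = 0xFFF   # largest value that fits in three hex digits


def hex_to_characters(hexes):
    """Encode each 3-digit hex string as one character at BASE + value."""
    chars = []
    for h in hexes:
        value = int(h, 16)
        if not 0 <= value <= MAX_VALUE:
            raise ValueError(f"{h!r} is not in the range 000..FFF")
        chars.append(chr(BASE + value))
    return "".join(chars)


def characters_to_hex(text):
    """Decode each character back into an upper-case 3-digit hex string."""
    result = []
    for ch in text:
        value = ord(ch) - BASE
        if not 0 <= value <= MAX_VALUE:
            raise ValueError(f"{ch!r} is outside the encoded range")
        result.append(format(value, "03X"))
    return result


encoded = hex_to_characters(["A19", "8EB", "5EF"])
decoded = characters_to_hex(encoded)
print(decoded)  # prints ['A19', '8EB', '5EF']
```

With this offset, "A19" lands on U+3E19 and the largest input, "FFF", lands on U+43FF. The same two operations (add on encode, subtract on decode) should carry over directly to C#.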

Ignacio Vazquez-Abrams