How can I create a char variable that can hold a four-byte value?

I am trying to write a program to encrypt messages in Java, for fun. I figured out how to use RSA, and managed to write a program that will encrypt a message and save it to a .txt file. For example, if "Quiet" is entered, the outcome will be "041891090280". I wrote my code so that the number would always have a length that is a multiple of six. So I thought that I could convert the numbers into a hash code. The first three digits are "041", so I could convert that into ")".

However, I am having trouble creating a char with a number greater than 255. I have looked around online and found a few examples, but I can't figure out how to implement them. I created a new method just to test them.

int a = 256;
char b = (char) a;
char c = 0xD836;
char[] cc = Character.toChars(0x1D50A);
System.out.println(b);
System.out.println(c); 
System.out.println(cc);

The program outputs

?

?

?

I am only getting two bytes. I read that Java uses Unicode, which should go up to 65535, which is four bytes. I am using Eclipse, if that makes a difference.

I apologize for the noob question. And thanks in advance.

Edit: I am sorry, I think I gave too much information and ended up being confusing. What I want to do is store a string of numbers as a string of Unicode characters. The only way I know how to do that is to break the number string into pieces small enough to fit into a character each, then add the characters one by one to a new string. But I don't know how to add a variable Unicode character to a string.

  • chars in Java can't encode every character, only those of the Basic Multilingual Plane. – Denys Séguret Jun 04 '13 at 16:28
  • See http://stackoverflow.com/questions/13112435/how-does-java-store-utf-16-characters-in-its-16-bit-char-type – Denys Séguret Jun 04 '13 at 16:30
  • Sorry, I realized I said 32bit in the title when I should have said 16bit. – user2452405 Jun 04 '13 at 16:30
  • Also, while playing with the code is fun, there are plenty of encryption methods with strong algorithms... I would suggest using one of these instead of writing your own. But that's just because I don't like to code for nothing :-) – Laurent S. Jun 04 '13 at 16:30
  • A `char` represents only 16 bits. Java represents larger Unicode code points either with `int`s or with a pair of `char`s in a larger `String`. – Louis Wasserman Jun 04 '13 at 16:37
  • Thanks for the quick reply, guys. Umm... "The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane" - dystroy. If Java supports that BMP then I should be able to use 4 bytes, right? **edit** you guys are posting faster than I can read – user2452405 Jun 04 '13 at 16:43
  • @user2452405 I'm a little confused by your question and the 32 bits thing, as well as the "only two bytes". – Denys Séguret Jun 04 '13 at 16:46
  • Sorry the 32bit thing was a mistake. The largest value I have been able to put in a char was 255, two bytes. – user2452405 Jun 04 '13 at 17:04
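
The distinction the comments draw between a 16-bit `char` and a full Unicode code point can be sketched as follows. This is only an illustration: the class name `CodePointDemo` and the helper `charsNeeded` are made up for this example, and 0x1D50A is the value used in the question.

```java
class CodePointDemo {
    // Returns how many 16-bit char units Java needs to store the given code point.
    static int charsNeeded(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int bmp = 0x0100;      // inside the Basic Multilingual Plane
        int beyond = 0x1D50A;  // outside the BMP, taken from the question

        System.out.println(charsNeeded(bmp));     // 1
        System.out.println(charsNeeded(beyond));  // 2 (a surrogate pair)

        // Build a String from the code point and read it back as an int.
        String s = new String(Character.toChars(beyond));
        System.out.println(s.length());                       // 2: counts chars
        System.out.println(s.codePointCount(0, s.length()));  // 1: counts code points
        System.out.printf("U+%X%n", s.codePointAt(0));        // U+1D50A
    }
}
```

So a value such as 0x1D50A never fits in one `char`; it lives in an `int` or as two `char`s inside a `String`.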

1 Answer

All chars are 16-bit already. The values 0 to 65535 need only 16 bits, since 2^16 = 65536.

Note: not all char values are valid characters. In particular, 0xD800 to 0xDFFF are surrogates, used in pairs to encode code points above 65535.

If you want to be able to store all possible 16-bit values, I suggest you use short instead. It holds the same 16 bits (read as a signed value from -32768 to 32767), but it may be less confusing to use.
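
A minimal sketch of the first point, assuming the goal is only to confirm what a `char` holds rather than to display it. The class name `CharValueDemo` and the helper `toHex` are hypothetical names chosen for this illustration.

```java
class CharValueDemo {
    // Formats a char as "U+" followed by four hexadecimal digits.
    static String toHex(char ch) {
        return String.format("U+%04X", (int) ch);
    }

    public static void main(String[] args) {
        int a = 256;
        char b = (char) a;        // the cast keeps all 16 bits
        char max = (char) 65535;  // largest value a char can hold

        // Printing the number instead of the glyph avoids any dependence
        // on the console encoding or font.
        System.out.println((int) b);     // 256
        System.out.println(toHex(b));    // U+0100
        System.out.println(toHex(max));  // U+FFFF
    }
}
```

If the numeric value prints correctly, the `char` was stored correctly, whatever the console shows for the glyph itself.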

Peter Lawrey
  • I have tried to save 256 as a short (a) and then create a char c = (char) a; but it returns a "?", as do all chars created with a value above 255. Could you be so kind as to show a noob an example of this working? I have tried to create a char using three different methods, but I cannot get any of them to work. – user2452405 Jun 05 '13 at 00:39
  • When you display a character, your display encoding and font come into play. Whether you can display a character or not is no indication of whether the character was stored correctly. You may have to use a different font. It is possible there is no font on your system which will display every possible char value, and many are not really characters by definition. I suggest you display the characters as four-digit hexadecimal. This will work for all characters. – Peter Lawrey Jun 05 '13 at 05:47
  • Note: some characters are invisible when displayed correctly, and character \u202e causes all the characters after it to be displayed in reverse order. I believe this is not going to work for you, as you are assuming that for every possible char value there is a simple, universal displayed representation, and this is not the case. – Peter Lawrey Jun 05 '13 at 05:48
  • @Peter Lawrey Thanks, I hadn't considered that. Sorry for dragging this out, but I did some digging and found out Unicode has a massive block of defined Chinese ideograms (4e00-9faf). I only want to store a three-digit number, so I can offset the base number by 4e00 and use 4e00 to 51e8. – user2452405 Jun 05 '13 at 16:27
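
The offset idea from the last comment could look roughly like the sketch below. It assumes the input consists only of decimal digits with a length that is a multiple of three, and the names `DigitPacker`, `encode` and `decode` are invented for this illustration. Each three-digit group (000 to 999) becomes one `char` between 0x4E00 and 0x51E7, appended to a `StringBuilder`.

```java
class DigitPacker {
    static final int OFFSET = 0x4E00;  // start of the ideograph block named in the comment
    static final int GROUP = 3;        // digits stored per char

    // Packs each three-digit group into one char at OFFSET + value.
    static String encode(String digits) {
        if (digits.length() % GROUP != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + GROUP);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < digits.length(); i += GROUP) {
            int value = Integer.parseInt(digits.substring(i, i + GROUP)); // 0..999
            out.append((char) (OFFSET + value));
        }
        return out.toString();
    }

    // Reverses encode: subtracts OFFSET and restores the leading zeros.
    static String decode(String packed) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < packed.length(); i++) {
            int value = packed.charAt(i) - OFFSET;
            if (value < 0 || value > 999) {
                throw new IllegalArgumentException("char outside the packed range");
            }
            out.append(String.format("%03d", value));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String digits = "041891090280";       // sample output from the question
        String packed = encode(digits);
        System.out.println(packed.length());  // 4
        System.out.println(decode(packed));   // 041891090280
    }
}
```

Whether the packed string shows up as ideographs or as "?" still depends on the console encoding and font, so decoding it back is a more reliable check than reading the output.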