
As far as I know, 'a' is an 8-bit character and 'â' is a 16-bit character:

  1. How can I tell whether a character is 8 bits, 16 bits, or larger?

  2. Why can't the character 'â' be represented in 8 bits?

  3. 'a' and 'â' are just how the characters are displayed; what do they look like as bits?

  4. 97 is the code of 'a'. How is this number calculated, or is it just the ordinal number of the character?

giliev
Hoang Nguyen
    You have a lot of assumptions wrong, the first of which is that 'a' is "8 bits". It is not. A `char` in Java is 16 bits, always -- and it's a UTF-16 code unit, to be precise. – fge Sep 16 '15 at 16:49



As far as I know, 'a' is an 8-bit character and 'â' is a 16-bit character.

Not really. Java's char is an unsigned 16-bit type, so both 'a' and 'â' are 16-bit characters. It is true that the top 8 bits of 'a' are zero, but those bits are there nevertheless. The same goes for 'â' (see below).
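To see that every char really carries 16 bits, here is a minimal sketch, assuming Java 8 or later. `Character.SIZE` and `Character.MAX_VALUE` are standard constants; the class name `CharSize` and the helper `highByte` are hypothetical names used only for this illustration. 'â' is written as the escape `'\u00E2'` so the example does not depend on the source file's encoding.

```java
public class CharSize {
    // Hypothetical helper: returns the upper 8 bits of a char (a value 0..255).
    // The char is widened to int and shifted right by 8 without sign extension.
    static int highByte(char ch) {
        return ch >>> 8;
    }

    public static void main(String[] args) {
        // Character.SIZE is the number of bits in a char: always 16.
        System.out.println("bits per char: " + Character.SIZE);
        // chars are unsigned, so the largest value is 0xFFFF.
        System.out.println("max char value: " + (int) Character.MAX_VALUE);
        // Both characters occupy 16 bits; their upper byte just happens to be zero.
        System.out.println("high byte of 'a': " + highByte('a'));
        System.out.println("high byte of U+00E2: " + highByte('\u00E2'));
    }
}
```

Run as-is, it should print 16, 65535, 0 and 0: the upper byte of both characters exists but is all zeros.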

How can I tell whether a character is 8 bits, 16 bits, or larger?

Compute ch & 0xFF00 and compare the result to zero. If it is zero, the upper 8 bits are all zero, so the value fits in 8 bits; otherwise, at least one of those eight bits is set.
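The mask test can be sketched as follows, assuming Java. `EightBitCheck`, `fitsIn8Bits` and `fitsIn8BitsByValue` are hypothetical names; the second method shows that comparing the value against 256 gives the same answer for any 16-bit char.

```java
public class EightBitCheck {
    // True when the upper 8 bits of the 16-bit char are all zero,
    // i.e. the value lies in the range 0..255.
    static boolean fitsIn8Bits(char ch) {
        return (ch & 0xFF00) == 0;
    }

    // Equivalent check: a char fits in 8 bits exactly when its value is below 256.
    static boolean fitsIn8BitsByValue(char ch) {
        return ch < 256;
    }

    public static void main(String[] args) {
        // 'a', U+00E2 (a with circumflex), U+0101 (a with macron), U+4E2D (a CJK character)
        char[] samples = {'a', '\u00E2', '\u0101', '\u4E2D'};
        for (char ch : samples) {
            System.out.printf("U+%04X fits in 8 bits: %b%n", (int) ch, fitsIn8Bits(ch));
        }
    }
}
```

The first two samples fit in 8 bits; the last two do not. A single char can never hold more than 16 bits, so there is no "larger" case for one char.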

Why can't the character 'â' be represented in 8 bits?

It can be represented in 8 bits: the code of 'â' is 0xE2, or 226. That fits in 8 bits, but not in 7 bits, which only cover the plain ASCII range 0 to 127. Here is a convenient table for looking up character codes.
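A short sketch, assuming Java, that checks these numbers and compares two byte encodings of 'â'. In ISO-8859-1 (Latin-1) it is a single byte, while UTF-8 spends two bytes on it (0xC3 0xA2); that two-byte form is one plausible source of the "16 bits" impression, though the question does not say where it came from. `CircumflexA`, `code`, `fitsIn7Bits` and `encode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CircumflexA {
    // The numeric value of a char (char widens to int implicitly).
    static int code(char ch) {
        return ch;
    }

    // 7-bit ASCII covers only the values 0..127.
    static boolean fitsIn7Bits(char ch) {
        return ch <= 0x7F;
    }

    // Encodes a single char with the given charset.
    static byte[] encode(char ch, Charset charset) {
        return String.valueOf(ch).getBytes(charset);
    }

    public static void main(String[] args) {
        char ch = '\u00E2'; // LATIN SMALL LETTER A WITH CIRCUMFLEX, written as an escape
        System.out.println(code(ch));                                      // 226
        System.out.println(Integer.toHexString(code(ch)));                 // e2
        System.out.println(fitsIn7Bits(ch));                               // false
        System.out.println(encode(ch, StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(encode(ch, StandardCharsets.UTF_8).length);      // 2
    }
}
```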

'a' and 'â' are just how the characters are displayed; what do they look like as bits?

Since char is an integral type, you can convert it to int and print the value in binary, decimal, hex, or any other radix to see the bit pattern behind each character.
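One way to do that, sketched in Java: `Integer.toBinaryString`, `Integer.toHexString` and `Integer.toString(int, int)` are standard methods, while `CharBits` and `toBits` are hypothetical names. The binary string is left-padded to 16 digits so all bits of the char are visible, including the leading zeros.

```java
public class CharBits {
    // Formats a char's value as a zero-padded 16-digit binary string.
    static String toBits(char ch) {
        return String.format("%16s", Integer.toBinaryString(ch)).replace(' ', '0');
    }

    public static void main(String[] args) {
        char[] samples = {'a', '\u00E2'}; // 'a' and 'a' with circumflex
        for (char ch : samples) {
            int value = ch; // widening conversion, no cast needed
            System.out.println("decimal " + value
                    + ", hex " + Integer.toHexString(value)
                    + ", octal " + Integer.toString(value, 8)
                    + ", binary " + toBits(ch));
        }
    }
}
```

This should print `decimal 97, hex 61, octal 141, binary 0000000001100001` for 'a' and `decimal 226, hex e2, octal 342, binary 0000000011100010` for 'â'.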

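97 is the code of 'a'. How is this number calculated, or is it just the ordinal number of the character?

It is not calculated: 97 (0x61) is the code that ASCII, and Unicode after it, assigns to 'a', so it is effectively the character's position in that table. To get it in Java, cast 'a' to an int: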

int a = (int)'a';
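Because a char is just a 16-bit number, 97 and 'a' are the same value seen two ways. A small Java sketch of that, with hypothetical names `CharCode`, `codeOf` and `fromCode`; note that the cast in `(int)'a'` is optional, since a char widens to int implicitly.

```java
public class CharCode {
    // Widening char -> int needs no cast.
    static int codeOf(char ch) {
        return ch;
    }

    // Narrowing int -> char needs an explicit cast.
    static char fromCode(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('a'));        // 97
        System.out.println(fromCode(97));       // a
        System.out.println('a' == 97);          // true: both sides are compared as ints
        System.out.println((char) ('a' + 1));   // b: the codes of a..z are consecutive
        System.out.println(0x61 == 'a');        // true: 97 written in hex
    }
}
```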
Sergey Kalinichenko
  • I already know how to cast `'a'` to an `int`, but what does 97 mean? How does it differ from 97 in math? – Hoang Nguyen Sep 16 '15 at 17:01
  • @HoangNguyen 97 is simply a number that gets *interpreted* as a character `'a'` when it is stored in a variable of type `char`. Apart from this interpretation, there is no difference between ninety-seven-the-int and ninety-seven-the-char. – Sergey Kalinichenko Sep 16 '15 at 17:04
  • Sorry for the inconvenience: do `'a'` in `UTF-8` and `'a'` in `UTF-16` have the same code, 97? – Hoang Nguyen Sep 16 '15 at 17:25
  • @HoangNguyen Yes, all printable characters that can be encoded as a single byte in UTF-8 have the same lower byte and zero upper byte in UTF-16. – Sergey Kalinichenko Sep 16 '15 at 17:30
  • @dasblinkenlight, about my question on telling 8-bit from 16-bit characters: could I cast the `char` to `int` to get the code and compare it with 256, the number of values that fit in 8 bits? If the code is less than 256, it is an 8-bit char; otherwise it is not. And I would do the same for 16 bits, 32 bits, etc. – Hoang Nguyen Sep 17 '15 at 02:21
  • @HoangNguyen There is no 32-bit case, since `char` is only 16 bits. Comparing to 256 is as good as masking with `0xff00`; use whichever you prefer. – Sergey Kalinichenko Sep 17 '15 at 02:26
  • @dasblinkenlight, "There would be no 32bits", so what about UTF-32? I understand the 32 to mean 32 bits; am I wrong? – Hoang Nguyen Sep 17 '15 at 03:37
  • @HoangNguyen Java's `char` has a fixed size of 16 bits, so as far as Java's `char` goes, there is no UTF-8 or UTF-32. – Sergey Kalinichenko Sep 17 '15 at 09:35
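The distinction drawn in these comments can be sketched in Java: UTF-8, UTF-16 and UTF-32 are encodings, i.e. ways of turning text into bytes, while a `char` is always one 16-bit UTF-16 code unit. For 'a' the code is 0x61 in both UTF-8 and UTF-16, and a character above U+FFFF (an emoji is used here) does not fit in one char at all, so Java stores it as two chars, a surrogate pair. `EncodingsVsChar` and `hex` are hypothetical names; the other calls are standard `String`, `Character` and `StandardCharsets` APIs.

```java
import java.nio.charset.StandardCharsets;

public class EncodingsVsChar {
    // Renders bytes as space-separated two-digit hex, e.g. "00 61".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "a";
        // Same code (0x61) in both encodings; UTF-16 just adds a zero upper byte.
        System.out.println("UTF-8:    " + hex(a.getBytes(StandardCharsets.UTF_8)));    // 61
        System.out.println("UTF-16BE: " + hex(a.getBytes(StandardCharsets.UTF_16BE))); // 00 61

        // U+1F600 is above 0xFFFF, so it needs two 16-bit chars (a surrogate pair).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println("chars:       " + emoji.length());                            // 2
        System.out.println("code points: " + emoji.codePointCount(0, emoji.length()));   // 1
        System.out.println("code point:  " + Integer.toHexString(emoji.codePointAt(0))); // 1f600
    }
}
```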