
I have the following code, which produces confusing output:

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Main {

    String testString = "Moage test String";

    public static void main(String[] args) {
        new Main();
    }

    public Main() {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Teststring: " + testString);
        System.out.println();
        System.out.println("get the byte stream of the test String...");
        System.out.println();

        // Encode the string once per charset and print the raw byte values.
        System.out.println("Bytestream with default encoding: ");
        for (byte b : testString.getBytes()) {
            System.out.print(b);
        }
        System.out.println();
        System.out.println();

        try {
            System.out.println("Bytestream with encoding UTF-8: ");
            for (byte b : testString.getBytes("UTF-8")) {
                System.out.print(b);
            }
            System.out.println();
            System.out.println();

            System.out.println("Bytestream with encoding windows-1252 (default): ");
            for (byte b : testString.getBytes("windows-1252")) {
                System.out.print(b);
            }
            System.out.println();
            System.out.println();

            System.out.println("Bytestream with encoding UTF-16: ");
            for (byte b : testString.getBytes("UTF-16")) {
                System.out.print(b);
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

So I wanted to see the difference between UTF-8 and windows-1252 encoding. But when I look at the output, there seems to be no difference. Only when I compare windows-1252 with UTF-16 is there a difference.

Output:

> Default charset: windows-1252
> Teststring: Moage test String
> 
> get the byte stream of the test String...
> 
> Bytestream with default encoding: 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding UTF-8: 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding windows-1252 (default): 
> 7711197103101321161011151163283116114105110103
> 
> Bytestream with encoding UTF-16: 
> -2-1077011109701030101032011601010115011603208301160114010501100103

Can anyone explain to me why UTF-8 and windows-1252 look the same?

Cheers Alex

    Put some special characters into your test string. Your current test data does not cover the differences between the charsets. – f1sh Apr 28 '16 at 08:33
  • uaaa thank you!! I never thought that this would change something. :) Now there are some differences. – Mansouritta Apr 28 '16 at 08:42

2 Answers


This is because your test String, "Moage test String", contains only ASCII characters, and both UTF-8 and windows-1252 encode every ASCII character as the same single byte. Try special characters such as "éèà" and you will see different results.
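For example, a minimal sketch (the class name and the test string "éèà" are just illustrative):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingDiff {
    public static void main(String[] args) {
        String s = "éèà";
        // windows-1252 stores each accented letter in a single byte...
        byte[] cp1252 = s.getBytes(Charset.forName("windows-1252"));
        // ...while UTF-8 needs two bytes per accented letter.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println("windows-1252: " + Arrays.toString(cp1252));
        System.out.println("UTF-8:        " + Arrays.toString(utf8));
    }
}

On a JVM where windows-1252 is available, this should print [-23, -24, -32] for windows-1252 and [-61, -87, -61, -88, -61, -96] for UTF-8: three bytes versus six.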

Nicolas Filotto

You used only characters that fall within the ASCII range. If your string contains special characters (or text in a language that requires them), the byte output will change.

UTF-8 is a widely accepted standard that works everywhere. A Windows-specific encoding such as windows-1252, on the other hand, is not guaranteed to be available on every machine.
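As a minimal sketch of the portable approach (the class name is illustrative), using the StandardCharsets constants that every JVM ships with since Java 7:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PortableEncoding {
    public static void main(String[] args) {
        String s = "Moage test String";
        // getBytes(Charset) does not throw UnsupportedEncodingException,
        // because StandardCharsets.UTF_8 is guaranteed to exist.
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));
    }
}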

Vikrant Kashyap