
I use the code below to convert a CharSequence to a byte array, which I then save as a BLOB in my SQLite database:

    public static byte[] toByteArray(CharSequence charSequence) {
        if (charSequence == null) {
            return null;
        }
        byte[] barr = new byte[charSequence.length()];
        for (int i = 0; i < barr.length; i++) {
            barr[i] = (byte) charSequence.charAt(i);
        }

        return barr;
    }

Now I would like to convert the byte array retrieved from SQLite back to a CharSequence, but I couldn't find any help on this.

How do I convert a byte array to a CharSequence?

Any help is much appreciated.

Andro Selva
  • Is this ASCII only? Because if not, that conversion will lose data. – Thilo Aug 21 '12 at 09:38
  • `CharSequence` is an interface, so you need an actual implementation to put your byte array into... – brimborium Aug 21 '12 at 09:40
  • @Thilo No, my friend. It is in TSCII format. I am working on an Indic-language app. Loss of data might affect my HTML sequence, I believe. – Andro Selva Aug 21 '12 at 09:40
  • If you got a CharSequence in Android, it has already been transformed to Unicode (or is already broken). Why not use UTF-8 for everything in your system, and then (maybe, if really required) convert it to TSCII for import/export to whatever else you are running there? – Thilo Aug 21 '12 at 09:56

3 Answers


To convert a CharSequence into a byte array

CharSequence seq;
Charset charset;
...
byte[] bytes = seq.toString().getBytes(charset);

To convert back again

CharSequence seq2 = new String(bytes, charset);

Just remember that CharSequence is an interface implemented by String, StringBuilder, StringBuffer, etc. All String instances are CharSequence instances, but not all CharSequence instances are Strings. The contract for CharSequence, however, is that its toString() method returns the equivalent String.
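As a rough sketch of the two conversions above wrapped into helpers: the class and method names here (`CharSequenceRoundTrip`, `toBytes`, `fromBytes`) are made up for illustration, and `StandardCharsets` assumes Java 7 or later (on older platforms, `Charset.forName("UTF-8")` should do the same job).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharSequenceRoundTrip {
    // Encode any CharSequence (String, StringBuilder, ...) using an explicit charset.
    static byte[] toBytes(CharSequence seq, Charset charset) {
        return seq == null ? null : seq.toString().getBytes(charset);
    }

    // Decode with the same charset; a String is itself a CharSequence.
    static CharSequence fromBytes(byte[] bytes, Charset charset) {
        return bytes == null ? null : new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Tamil text written with Unicode escapes so the source file encoding does not matter.
        CharSequence original = new StringBuilder("<p>\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD</p>");
        byte[] blob = toBytes(original, StandardCharsets.UTF_8);
        CharSequence restored = fromBytes(blob, StandardCharsets.UTF_8);
        System.out.println(original.toString().contentEquals(restored)); // prints "true"
    }
}
```

The important part is that the same charset is used in both directions; the byte array in between can be stored as a BLOB unchanged.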

Internally, all strings in Java are represented as Unicode, so as long as the consumer and producer are both Java, the safest charset to use is UTF-8 or UTF-16, depending on the likely encoded size of your data. Where Latin scripts predominate,

Charset charset = Charset.forName("UTF-8"); 

will 99.9% of the time give the most space-efficient encoding. For non-Latin scripts (e.g. Chinese) you may find UTF-16 more space-efficient, depending on the data you are encoding, but you would need measurements showing that it really is. As UTF-8 is more widely expected, I recommend UTF-8 as the default encoding in any case.
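One way to take such a measurement is simply to encode a representative sample both ways and compare lengths. The sketch below is only illustrative: the class name `EncodingSize` and the sample strings are assumptions, and `UTF_16BE` is used so that the 2-byte byte-order mark that plain `UTF-16` prepends is not counted.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSize {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Size(CharSequence text) {
        return text.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes as UTF-16 (big-endian, without a byte-order mark).
    static int utf16Size(CharSequence text) {
        return text.toString().getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String latin = "hello";
        String chinese = "\u4E2D\u6587";                // two CJK characters
        String tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"; // five Tamil code points

        System.out.println(utf8Size(latin) + " vs " + utf16Size(latin));     // 5 vs 10
        System.out.println(utf8Size(chinese) + " vs " + utf16Size(chinese)); // 6 vs 4
        System.out.println(utf8Size(tamil) + " vs " + utf16Size(tamil));     // 15 vs 10
    }
}
```

Real data usually mixes scripts with ASCII markup, so the ratio for an actual data set may differ from these small samples.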

Stephen Connolly

It looks like you are using ASCII data (if not, your code is quite lossy).

To get a CharSequence from ASCII bytes, you can do

CharSequence x = new String(theBytes, "US-ASCII");

For other encodings, just specify the name of the character set.
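To illustrate why a cast-based encoder like the one in the question is lossy, here is a small sketch. The class and method names (`LossyCastDemo`, `castEncode`, `latin1Decode`) are hypothetical; `castEncode` mirrors the question's loop, and ISO-8859-1 is used for decoding because it maps each byte value to the code point of the same value.

```java
import java.nio.charset.StandardCharsets;

public class LossyCastDemo {
    // Same technique as the question's toByteArray: keep only the low 8 bits of each char.
    static byte[] castEncode(CharSequence seq) {
        byte[] out = new byte[seq.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) seq.charAt(i);
        }
        return out;
    }

    // ISO-8859-1 maps each byte 0x00..0xFF to the code point with the same value.
    static String latin1Decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String tamil = "\u0BA4"; // TAMIL LETTER TA, outside the 8-bit range

        System.out.println(ascii.equals(latin1Decode(castEncode(ascii)))); // true
        System.out.println(tamil.equals(latin1Decode(castEncode(tamil)))); // false: U+0BA4 became U+00A4
    }
}
```

Any character above U+00FF loses its high byte in the cast, so no decoder can recover it afterwards.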

Thilo
  • What if I want to use the TSCII type? Is that possible? – Andro Selva Aug 21 '12 at 09:41
  • +1 I would use `ISO-8859-1`, which has 8-bit characters, whereas `US-ASCII` is technically 7-bit. http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html – Peter Lawrey Aug 21 '12 at 09:42
  • I am not sure if Java supports TSCII. Can't you use UTF-8? Where do you get the data from (the original CharSequence that you wrote into the DB)? – Thilo Aug 21 '12 at 09:43
  • @AndroSelva I would try TSCII, but it might not be supported on all platforms. – Peter Lawrey Aug 21 '12 at 09:43
  • @PeterLawrey: But that won't match his "encoder" (which is just taking the first byte of every Unicode codepoint). Nothing will survive that except for 7-bit ASCII. – Thilo Aug 21 '12 at 09:44
  • @AndroSelva [Here you go](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html). There is a constructor `String(byte[], Charset charset)` to get full control over the charset. – brimborium Aug 21 '12 at 09:44
  • @PeterLawrey I am doing this for Android, and Android doesn't support Indic languages. So I need to make use of TSCII data. – Andro Selva Aug 21 '12 at 09:44
  • @AndroSelva You also have to make sure that your encoder (the code you posted above) converts the `CharSequence` to a `byte[]` using the encoding that you want to use. Now you're only casting `char`s to `byte`s. – Jesper Aug 21 '12 at 09:47
  • Are you sure Android supports `TSCII`? All Java based systems should support UTF-8 so I would try this first. – Peter Lawrey Aug 21 '12 at 09:48
  • @PeterLawrey: Re: "Nothing will survive that except for 7-bit ASCII". Actually, it seems that it will work for ISO/IEC 8859-1 which makes up the second block of Unicode code points: http://en.wikipedia.org/wiki/C1_Controls_and_Latin-1_Supplement – Thilo Aug 21 '12 at 09:54
  • @Thilo It may happen to work on some systems, or it may work as documented on some Android systems. I would go with the one which is documented to always work. ;) – Peter Lawrey Aug 21 '12 at 09:57
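Whether a named charset is available on a given platform can be checked at run time with `Charset.isSupported`. The sketch below falls back to UTF-8 when the requested charset is missing; the class and helper names (`CharsetFallback`, `charsetOrUtf8`) are made up for illustration, and no claim is made here about whether any particular runtime ships a TSCII charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Return the named charset if this runtime supports it, otherwise UTF-8.
    // Note: isSupported throws IllegalCharsetNameException for syntactically illegal names.
    static Charset charsetOrUtf8(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        Charset charset = charsetOrUtf8("TSCII");
        System.out.println("Using charset: " + charset.name());

        byte[] bytes = "abc".getBytes(charset);
        CharSequence restored = new String(bytes, charset);
        System.out.println(restored);
    }
}
```

If the fallback is taken, the stored bytes are UTF-8 rather than TSCII, so the charset actually used should be recorded alongside the data.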
CharSequence c = new String(bytes); // bytes is the byte[] read from the database
Anirudh Ramanathan
Dan
  • Note that this will use the default character encoding of your system to interpret the bytes as characters. That may or may not be what you want. – Jesper Aug 21 '12 at 09:40
  • Fair point - you will need to specify the charset in the constructor if you are not using the default. Upvote for you, sir! http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html – Dan Aug 21 '12 at 09:43