
I have a text file containing data that I need to preload into a SQLite database. I saved it in res/raw.

I read the whole file using readTxtFromRaw(), then I use the StringTokenizer class to process the file line by line.

However, the String returned by readTxtFromRaw() does not show the accented characters that are in the file. I need these because some of the text is Spanish or French. Am I missing something?

Code:

String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");

The readTxtFromRaw method is:

private String readTxtFromRaw(Integer rawResource) throws IOException
{
    InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    int i = inputStream.read();
    while (i != -1)
    {
        byteArrayOutputStream.write(i);
        i = inputStream.read();
    }
    inputStream.close();

    return byteArrayOutputStream.toString();
}

The file was created using Eclipse, and all characters appear fine in Eclipse.

Could this have something to do with Eclipse itself? I set a breakpoint and inspected myToken in the Watch window. I tried to manually replace the weird character with the correct one (for example í or é), and it would not let me.

Sandy

1 Answer


Have you checked the encodings involved?

  • What is the encoding of your source file?
  • What is the encoding of your output stream?

byteArrayOutputStream.toString() decodes the bytes using the platform's default character encoding. If that differs from the encoding the file was saved in, I guess the foreign characters get stripped or converted in a way that they are not displayed correctly in your output.

Have you already tried byteArrayOutputStream.toString(String enc)? Try "UTF-8", "ISO-8859-1", or "UTF-16" for the encoding.
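A minimal sketch of the toString(String enc) suggestion, assuming the file is saved as UTF-8. The class and method names (RawTextDecoder, readUtf8) are hypothetical and not from the question. The method takes a plain InputStream, so in the question's code it would presumably be called with mCtx.getResources().openRawResource(R.raw.wordstext).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the original code.
final class RawTextDecoder {

    // Reads every byte from the stream, then decodes the bytes as UTF-8
    // instead of relying on the platform's default character encoding.
    // Assumption: the source file was saved as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            // Copy in chunks rather than one byte per read() call.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // toString(String charsetName) decodes with the named charset.
            return buffer.toString("UTF-8");
        } finally {
            in.close();
        }
    }
}
```

If the file turns out to be saved in a different encoding, the charset name passed to toString has to match it, otherwise the accented characters will still come out wrong.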

rdmueller
  • Actually I right clicked the file in the Eclipse Package Explorer, selected Properties, and there was a Text File encoding option. Selected UTF-8 and everything is working ok. Thanks for this. – Sandy Jun 04 '11 at 20:25
  • Is it the same to use a ByteArrayOutputStream, as it is to use something like this: BufferedReader reader = new BufferedReader(new InputStreamReader(mCtx.getResources().openRawResource(rawResource))); ?? Both seem to work now, but I don't know if one is better for this than another. – Sandy Jun 04 '11 at 20:28
  • I guess both ways are ok, but the BufferedReader looks better :-) I guess it will handle the encoding in a cleaner way. – rdmueller Jun 04 '11 at 20:34
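The BufferedReader approach from the comments could look roughly like the sketch below. It passes an explicit charset to InputStreamReader, because the one-argument constructor shown in the comment also falls back to the platform default encoding. The names RawLineReader and readLines are hypothetical, and UTF-8 is assumed to be the encoding the file was saved in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
final class RawLineReader {

    // Decodes the stream as UTF-8 and returns its lines without terminators.
    // Assumption: the source file was saved as UTF-8.
    static List<String> readLines(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, "UTF-8"));
        try {
            List<String> lines = new ArrayList<String>();
            String line;
            // readLine() returns null once the end of the stream is reached.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } finally {
            // Closing the reader also closes the underlying stream.
            reader.close();
        }
    }
}
```

Reading line by line this way also removes the need to split on "\n" and "\r" with StringTokenizer. Each returned line can then be split on the tab character if the file is tab-separated.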