Why are foreign characters not read using inputStream?

Question

I have a text file which contains data I need to preload into a SQLite database. I saved it in res/raw.

I read the whole file using readTxtFromRaw(), then I use the StringTokenizer class to process the file line by line.

However, the String returned by readTxtFromRaw does not show the foreign characters that are in the file. I need these, as some of the text is in Spanish or French. Am I missing something?

Code:

String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");

The readTxtFromRaw method is:

private String readTxtFromRaw(Integer rawResource) throws IOException
{
    InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    int i = inputStream.read();
    while (i != -1)
    {
        byteArrayOutputStream.write(i);
        i = inputStream.read();
    }
    inputStream.close();

    return byteArrayOutputStream.toString();
}

The file was created using Eclipse, and all characters appear fine in Eclipse.

Could this have something to do with Eclipse itself? I set a breakpoint and inspected myToken in the Watch window. I tried to manually replace the weird character with the correct one (for example í or é), and it would not let me.

score 1 · Accepted Answer · answered Jun 04 '11 at 20:22

Have you checked the encodings involved?

What is the encoding of your source file?
What is the encoding of your output stream?

byteArrayOutputStream.toString() decodes the bytes using the platform's default character encoding. If that does not match the encoding the file was saved in, I guess the foreign characters get replaced or converted in a way that they are not displayed correctly in your output.

Have you already tried byteArrayOutputStream.toString(String enc)? Try "UTF-8", "ISO-8859-1" or "UTF-16" for the encoding.
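
To illustrate the suggestion, here is a minimal sketch in plain Java rather than the asker's Android code. The class name ExplicitCharsetDemo and the helper readAll are hypothetical names made up for this example; on Android the stream would presumably come from openRawResource, as in the question. It decodes the same UTF-8 bytes twice, once with the matching charset and once with ISO-8859-1, to show how a mismatch garbles an accented character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {

    // Hypothetical helper: reads the whole stream, then decodes the bytes
    // with the charset named by the caller instead of the platform default.
    static String readAll(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        in.close();
        return buffer.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        // "cafe" with an e-acute, written as an escape so the example does
        // not depend on the encoding of this source file.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the accented character survives.
        String good = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(good.equals("caf\u00e9")); // prints true

        // Wrong charset: the two UTF-8 bytes C3 A9 become two characters.
        String bad = readAll(new ByteArrayInputStream(utf8), "ISO-8859-1");
        System.out.println(bad.equals("caf\u00c3\u00a9")); // prints true
    }
}
```

The charset passed to toString has to be the one the file was actually saved in; guessing a different one only changes how the text is garbled.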

rdmueller

Actually, I right-clicked the file in the Eclipse Package Explorer, selected Properties, and there was a Text File encoding option. I selected UTF-8 and everything is working OK. Thanks for this. – Sandy Jun 04 '11 at 20:25
Is using a ByteArrayOutputStream the same as using something like this: BufferedReader reader = new BufferedReader(new InputStreamReader(mCtx.getResources().openRawResource(rawResource))); ? Both seem to work now, but I don't know if one is better for this than the other. – Sandy Jun 04 '11 at 20:28
I guess both ways are ok, but the BufferedReader looks better :-) I guess it will handle the encoding in a cleaner way. – rdmueller Jun 04 '11 at 20:34
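
For what it's worth, a rough sketch of the BufferedReader variant, assuming the file is saved as UTF-8. The one-argument InputStreamReader constructor also falls back to the platform default charset, so this sketch passes the charset explicitly. ReaderLinesDemo and readLines are hypothetical names, and a ByteArrayInputStream stands in for the raw resource stream so the example runs outside Android.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReaderLinesDemo {

    // Hypothetical helper: decodes the stream as UTF-8 and returns its lines.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and the underlying stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two tab-separated rows with accented characters, written as
        // escapes so the source file encoding does not matter.
        byte[] data = "ni\u00f1o\tchild\ncaf\u00e9\tcoffee\n"
                .getBytes(StandardCharsets.UTF_8);

        List<String> lines = readLines(new ByteArrayInputStream(data));
        System.out.println(lines.size()); // prints 2
        System.out.println(lines.get(0).equals("ni\u00f1o\tchild")); // prints true
    }
}
```

Reading line by line this way also removes the need to split on "\n" and "\r" with a StringTokenizer afterwards; each line can be split on tabs directly.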
