This should be straightforward, but when I count words in a file after downloading it to my SD card, the number is off. The more occurrences there are, the further off my result is. I use Microsoft Word to verify the number of occurrences (with "ignore case" and "whole words only" enabled). To test the number of occurrences, I use the "the_counter" variable below. I have also verified that nothing is wrong with the download and that the FULL file is on my SD card. This is driving me nuts. I'm assuming Word cannot be wrong here, so what could possibly be wrong with my code below?
Could whitespace or special characters in the file be causing the problem? Is there a way to clean the file to verify this?
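One way to check the special-character theory, as a rough sketch rather than a definitive diagnosis: split the text on whitespace exactly as the code does, then list the tokens that match the target word only once leading and trailing non-letter, non-digit characters are stripped. The class name `TokenCheck` and the helper `hiddenMatches` are hypothetical names introduced for illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenCheck {
    // Hypothetical helper: returns whitespace-separated tokens that equal the
    // target word only after surrounding punctuation is stripped, i.e. tokens
    // that a plain equalsIgnoreCase() comparison would miss.
    static List<String> hiddenMatches(String text, String target) {
        List<String> hits = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            // Remove leading and trailing characters that are neither letters nor digits.
            String cleaned = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (cleaned.equalsIgnoreCase(target) && !token.equalsIgnoreCase(target)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text is made up for illustration.
        System.out.println(hiddenMatches("A cat, a. dog (a) and \"a\"", "a"));
        // prints [a., (a), "a"]
    }
}
```

If this list is non-empty for the downloaded file, attached punctuation would account for at least part of the gap between the two counts.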
// Find the directory for the SD card using the API
File sdcard = Environment.getExternalStorageDirectory();
// Get the text file
File file = new File(sdcard, TEMP_FILE);
// Read text from the file
//StringBuilder text = new StringBuilder();
m_tree = new Tree();
int i = 0;
BufferedReader br = null;
long the_counter = 0;
try {
    br = new BufferedReader(new FileReader(file));
    String line;
    String[] arLine;
    while ((line = br.readLine()) != null) {
        // Get each word in the line
        if (line.length() == 0)
            continue;
        arLine = line.split("\\s+");
        // Now add each word to the search tree
        for (i = 0; i < arLine.length; ++i) {
            m_tree.insert(arLine[i]);
            if (arLine[i].equalsIgnoreCase("a"))
                ++the_counter;
        }
    }
    m_sTest = Long.toString(the_counter);
    br.close();
I edited my code to read each line character by character and build the words manually, and I STILL GET THE SAME RESULT.
br = new BufferedReader(new FileReader(file));
String line;
String[] arLine;
StringBuilder word = new StringBuilder();
while ((line = br.readLine()) != null) {
    // Check for a word left over from the end of the last line
    if (word.length() > 0) {
        m_tree.insert(word.toString());
        word.setLength(0);
    }
    char[] lineChars = new char[line.length()];
    line.getChars(0, line.length(), lineChars, 0);
    for (char c : lineChars) {
        if (c == ' ') {
            // If we have a word, store it, clear the buffer, and move on
            if (word.length() > 0) {
                m_tree.insert(word.toString());
                word.setLength(0);
            }
        }
        else {
            word.append(c);
        }
    }
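For comparison with Word's "whole words only" search, here is a minimal sketch that counts a word wherever it is not directly adjacent to a letter or digit, instead of requiring the whole whitespace-separated token to match. It assumes that Word treats punctuation as a word boundary, which is an assumption about Word's behavior and not something confirmed here. `WholeWordCount` and `countWholeWord` are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordCount {
    // Hypothetical helper: counts case-insensitive occurrences of 'word' that
    // are not immediately preceded or followed by a letter or digit.
    static long countWholeWord(String text, String word) {
        Pattern p = Pattern.compile(
                "(?<![\\p{L}\\p{N}])" + Pattern.quote(word) + "(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        long count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample text is made up for illustration.
        String sample = "A cat, a. dog (a) and \"a\" area";
        System.out.println(countWholeWord(sample, "a")); // prints 4
    }
}
```

On the same sample, splitting on whitespace and comparing with equalsIgnoreCase("a") finds only the first "A", since the other three occurrences carry attached punctuation. Running both approaches over the downloaded file would show whether that difference explains the mismatch.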