
OK, so I am writing an NLP program. It uses a function eliminateStopWords(), which reads from a 2D array "sentTokens" (of detected tokens). In the code below, index i is the sentence number and j indexes each token in the ith sentence.

Here is what eliminateStopWords() does:

  1. It reads stop words from a text file and stores them in a TreeSet.

  2. It reads tokens from the sentTokens array and checks them for stop words. If two successive tokens form a collocation, they are not checked for stop words; they are just dumped into the finalTokens array. If they do not form a collocation, each token is checked individually and added to finalTokens only if it is not a stop word.
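The two steps above can be sketched as follows. This is a minimal sketch, not the asker's code, and it makes a few assumptions: Java 9+ (for List.of/Set.of), a List of Lists instead of the fixed-size finalTokens array, and a hypothetical isCollocation() that simply looks the bigram up in a set. Every loop counter is a local variable, and the stop-word check sits in an else branch so that collocations really do bypass it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class StopWordFilter {

    // Step 1: read one stop word per line into a sorted set.
    public static Set<String> loadStopWords(Path file) throws IOException {
        return new TreeSet<>(Files.readAllLines(file));
    }

    // Hypothetical stand-in for the question's isCollocation():
    // a bigram counts as a collocation if "curr next" is in the given set.
    public static boolean isCollocation(Set<String> collocations, String curr, String next) {
        return collocations.contains(curr + " " + next);
    }

    // Step 2: walk each sentence two tokens at a time.
    public static List<List<String>> filter(String[][] sentTokens, Set<String> stopWords,
                                            Set<String> collocations) {
        List<List<String>> finalTokens = new ArrayList<>();
        for (String[] sentence : sentTokens) {
            List<String> kept = new ArrayList<>();
            for (int j = 0; j < sentence.length; j += 2) {   // local counter
                String curr = sentence[j];
                if (j + 1 == sentence.length) {              // odd-length sentence: lone last token
                    if (!stopWords.contains(curr)) kept.add(curr);
                    break;
                }
                String next = sentence[j + 1];
                if (isCollocation(collocations, curr, next)) {
                    kept.add(curr);                          // collocations skip the stop-word check
                    kept.add(next);
                } else {
                    if (!stopWords.contains(curr)) kept.add(curr);
                    if (!stopWords.contains(next)) kept.add(next);
                }
            }
            finalTokens.add(kept);
        }
        return finalTokens;
    }

    public static void main(String[] args) {
        Set<String> stop = new TreeSet<>(List.of("the", "of", "a"));
        Set<String> colloc = Set.of("of course");
        String[][] sents = { {"the", "cat", "of", "course", "sat"}, {"a", "dog"} };
        System.out.println(filter(sents, stop, colloc)); // prints [[cat, of, course, sat], [dog]]
    }
}
```

Like the design described in step 2, this only examines non-overlapping pairs (tokens 0-1, 2-3, ...), so a collocation spanning tokens 1 and 2 would not be detected.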

The problem comes in the loop of step 2. Here is the relevant code (I have marked the location where the error actually occurs with // HERE; it's near the end):

private void eliminateStopWords() {

    try {

        // Loading TreeSet for stopwords from the file.
        stopWords = new TreeSet<String>();
        fin = new File("stopwords.txt");
        fScan = new Scanner(fin);
        while (fScan.hasNextLine())
            stopWords.add(fScan.nextLine());

        fScan.close();

        /* Test code to print all read stopwords
        iter2 = stopWords.iterator();
        while (iter2.hasNext())
            System.out.println(iter2.next()); */

        int k = 0, m = 0;    // additional indices for finalTokens array
        System.out.println(NO_OF_SENTENCES);

        newSentence:
        for (i = 0; i < NO_OF_SENTENCES; i++)
        {
            System.out.println("i = " + i);

            for (j = 0; j < sentTokens[i].length; j += 2)
            {
                System.out.println("j = " + j);

                // otherwise, get two successive tokens
                String currToken = sentTokens[i][j];
                String nextToken = sentTokens[i][j+1];
                System.out.println("i = " + i);
                System.out.println(currToken + " " + nextToken);

                if (isCollocation(currToken, nextToken)) {
                    // if the current and next tokens form a bigram collocation, they are not checked for stop words
                    // but are directly dumped into finalTokens array
                    finalTokens[k][m] = currToken; m++;
                    finalTokens[k][m] = nextToken; m++;
                }

                if (!stopWords.contains(currToken))
                {   finalTokens[k][m] = currToken; m++;   }

                if (!stopWords.contains(nextToken))
                {   finalTokens[k][m] = nextToken; m++;   }

                // if current token is the last in the sentence, do not check for collocations, only check for stop words
                // this is done to avoid ArrayIndexOutOfBounds Exception in sentences with odd number of tokens

                // HERE
                System.out.println("i = " + i);

                if (j == sentTokens[i].length - 2) {
                    String lastToken = sentTokens[i][++j];
                    if (!stopWords.contains(lastToken))
                    {   finalTokens[k][m] = lastToken; m++;   }

                    // after analyzing last token, move to analyzing the next sentence
                    continue newSentence;
                }
            }

            k++;    // next sentence in finalTokens array
        }

        // Test code to print finalTokens array
        for (i = 0; i < NO_OF_SENTENCES; i++) {
            for (j = 0; j < finalTokens[i].length; j++)
                System.out.print(finalTokens[i][j] + " ");

            System.out.println();
        }

    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

I printed the indices i and j at the entry of their respective for loops. Everything works fine for the first iteration of the loop, but when the loop is about to reach its end, I print the value of 'i' again, and this time it comes out as 14.

  • It starts the first iteration at 0...
  • it is not modified anywhere in the loop...
  • and yet by the end of (only) the first iteration, it prints 14.

I mean, this is seriously the WEIRDEST error I have ever come across while working with Java. It throws an ArrayIndexOutOfBoundsException just before the final if block. It's like magic: you do nothing to the variable in the code, and still its value changes. HOW CAN THIS HAPPEN?

Michael Petrotta
Navin Israni

1 Answer


You never declared i or j in your code, which leads me to believe that they are fields.

I'm pretty sure that some of your other methods reuse those variables and thus mess up your result. isCollocation looks like a candidate for that.
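To see how a field used as a loop counter can suddenly read 14, here is a minimal reproduction. SharedCounterBug and helper() are hypothetical names, not from the question; helper() stands in for something like isCollocation() or a method it calls. Both loops use the same field i.

```java
public class SharedCounterBug {

    // Loop counter declared as a field, as in the question.
    private int i;

    // Hypothetical helper that reuses the same field for its own loop.
    private void helper() {
        for (i = 0; i < 14; i++) {
            // ... unrelated work ...
        }
        // on return, the shared field i is 14
    }

    public int outerIterations() {
        int count = 0;
        for (i = 0; i < 3; i++) {
            System.out.println("start of iteration: i = " + i);
            helper();
            System.out.println("end of iteration:   i = " + i);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int n = new SharedCounterBug().outerIterations();
        System.out.println("outer loop ran " + n + " time(s)");
    }
}
```

This prints "start of iteration: i = 0", then "end of iteration:   i = 14", then "outer loop ran 1 time(s)": the outer i++ turns 14 into 15 and the loop exits after a single pass. In the question's code, an access such as sentTokens[i] fails in the same way once the clobbered i is past the last sentence.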

The counters in for loops should always be local variables, ideally declared inside the for statement itself (for minimal scope). Everything else is just asking for trouble (as you see).
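Applied to a small example of the same shape, the fix is just to declare each counter inside its own for statement. A sketch, with hypothetical names:

```java
public class LocalCounters {

    // The helper declares its own counter; it is a different variable
    // from the caller's i, so it cannot change it.
    private static int helper() {
        int sum = 0;
        for (int i = 0; i < 14; i++) {
            sum += i;
        }
        return sum;
    }

    public static int outerIterations() {
        int count = 0;
        for (int i = 0; i < 3; i++) {   // this i exists only inside this loop
            helper();
            System.out.println("end of iteration: i = " + i);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("outer loop ran " + outerIterations() + " time(s)"); // 3 time(s)
    }
}
```

Because each i is scoped to its own loop, no other method can even see it, let alone modify it.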

Joachim Sauer
  • +1: Never use fields for loop counters. Only use fields for data you want to share between methods or between method calls. Any field which can be turned into a local variable, you should do so. Any field you can make final, do so and be careful with the use of any mutable fields. – Peter Lawrey Apr 14 '11 at 10:31
  • Well, I actually use a lot of loops, so to avoid any sort of multiple-declaration errors with loop counters, I declare i and j globally as loop counters. – Navin Israni Apr 14 '11 at 10:44
  • isCollocation might be doing it, but the same error led to an AIOOBE before, when that final if block was placed before the call to isCollocation. Even in that version I printed the value of 'i' just before the error line, and there i = 0; the very next line (the error line) then throws an AIOOBE at i = 14? – Navin Israni Apr 14 '11 at 10:47
  • OK, yes, thanks @joachim. isCollocation was not actually doing it itself, but it calls another method, loadMapfromFile(), which is what actually messes with 'i'. – Navin Israni Apr 14 '11 at 11:00
  • @Navin Israni: Would you rather have "multiple declaration errors with loop counters" (which can be detected at compile time and fixed easily), or strange hard-to-debug runtime problems caused by shared loop counters? I know which I would choose. – Greg Hewgill Apr 14 '11 at 11:06
  • @greg Yeah, I guess experience with weird errors like these is what teaches you better programming. – Navin Israni Apr 14 '11 at 11:16
  • Thanks @joachim. After fixing the error I asked about, I ran into several more, and guess what: all of them were caused by the same mistake, making loop counters global. I'll definitely take care of it from now on. – Navin Israni Apr 14 '11 at 12:22
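On the worry about "multiple declaration errors": a variable declared in a for statement goes out of scope when that loop ends, so consecutive loops can each declare their own i. Only nested loops need distinct names, and a clash there is a compile-time error rather than a silent runtime bug. A small illustration (LoopScopes is a made-up name):

```java
public class LoopScopes {

    public static int demo() {
        int total = 0;
        for (int i = 0; i < 3; i++) total += i;   // this i's scope ends with the loop
        for (int i = 0; i < 4; i++) total += i;   // so a fresh i may be declared here
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {         // nested loops need distinct names;
                total++;                          // declaring another i here would not compile
            }
        }
        return total;                             // 3 + 6 + 4 = 13
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 13
    }
}
```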