0

I am extracting values from a tab-separated text file into a list in Groovy, but I am running into an ArrayIndexOutOfBoundsException.

Code

println("Reading File Contents")

def fullArray = new String[31721][4]
def availableArray = new String[1386][2]
def filteredFullArray = new String[1386][5]

String fileContents = new File('beliefs.txt').text
String availableContents = new File('available.txt').text

def count = 0

fileContents.eachLine { line ->

    String[] str
    str = line.split('\t')

    def subCount = 0
    for (subCount; subCount < str.length; subCount++) {
         fullArray[count][subCount] = str[subCount]
    }
    count++
}

beliefs.txt

1   Azerbaijan  hasOfficialLanguage Azerbaijani_language
2   Augustus    hasChild    Julia_the_Elder
3   Arthur_Aikin    isCitizenOf England
4   Arthur_Aikin    diedIn  London
5   Alexander_III_of_Russia isMarriedTo Maria_Feodorovna__Dagmar_of_Denmark_
6   Alexander_III_of_Russia hasChild    Nicholas_II_of_Russia
7   Alexander_III_of_Russia hasChild    Grand_Duke_Michael_Alexandrovich_of_Russia
8   Alexander_III_of_Russia hasChild    Grand_Duchess_Olga_Alexandrovna_of_Russia
9   Alexander_III_of_Russia hasChild    Grand_Duke_Alexander_Alexandrovich_of_Russia
10  Alexander_III_of_Russia hasChild    Grand_Duke_George_Alexandrovich_of_Russia
...
...
...
31719   Minqi_Li    isKnownFor  Chinese_New_Left
31720   Henry_Bates_Grubb   isKnownFor  Mount_Hope_Estate
31721   Thomas_Kuhn isKnownFor  Paradigm_shift  

Running this gives me the following error.

Caught: java.lang.ArrayIndexOutOfBoundsException: 4
java.lang.ArrayIndexOutOfBoundsException: 4
    at extractBeliefs$_run_closure1.doCall(extractBeliefs.groovy:19)
    at extractBeliefs.run(extractBeliefs.groovy:12)

I understand what generally causes this exception. However, as far as I can tell my array never goes past its last index, and the error is reported at the line fileContents.eachLine { line ->, so I cannot find where this is going wrong.

Any suggestions in this regard will be highly appreciated.

Nayantara Jeyaraj
  • You've got an extra tab on one line. – Dawood ibn Kareem Dec 06 '18 at 06:19
  • One way you could find the line with the extra tab would be to import your file into Excel as tab-separated, then have a quick look down column E to see where there's text. – Dawood ibn Kareem Dec 07 '18 at 20:48
  • Thanks. The problem is that the file has approximately 23000 lines, so I am working on a method to automatically concatenate any text that is currently tab-separated but is not supposed to be. – Nayantara Jeyaraj Dec 10 '18 at 03:31
  • You could just increase the second dimension of `fullArray` so that it's big enough to store however many fields there are on the line that has the most. Or you could change the condition in the `for` loop to `subCount < str.length && subCount < 4`. – Dawood ibn Kareem Dec 10 '18 at 03:39
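As an alternative to the loop guard suggested in the comments, the split itself can be capped at four fields so that any overflow stays in the last field instead of being dropped. The following is only a sketch under assumptions: the sample text is made up, `maxFields` and `rows` are hypothetical names, and joining the overflow with an underscore assumes the stray tab belongs to the last column, which may not hold for the real file.

```groovy
// Hypothetical sample; the first line has one tab too many.
def sample = "3\tArthur_Aikin\tdiedIn\tLondon\tEngland\n" +
             "4\tArthur_Aikin\tisCitizenOf\tEngland\n"

def maxFields = 4   // matches the second dimension of fullArray
def rows = []

sample.eachLine { line ->
    // With a limit, split returns at most maxFields elements;
    // any further tabs remain inside the last element.
    def fields = line.split('\t', maxFields)
    def last = fields.length - 1
    // Assumption: the overflow belongs to the last column, so join it with '_'.
    fields[last] = fields[last].replace('\t', '_')
    rows << (fields as List)
}

rows.each { println it }
```

Because every row now has at most four fields, copying `rows` into a `String[n][4]` array can no longer run past index 3.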

3 Answers

2

Your initial error is coming from this line (19):

fullArray[count][subCount] = str[subCount]

Line 12 is just where the exception propagates as it exits the closure. The index in the message (4) means some line split into more than four fields, which indicates an extra tab on that line. For debugging, print each line to the console before you load it into the array; that will show which line triggers the error.
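With tens of thousands of lines, printing everything is noisy, so it may be easier to report only the lines whose field count is off. A minimal sketch, assuming four fields per line; the sample text, `expectedFields` and `badLines` are hypothetical and stand in for the real file contents:

```groovy
// Hypothetical sample standing in for beliefs.txt; the third line has an extra tab.
def sample = "1\tAzerbaijan\thasOfficialLanguage\tAzerbaijani_language\n" +
             "2\tAugustus\thasChild\tJulia_the_Elder\n" +
             "3\tArthur_Aikin\tdiedIn\tLondon\tEngland\n"

def expectedFields = 4
def badLines = []

// A two-argument closure also receives the line index, starting at 0 for strings.
sample.eachLine { line, index ->
    // Same split as in the question, so the flagged lines are the ones that overflow the array.
    def fields = line.split('\t')
    if (fields.length != expectedFields) {
        badLines << [lineNumber: index + 1, fieldCount: fields.length, text: line]
    }
}

badLines.each { println "line ${it.lineNumber}: ${it.fieldCount} fields -> ${it.text}" }
```

To run this against the real data, replace `sample` with the `fileContents` string from the question.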

Trebla
-1

Try splitting on whitespace

str = line.split('\s+')

instead of

str = line.split('\t')
  • It seems that `'\s+'` is not valid in Groovy, as it gives me an error. Edit: If you meant to split on runs of whitespace, which seems logical, I tried `'\\s+'`, but that didn't work either – Nayantara Jeyaraj Dec 06 '18 at 06:32
-1

A better way would be to replace every run of spaces or tabs with a single space first, and then split on whitespace.

line = line.replace("\\s+/g", " ")
str = line.split('\\s+')
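Note that `String.replace` treats its first argument as literal text, and the `/g` suffix is JavaScript regex syntax rather than something Java or Groovy strings understand; the regex-based method is `replaceAll`. A minimal sketch of the normalisation, using a made-up input line and a hypothetical `normalized` variable:

```groovy
// Hypothetical line mixing tabs, repeated spaces and a doubled tab.
def line = "2\tAugustus   hasChild\t\tJulia_the_Elder"

// replaceAll takes a regular expression; every run of whitespace becomes one space.
def normalized = line.replaceAll('\\s+', ' ')
def str = normalized.split(' ')

println str.toList()
```

This only collapses whitespace. A line that carries an extra field will still split into more than four parts, so the original exception can still occur.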
Yugansh
  • Thanks for the response. But what is `g` supposed to be in `/\s\s+/g`? It seems to give an error. Also I changed `('\s+')` to `('\\s+')` – Nayantara Jeyaraj Dec 06 '18 at 06:39
  • @Nayantara Jeyaraj The `/g` means a global search for the pattern: every run of whitespace characters (space, tab, \r, \n, \v, \f) is replaced with a single space character. – Yugansh Dec 06 '18 at 08:54
  • The above did not work. But according to @Dawood, there seem to be a few stray tabs, so I am fixing those instead. That seems easier – Nayantara Jeyaraj Dec 06 '18 at 10:15