I'm trying to create a program that generates a word list from a small set (10-100) of original input words. The end result contains millions, possibly billions, of lines, with one word on each line. So far I can generate up to about 5 million words, but whenever I run something that would generate far more, such as 100 million, the program crashes after roughly 1 minute and 9 seconds. Here is the error output:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3210)
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.ArrayList.grow(ArrayList.java:265)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
    at java.util.ArrayList.add(ArrayList.java:462)
    at wordlistgen.WordlistGen2.combineWords(WordlistGen2.java:129)
    at wordlistgen.WordlistGen2.main(WordlistGen2.java:25)
    /home/NAME/.cache/netbeans/8.1/executor-snippets/run.xml:53: Java returned: 1
    BUILD FAILED (total time: 1 minute 9 seconds)

I have tried to increase the heap size for Netbeans by entering -J-Xms1024m -J-Xmx2048m in my netbeans.conf file (Running Ubuntu 17.10), but the error persists.
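
A quick way to check whether a heap setting actually reaches the program is to have the program report its own limit. The sketch below is only an illustration: the class `HeapCheck` and the helper `maxHeapMiB` are hypothetical names, not part of the original code, and the only API it relies on is `Runtime.getRuntime().maxMemory()`. The `-J` options in netbeans.conf typically configure the IDE's own JVM, so a program launched from a project may be running with a different limit; printing the value shows which limit applies.

```java
// Hypothetical helper, not part of the original program.
final class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run this the same way the word list generator is run.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not change after editing the configuration, the setting is probably not being applied to the program's JVM.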

Essentially what the program does is import the original 10-100 words:

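    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;

    // listOfLists and loll are static fields declared elsewhere in the class.
    static void importList() throws IOException {
        ArrayList<String> rawList = new ArrayList<>();

        // Read the input file line by line; each line is one base word.
        try (BufferedReader br = new BufferedReader(new FileReader("textfile"))) {
            for (String line; (line = br.readLine()) != null; ) {
                rawList.add(line);
            }

            listOfLists.add(rawList);
            loll++;
        }
    }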

Then, with a bunch of for loops, I create new variations of the words: capitalized letters, numbers at the end, substrings of the entire word, and so on. The words are stored in different ArrayLists, which are in turn stored in an ArrayList of ArrayLists, that is, in an ArrayList<ArrayList<String>>.

When I'm done combining and manipulating words, I output the entire final arraylist, line by line, to an output file, using the following method:

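    import java.io.FileWriter;
    import java.io.IOException;

    // finalList is a static field holding every generated word.
    static void outputFile(String fileName) throws IOException {
        // try-with-resources closes the writer even if a write fails.
        try (FileWriter writer = new FileWriter(fileName)) {
            for (String str : finalList) {
                writer.write(str + "\n");
            }
        }
    }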

The entire code can be found here: https://pastebin.com/0fkvwYbx

I'm hoping that I'm missing something obvious or that I've misinterpreted the error message. Either way, if someone could find a solution that lets me generate longer lists, I'd be very grateful.

Trish
  • Can't you just write to the file when generating the words? Or a bunch of words? I mean, is there actually an advantage to keeping them in memory? – Federico klez Culloca Apr 13 '18 at 14:19
  • I think you are being punished by the programming gods for trying to crack passwords. – bhspencer Apr 13 '18 at 14:20
  • You're giving it up to 2G of memory and generating 100M things. Unless each of those things takes less than 20 bytes, sure, you'll run out of memory. Note that all Java objects occupy at least 12 bytes, on top of any references to those objects (including space for references to the objects, like the backing array of an ArrayList). So, running out of memory sounds entirely reasonable. – Andy Turner Apr 13 '18 at 14:20
  • Perhaps if you share the code that is generating the password variations we might be able to help. – bhspencer Apr 13 '18 at 14:24
  • Tried allocating 4G of RAM instead of 2. The time it took for the error to occur should, in my mind, have doubled. But it's still at 1 min 9 sec, would that not mean something else is causing the issue? Ping Andy :) – Trish Apr 13 '18 at 14:26
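
The suggestion in the first comment, writing words out as they are generated instead of collecting them first, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original program: the class `StreamingWordWriter`, the method `writeWithNumericSuffixes`, and the single "append a number" variation are all hypothetical stand-ins for the real generation loops. The point is only that memory use stays flat because no list of results is ever built.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names and the variation rule are illustrative only.
final class StreamingWordWriter {
    /**
     * Writes each base word followed by the numbers 0..maxSuffix-1, one per line,
     * without storing the generated words. Returns the number of lines written.
     */
    static long writeWithNumericSuffixes(List<String> baseWords, int maxSuffix, Writer out)
            throws IOException {
        BufferedWriter writer = new BufferedWriter(out);
        long count = 0;
        for (String word : baseWords) {
            for (int n = 0; n < maxSuffix; n++) {
                writer.write(word);
                writer.write(Integer.toString(n));
                writer.write('\n');
                count++;
            }
        }
        writer.flush(); // flush the buffer; closing is left to the owner of `out`
        return count;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter
        long lines = writeWithNumericSuffixes(Arrays.asList("cat", "dog"), 3, sink);
        System.out.println(lines + " lines written");
        System.out.print(sink);
    }
}
```

In a real run, `out` would be a `FileWriter` opened in a try-with-resources block, and only the small list of base words would need to stay in memory.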

1 Answer

Maybe ArrayList is not the appropriate List implementation for your problem. Please see: When to use LinkedList over ArrayList?

I think you are constantly hitting the worst-case scenario when (citing)

add(E element) is O(1) amortized, but O(n) worst-case since the array must be resized and copied

This is inefficient not only in time but also in memory, since every resize needs a second, larger backing array alongside the old one while the elements are copied. Consider using LinkedList, especially since your code does not appear to access the lists randomly by index.
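
The resize cost described above can be made concrete with a small simulation. This is a simplified model, not a measurement: the class `GrowthSketch` and the method `simulate` are hypothetical, the starting capacity of 10 and the growth rule `old + (old >> 1)` are assumed from the OpenJDK 8 `ArrayList.grow` method that appears in the stack trace, and the model counts only reference slots in the backing array. It ignores the Strings themselves and says nothing about the per-node overhead of a LinkedList.

```java
// Hypothetical model of ArrayList growth; assumes the OpenJDK 8 rule
// newCapacity = oldCapacity + (oldCapacity >> 1) and an initial capacity of 10.
final class GrowthSketch {
    /**
     * Returns {finalCapacity, numberOfResizes, peakSlots}, where peakSlots is the
     * largest number of array slots alive at once (old array plus new array
     * during a copy).
     */
    static long[] simulate(long targetSize) {
        long capacity = 10;
        long resizes = 0;
        long peakSlots = capacity;
        while (capacity < targetSize) {
            long newCapacity = capacity + (capacity >> 1);
            // Both arrays exist until the copy finishes.
            peakSlots = Math.max(peakSlots, capacity + newCapacity);
            capacity = newCapacity;
            resizes++;
        }
        return new long[] {capacity, resizes, peakSlots};
    }

    public static void main(String[] args) {
        long[] r = simulate(100_000_000L);
        System.out.println("final capacity: " + r[0]);
        System.out.println("resizes:        " + r[1]);
        System.out.println("peak slots:     " + r[2]);
    }
}
```

Under this model the last resize is the expensive one: the old and new arrays together hold well over the target number of slots, which is consistent with the error surfacing inside `Arrays.copyOf`.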

Jorge_B