
I have a locally stored file, around 2.3 MB in size and about 500,000 lines altogether, that I would like to load into a HashSet in memory. Since the file is large and reading it is slow, I have split it into 5 smaller files of less than 100,000 lines each.
My idea is to instantiate 5 separate threads from the Application class. Each thread would read its own file and store the data in its own set. Upon completion, it would return the obtained subset to the main thread, i.e. the Application class, which would then store it in the main set. The thread code is as follows:

private class LoadFileThread extends Thread {
    private String filename;
    private Set<String> subSet;
    private MyApplication application;

    public LoadFileThread(String filename, MyApplication ctx) {
        this.filename = filename;
        this.application = ctx;
        this.subSet = new HashSet<String>();
    }

    @Override
    public void run() {
        AssetManager am = application.getAssets();
        BufferedReader reader = null;
        try {
            InputStream is = am.open(filename);
            reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            String line;
            while ((line = reader.readLine()) != null) {
                subSet.add(line.toUpperCase());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // guard against NPE: reader stays null if am.open() throws
            if (reader != null) {
                try { reader.close(); } catch (IOException ignorable) {}
            }
        }
        application.setSubSet(subSet, this.getName());
    }

}

Method setSubSet in the Application class:

public synchronized void setSubSet(Set<String> subSet, String name) {
    myMainSet.addAll(subSet);
    Log.d("Thread finished", name);
}
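
The fan-out/merge pattern described above (worker threads each filling a private set, then a single synchronized merge per worker) can be sketched in plain Java like this; the Android classes and file reading are replaced with in-memory stand-ins, so the names here are illustrative only:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ParallelLoad {
    // Stand-in for MyApplication.myMainSet plus the synchronized setSubSet method.
    static final Set<String> mainSet = Collections.synchronizedSet(new HashSet<>());

    // Each worker builds its own subset, then merges exactly once at the end,
    // mirroring the LoadFileThread / setSubSet pattern.
    static Thread worker(List<String> lines) {
        return new Thread(() -> {
            Set<String> sub = new HashSet<>();
            for (String line : lines) {
                sub.add(line.toUpperCase());
            }
            mainSet.addAll(sub); // single merge per worker, thread-safe set
        });
    }

    public static void main(String[] args) throws InterruptedException {
        // Two tiny "files" standing in for the five split files.
        List<List<String>> parts = List.of(
                List.of("alpha", "beta"),
                List.of("beta", "gamma"));
        List<Thread> threads = new ArrayList<>();
        for (List<String> p : parts) threads.add(worker(p));
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
        System.out.println(mainSet.size()); // duplicates collapse: 3 entries
    }
}
```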

Two problems occur:

  1. Reading is still way too slow.
  2. I get an OutOfMemoryError when calling addAll on the main set.

Is there a better way to do this? How?

Maggie

1 Answer


With 500,000 lines and readLine() you are doing 500,000 reads.

Create a 64 KB buffer and read into that.

Process each full line you can, then read another 64 KB.

That should cut your reads to a fraction of 500,000.
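
One way to get this effect without hand-rolling the line splitting is to give BufferedReader an explicit 64 KiB buffer, so each underlying InputStream read pulls a large block; a minimal sketch (fed from an in-memory stream here, since the asset file isn't available):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class ChunkedLoad {
    // Read the stream with an explicit 64 KiB buffer, so readLine()
    // serves lines from memory instead of issuing many small reads.
    static Set<String> load(InputStream is) throws IOException {
        // Pre-sizing the set avoids repeated rehashing for ~500,000 entries.
        Set<String> set = new HashSet<>(600_000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8), 64 * 1024)) {
            String line;
            while ((line = reader.readLine()) != null) {
                set.add(line.toUpperCase());
            }
        }
        return set;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        Set<String> set = load(new ByteArrayInputStream(data));
        System.out.println(set.size());
    }
}
```

On Android the same call would wrap `am.open(filename)` instead of the ByteArrayInputStream.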

MikeHelland