
I have an input folder with thousands of files, each containing thousands of JSON records.

I created an Akka actor system to read these files and process them.

I'm reading the JSON from each file using:

        import java.io.FileReader;
        import java.util.Iterator;
        import org.json.simple.JSONArray;
        import org.json.simple.JSONObject;
        import org.json.simple.parser.JSONParser;

        JSONParser parser = new JSONParser();
        Object object = parser.parse(new FileReader(file));
        JSONArray jsonArray = (JSONArray) object;
        Iterator<JSONObject> iterator = jsonArray.iterator();
        while (iterator.hasNext()) {
            JSONObject currentJsonObject = iterator.next();
            // send the JSON object to another actor for further processing
        }

The initial design created a new "file reader" actor, running the above code, for each file in the folder.
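
Concretely, the fan-out looks roughly like this (a minimal sketch, not my exact code; FanOutSketch, FileReaderActor, and the folder path are placeholder names):

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import java.io.File;

    public class FanOutSketch {

        // Hypothetical per-file reader; the parsing itself is the snippet shown above.
        public static class FileReaderActor extends AbstractActor {
            @Override
            public Receive createReceive() {
                return receiveBuilder()
                        .match(File.class, file -> { /* parse the file as above */ })
                        .build();
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("json-import");
            // One "file reader" actor per file: with thousands of files they all start
            // reading and parsing at the same time, so every file ends up on the heap at once.
            for (File file : new File("/path/to/input").listFiles()) {
                ActorRef reader = system.actorOf(Props.create(FileReaderActor.class));
                reader.tell(file, ActorRef.noSender());
            }
        }
    }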

It worked OK when I had only a few files in the folder.

When the number of files in the folder is large, the system crashes with an OutOfMemoryError. It seems like all these "file reader" actors are trying to read all the files at the same time and load them into memory at once.

What would be a good approach for reading these JSON files?

  • Akka Streams? (see the sketch after this list)

  • Only one "file reader" actor that reads them one by one?
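
For the Akka Streams option, this is roughly what I have in mind (a minimal sketch, assuming Akka 2.5.x javadsl and the same json-simple parser; JsonFolderStream, the processor parameter, and the parallelism/timeout values are placeholders, not actual code):

    import akka.Done;
    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.pattern.PatternsCS;
    import akka.stream.ActorMaterializer;
    import akka.stream.javadsl.Sink;
    import akka.stream.javadsl.Source;
    import org.json.simple.JSONObject;
    import org.json.simple.parser.JSONParser;
    import java.io.File;
    import java.io.FileReader;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.CompletionStage;

    public class JsonFolderStream {

        // Parse one file into its records; the reader is closed as soon as parsing is done.
        private static List<JSONObject> parseFile(File file) throws Exception {
            JSONParser parser = new JSONParser();
            try (FileReader reader = new FileReader(file)) {
                @SuppressWarnings("unchecked")
                List<JSONObject> records = (List<JSONObject>) parser.parse(reader);
                return records;
            }
        }

        // "processor" is the existing actor that handles single JSON records; it must
        // reply to each record (e.g. with an ack) so the stream knows when to pull more.
        public static CompletionStage<Done> run(ActorSystem system, ActorRef processor, String inputDir) {
            ActorMaterializer materializer = ActorMaterializer.create(system);

            return Source.from(Arrays.asList(new File(inputDir).listFiles()))
                    // Files are opened lazily, one at a time, only when downstream demands more records.
                    .mapConcat(JsonFolderStream::parseFile)
                    // At most 8 records are in flight; the processor's replies provide the backpressure.
                    .mapAsync(8, record -> PatternsCS.ask(processor, record, 5000L))
                    .runWith(Sink.ignore(), materializer);
        }
    }

The idea is that mapConcat keeps only one file's records in memory at a time, and mapAsync caps how many records are outstanding at the processing actor, so heap usage no longer grows with the number of files.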

riorio
  • Your FileReader stream is not closed; there is a memory leak there. Also, if you need to limit the number of actors: https://stackoverflow.com/questions/32635503/how-to-limit-the-number-of-actors-of-a-particular-type?rq=1 – pdem Mar 28 '18 at 09:20
  • I would suggest having some logic to handle `x` files at a time; for that I would use another actor to fetch batches of files. It may also be possible that your application is trying to open many files at the same time, thus running out of memory. This article is old but might help: https://manuel.bernhardt.io/2014/04/23/a-handful-akka-techniques/ – Juan Stiza Mar 28 '18 at 18:22
  • We are using `Akka Streams` (because of its back-pressure capability) and it works like a charm for us. We have files of size ~20GB. – Explorer Apr 06 '18 at 18:09

0 Answers