I am using openEdgar to parse SEC filings data, and it uses Apache Tika to parse HTML, XML and XBRL content. I am running this on a box with 4 GB of memory and the Tika server keeps dying on me.

I ended up starting it this way:

java -Dlog4j.configuration=file:log4j.xml -jar tika-server-1.19.1.jar -spawnChild

In the logs I can see that it eventually fails to ping the child process. Things go downhill from there, and the JVM finally dies with insufficient memory to proceed:

2018-12-20 19:17:29 DEBUG WriteFlusher:434 - Flushed=true written=32776 remaining=0 WriteFlusher@575678bd{WRITING}->null
2018-12-20 19:42:25 INFO  TikaServerCli:115 - Starting Apache Tika 1.19.1 server
2018-12-20 19:49:37 WARN  TikaServerWatchDog:191 - Exception pinging child process
    ...java.io.IOException: Stream closed
2018-12-20 19:49:37 WARN  TikaServerWatchDog:213 - Exception asking child to shutdown
    ...java.io.IOException: Stream closed
2018-12-20 19:49:37 WARN  TikaServerWatchDog:225 - Problem shutting down writer to child
   ...java.io.IOException: Stream closed
2018-12-20 19:49:37 INFO  TikaServerWatchDog:97 - About to restart the child process
2018-12-20 19:49:40 INFO  TikaServerWatchDog:99 - Successfully restarted child process -- 1 restarts so far)
2018-12-20 19:53:15 WARN  TikaServerWatchDog:197 - Received status from child: TIMEOUT
2018-12-20 19:53:20 WARN  TikaServerWatchDog:213 - Exception asking child to shutdown
    ...java.io.IOException: Stream closed
2018-12-20 19:53:20 WARN  TikaServerWatchDog:225 - Problem shutting down writer to child
    ...java.io.IOException: Stream closed
2018-12-20 19:53:20 INFO  TikaServerWatchDog:97 - About to restart the child process
2018-12-20 19:53:34 INFO  TikaServerWatchDog:99 - Successfully restarted child process -- 2 restarts so far)
2018-12-20 19:55:00 WARN  TikaServerWatchDog:202 - Exception receiving status from child
java.lang.ArrayIndexOutOfBoundsException: 35 is not acceptable for an array of length 6
2018-12-20 19:55:08 ERROR TikaServerCli:120 - Can't start: 
    java.io.IOException: Unrecognized status code; message:
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (mmap) failed to map 65536 bytes for committing reserved memory.

Is there anything else I could do to understand the root cause of this and potentially fix it?

abolotnov
  • How much of the 4gb of memory are you giving to Java? From the command line shown, it looks like you're not assigning much of it... – Gagravarr Dec 22 '18 at 03:39
  • I tried giving it 1G to start and 2G max heap with -Xms[x] options, but it doesn't change anything. I could try giving it 32G or whatever, but the problem is that it will eventually eat up all the memory and 'kill' the instance. Is this a normal thing for JVM? – abolotnov Dec 24 '18 at 04:03
  • To specify heap for the child process, you have to prepend -J, as in -JXmx2g. To get at the root cause, what do the logs look like from the child process (e.g. -JDlog4j.configuration=file:log4j.xml)? – Tim Allison Jan 08 '19 at 18:02
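
Putting the last comment together with the original command line, the launch might look like the sketch below. It only assembles and prints the command so it can be reviewed first. The -J options are taken from the comment above as written; the variable names and the 2g value are illustrative assumptions, not tuned recommendations.

```shell
# Heap limit for the spawned child JVM (value from the comment; adjust to the box).
CHILD_HEAP="2g"
# log4j configuration passed to the child so its own log output can be inspected.
CHILD_LOG_CONFIG="file:log4j.xml"

# Build the command line: the original invocation plus the -J prefixed child options.
tika_cmd() {
  echo java -Dlog4j.configuration=file:log4j.xml \
    -jar tika-server-1.19.1.jar -spawnChild \
    "-JXmx${CHILD_HEAP}" \
    "-JDlog4j.configuration=${CHILD_LOG_CONFIG}"
}

# Print the command for review; run the printed line by hand to start the server.
tika_cmd
```

With the child's logging enabled, the child log should show which document was being parsed when the child stopped responding.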