
We are running Hadoop on GCE with HDFS as the default file system and data input/output from/to GCS.

Hadoop version: 1.2.1
Connector version: com.google.cloud.bigdataoss:gcs-connector:1.3.0-hadoop1

Observed behavior: the JobTracker (JT) accumulates threads in a waiting state, eventually leading to an OOM:

2015-02-06 14:15:51,206 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
        at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.initialize(AbstractGoogleAsyncWriteChannel.java:318)
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.create(GoogleCloudStorageImpl.java:275)
        at com.google.cloud.hadoop.gcsio.CacheSupplementedGoogleCloudStorage.create(CacheSupplementedGoogleCloudStorage.java:145)
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.createInternal(GoogleCloudStorageFileSystem.java:184)
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.create(GoogleCloudStorageFileSystem.java:168)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:77)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:655)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:545)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:452)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:444)
        at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1860)
        at org.apache.hadoop.mapred.JobInProgress$3.run(JobInProgress.java:709)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:706)
        at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3890)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

After looking through the JT logs I found these warnings:

2015-02-06 14:30:17,442 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode xx.xxx.xxx.xxx:50010
java.io.IOException: Call to /xx.xxx.xxx.xxx:50020 failed on local exception: java.io.IOException: Couldn't set up IO streams
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:414)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:392)
        at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:201)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3317)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2200(DFSClient.java:2783)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2987)
Caused by: java.io.IOException: Couldn't set up IO streams
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
        at org.apache.hadoop.ipc.Client.call(Client.java:1093)
        ... 9 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:635)
        ... 12 more

This appears to be similar to the Hadoop bug reported here: https://issues.apache.org/jira/browse/MAPREDUCE-5606

I tried the proposed solution of disabling the saving of job logs to the output path, and it solved the problem, at the expense of missing logs :)
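
For context, I believe the workaround boils down to a per-job setting along the following lines. This is only a sketch, assuming hadoop.job.history.user.location is the relevant property; the value "none" turns the user-side history copy off entirely (hence the missing logs):

    // Rough sketch of the workaround (assumption: hadoop.job.history.user.location
    // is the property behind the job logs written to the output path).
    // "none" disables the user-side history copy, so no GCS stream is opened for it.
    import org.apache.hadoop.mapred.JobConf;

    public class DisableUserJobHistory {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.set("hadoop.job.history.user.location", "none");
            System.out.println(conf.get("hadoop.job.history.user.location"));
        }
    }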

I also ran jstack on the JT, and it showed hundreds of threads in WAITING or TIMED_WAITING state, such as:

"pool-52-thread-1" prio=10 tid=0x00007feaec581000 nid=0x524f in Object.wait() [0x00007fead39b3000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x000000074d86ba60> (a java.io.PipedInputStream)
        at java.io.PipedInputStream.read(PipedInputStream.java:327)
        - locked <0x000000074d86ba60> (a java.io.PipedInputStream)
        at java.io.PipedInputStream.read(PipedInputStream.java:378)
        - locked <0x000000074d86ba60> (a java.io.PipedInputStream)
        at com.google.api.client.util.ByteStreams.read(ByteStreams.java:181)
        at com.google.api.client.googleapis.media.MediaHttpUploader.setContentAndHeadersOnCurrentRequest(MediaHttpUploader.java:629)
        at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:409)
        at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
        at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.run(AbstractGoogleAsyncWriteChannel.java:354)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
        - <0x000000074d864918> (a java.util.concurrent.ThreadPoolExecutor$Worker)

It appears the JT is having a hard time keeping up with communication to GCS via the GCS connector.

Please advise,

Thank you

  • Do you happen to know from where you fetched this gcs-connector-1.3.0-hadoop1.jar? Can you verify your gcs-connector version with "hadoop fs -stat gs://foo"? It should print something like "15/02/10 18:16:13 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.3.0-hadoop1". – Dennis Huo Feb 10 '15 at 18:18
  • > hadoop fs -stat gs://zulily 2014-07-01 17:19:42 – ichekrygin Feb 10 '15 at 18:26
  • also, we are using gcs-connector installed by bdutil `-rw-r--r-- 1 root root 4451217 Jun 6 2014 gcs-connector-1.2.6-hadoop1.jar` – ichekrygin Feb 10 '15 at 18:27
  • Was the cluster itself also deployed using `bdutil` or did you manually install the gcs-connector onto an existing cluster? – Dennis Huo Feb 10 '15 at 19:20
  • cluster was deployed using bdutil – ichekrygin Feb 10 '15 at 19:38
  • Sorry for so many questions, trying to narrow it down. 1. Any chance you still have the full jstack output for the JT? 2. How long does it normally take before the JT starts running out of threads? 3. Approximately how many submitted, running, finished, etc., jobs were there when the threads were starting to run out? 4. What's the output of "ulimit -a" on your JT VM? 5. How frequently are jobs submitted? 6. How many jobs do you need to be able to queue up on average? – Dennis Huo Feb 10 '15 at 20:23
  • 1. I still have JT log files with stack traces – ichekrygin Feb 11 '15 at 19:03
  • 2. It took a little over an hour (which was unusual on its own) since our last redeploy. Typically we have to cycle the JT once a week due to high memory usage alerts. – ichekrygin Feb 11 '15 at 19:04
  • 3. We typically run about 50-60 jobs per hour. I tracked the thread count right from the start and observed that it grew with every job (i.e., the thread count was not going down) – ichekrygin Feb 11 '15 at 19:08
  • 4. ulimit -a: data seg size (kbytes, -d) unlimited; file size (blocks, -f) unlimited; max locked memory (kbytes, -l) 64; max memory size (kbytes, -m) unlimited; open files (-n) 100000; pipe size (512 bytes, -p) 8; POSIX message queues (bytes, -q) 819200; real-time priority (-r) 0; stack size (kbytes, -s) 8192; cpu time (seconds, -t) unlimited; max user processes (-u) 10000; virtual memory (kbytes, -v) unlimited; file locks (-x) unlimited – ichekrygin Feb 11 '15 at 19:10
  • 5. typically every minute or so (see 3. 50-60 jobs per hour) – ichekrygin Feb 11 '15 at 19:10
  • 6. jobs are typically spread out throughout the hour, but at any given time there could be 0-10 queued-up jobs (due to available mappers/reducers) – ichekrygin Feb 11 '15 at 19:11
  • Could you post the full contents of your complete jstack? Or email to gcp-hadoop-contact@google.com if you prefer. – Dennis Huo Feb 11 '15 at 23:51

1 Answer


At the moment, every open FSDataOutputStream in the GCS connector for Hadoop consumes a thread until it's closed, because a separate thread needs to run the "resumable" HttpRequests while the user of the OutputStream writes bytes intermittently. In most cases (such as in individual Hadoop tasks), there's only ever one long-lived output stream, and possibly a few shorter-lived ones for writing small metadata/marker files, etc.
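
To make the threading model concrete, here is a stripped-down sketch of that pattern (illustrative only, not the connector's actual code): the caller writes into a PipedOutputStream while a pooled worker thread drains the paired PipedInputStream and performs the upload, so the worker blocks in PipedInputStream.read() whenever the writer is idle and can only exit once the stream is closed.

    // Illustrative sketch only -- not the gcs-connector implementation.
    // One background "uploader" thread is tied up per open output stream.
    import java.io.IOException;
    import java.io.PipedInputStream;
    import java.io.PipedOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PipedUploadSketch {
        public static void main(String[] args) throws IOException {
            ExecutorService pool = Executors.newCachedThreadPool();

            PipedOutputStream userSide = new PipedOutputStream();
            PipedInputStream uploaderSide = new PipedInputStream(userSide);

            // This task plays the role of the resumable-upload worker: between
            // writes it blocks inside read() (the PipedInputStream.read frames
            // in the jstack above), and it can only finish once the stream closes.
            pool.submit(() -> {
                byte[] buf = new byte[8192];
                try {
                    while (uploaderSide.read(buf) != -1) {
                        // pretend to push the bytes to GCS here
                    }
                } catch (IOException ignored) {
                    // pipe broken; give up
                }
                return null;
            });

            userSide.write("job history bytes".getBytes(StandardCharsets.UTF_8));
            userSide.close();   // only now can the uploader thread exit
            pool.shutdown();
        }
    }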

In general, there are two possible causes for the OOM you're running into:

  1. You have lots of queued up jobs; every submitted job holds an unclosed OutputStream, and thus consumes a "waiting" thread. However, since you mention you only need to queue up ~10 jobs, this shouldn't be the root cause.
  2. Something is causing a "leak" of the PrintWriter objects originally created in logSubmitted and added to fileManager. Typically, terminal events (like logFinished) will correctly close() all the PrintWriters before removing them from the map via markCompleted, but in theory there may be bugs here or there which can cause one of the OutputStreams to leak without being close()'d. For example, while I haven't had a chance to verify this assertion, it seems that an IOException while trying to do something like logMetaInfo will "removeWriter" without closing it (see the sketch after this list).
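
Here is that condensed sketch of the suspected leak pattern (hypothetical field and method names, not the actual JobHistory code): the error path drops the writer from the tracking map without closing it, so the upload thread behind its underlying OutputStream never exits.

    // Hypothetical illustration of the suspected leak -- not Hadoop's JobHistory code.
    import java.io.PrintWriter;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class WriterLeakSketch {
        // analogous to JobHistory's fileManager map: job id -> open writers
        private final Map<String, List<PrintWriter>> openWriters = new HashMap<>();

        // analogous to logSubmitted: open a writer and remember it
        void register(String jobId, PrintWriter writer) {
            openWriters.computeIfAbsent(jobId, k -> new ArrayList<>()).add(writer);
        }

        // clean path (logFinished/markCompleted): close everything, then forget the job
        void finish(String jobId) {
            for (PrintWriter w : openWriters.getOrDefault(jobId, new ArrayList<>())) {
                w.close();                 // the upload thread behind it can now exit
            }
            openWriters.remove(jobId);
        }

        // suspected error path: the writer is dropped from the map but never
        // close()'d, so the thread feeding its underlying OutputStream lives on
        void removeWriterOnError(String jobId) {
            openWriters.remove(jobId);     // leak: no close() before removal
        }
    }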

I've verified that, at least under normal circumstances, the OutputStreams seem to get closed correctly, and my sample JobTracker shows a clean jstack after having successfully run a lot of jobs.

TL;DR: There are some working theories as to why some resource may leak and ultimately prevent necessary threads from being created. In the meantime, you should consider changing hadoop.job.history.user.location to some HDFS location, as a way to preserve the job logs now that they are no longer being placed on GCS.
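
As a rough illustration of that interim fix (the HDFS path below is only a placeholder), the property can be set per job:

    // Sketch of the interim fix: keep per-job history, but write it to HDFS
    // rather than the GCS output path. The HDFS path below is a placeholder.
    import org.apache.hadoop.mapred.JobConf;

    public class JobHistoryToHdfs {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.set("hadoop.job.history.user.location", "hdfs:///user/hadoop/job-history");
            // Submit jobs with this conf: history files land in HDFS, so there is
            // no long-lived GCS output stream (and no pinned thread) per submission.
            System.out.println(conf.get("hadoop.job.history.user.location"));
        }
    }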

Dennis Huo
  • 10,517
  • 27
  • 43