
I have a problem with Java sometimes failing when running PySpark in a Jupyter notebook on Ubuntu. What I want is to see the error from the Java side, because all I usually see is a very long, generic Python error that can be summarized as:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

This error can mean a lot of things, so on its own it does not help at all. Usually it means that Java crashed, but I want to know exactly why.

As an example of why I need these logs: I tried to run RAPIDS on PySpark on a DGX-1 machine, and it ends in Java crashes like the one above when initializing the Spark context. This is not the only cause of these errors, but the code below reliably triggers them on my side.

import pyspark

cudf = "cudf-0.17-cuda10-1.jar"
rapids = "rapids-4-spark_2.12-0.2.0.jar"
script = "getGpuResources.sh"
separator = ","

conf = pyspark.SparkConf()
conf.set("spark.jars", cudf + separator + rapids)
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.sql.extensions", "ai.rapids.spark.Plugin")
conf.set("spark.driver.memory", "48g")
conf.set("spark.executor.memory", "48g")
conf.set("spark.driver.cores", "80")
conf.set("spark.executor.cores", "80")
conf.set("spark.task.cpus", "80")
conf.set("spark.dynamicAllocation.enabled", "false")
conf.set("spark.rapids.sql.concurrentGpuTasks", "8")
conf.set("spark.driver.resource.gpu.amount", "8")
conf.set("spark.driver.resource.gpu.discoveryScript", script)
conf.set("spark.executor.resource.gpu.amount", "8")
conf.set("spark.executor.resource.gpu.discoveryScript", script)
conf.set("spark.task.resource.gpu.amount", "8")
sc = pyspark.SparkContext(appName="rapids", conf=conf)

My question: Is there a way to somehow capture the stdout of the Java process launched by PySpark (using pyspark/jupyter/Ubuntu) so I can see the real reason for the Java crash?

Tomasz

1 Answer


So this is going to depend on how you are running. Are you just starting pyspark in local mode, or running against a cluster (YARN, standalone, etc.)?

If you just pointed Jupyter at it and ran "pyspark", it is running Spark in local mode. Generally you can see the log output in the terminal you started pyspark from. The default log level there is just warnings, though. You can change it in the Jupyter notebook with:

sc.setLogLevel("INFO")

But either way you should see errors coming out.
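One wrinkle: sc.setLogLevel needs a live context, so it can't help when the crash happens while the SparkContext is being created. A rough alternative sketch, assuming Spark 3.0.x (which still uses log4j 1.x) and a scratch path I picked arbitrarily, is to point the driver JVM at your own log4j.properties with a higher default level:

import pyspark

# Minimal log4j 1.x config modeled on Spark's conf/log4j.properties.template,
# with the root level raised from WARN to INFO (use DEBUG for even more detail).
log4j_conf = """
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
"""

with open("/tmp/log4j-debug.properties", "w") as f:  # path is an assumption
    f.write(log4j_conf)

conf = pyspark.SparkConf()
# PySpark hands this conf to spark-submit when it launches the JVM, so the
# driver option should take effect; if it doesn't in your setup, pass
# -Dlog4j.configuration via --driver-java-options on the pyspark command line instead.
conf.set("spark.driver.extraJavaOptions",
         "-Dlog4j.configuration=file:/tmp/log4j-debug.properties")
sc = pyspark.SparkContext(appName="log-debug", conf=conf)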

If you are running in local mode you should follow the instructions for the rapids plugin here: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#local-mode

Specifically, Spark in local mode doesn't support GPU scheduling, so you should remove all of those configs.
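For reference, a minimal local-mode conf following that page might look roughly like the sketch below. The jar names are taken from your question; the master string and the explain setting simply mirror the command I used further down, so adjust them to your environment.

import pyspark

# Local mode: no discovery script and no spark.*.resource.gpu.* settings.
conf = pyspark.SparkConf()
conf.setMaster("local[4]")  # a handful of local cores
conf.set("spark.jars", "cudf-0.17-cuda10-1.jar,rapids-4-spark_2.12-0.2.0.jar")
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.rapids.sql.explain", "NOT_ON_GPU")  # log which operations stay on the CPU
sc = pyspark.SparkContext(appName="rapids-local", conf=conf)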

I would recommend specifying the options on the command line when you launch it. I ran a quick test with pyspark in local mode with jupyter by launching it with:

pyspark --master local[4] --jars cudf-0.18-SNAPSHOT-cuda10-1.jar,rapids-4-spark_2.12-0.4.0-SNAPSHOT.jar --conf spark.driver.extraJavaOptions=-Duser.timezone=GMT --conf spark.sql.session.timeZone=UTC --conf spark.executor.extraJavaOptions=-Duser.timezone=GMT --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.explain="NOT_ON_GPU"
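If instead you create the context from a notebook kernel rather than launching through the pyspark script (as in your snippet), one way to pass the same options is the PYSPARK_SUBMIT_ARGS environment variable, which PySpark reads when it launches the JVM. A sketch, assuming the context has not been created yet and reusing the jar names from your question:

import os
import pyspark

# Must be set before the SparkContext (and therefore the JVM) is created,
# and the string must end with "pyspark-shell".
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master local[4] "
    "--jars cudf-0.17-cuda10-1.jar,rapids-4-spark_2.12-0.2.0.jar "
    "--conf spark.plugins=com.nvidia.spark.SQLPlugin "
    "--conf spark.rapids.sql.explain=NOT_ON_GPU "
    "pyspark-shell"
)

sc = pyspark.SparkContext(appName="rapids-local")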

Generally, even against YARN and standalone deployments, I would expect your driver logs to come out where you launched pyspark from, unless you are running in cluster mode. The executor logs, which run on the cluster, would probably be elsewhere.

Also note that this configuration is not valid with the spark-rapids plugin: conf.set("spark.executor.resource.gpu.amount","8") and conf.set("spark.task.resource.gpu.amount","8"). The plugin only supports 1 GPU per executor.

You also don't need any driver GPUs (conf.set("spark.driver.resource.gpu.amount","8")), but it's OK if you want them.
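If you do move to a real cluster later, the GPU resource settings are per executor, not per machine. A sketch of roughly what that could look like, with every value illustrative rather than a recommendation (one GPU per executor, a fractional task amount so several tasks can share it, and spark.rapids.sql.concurrentGpuTasks limiting how many actually run on the GPU at once):

import pyspark

conf = pyspark.SparkConf()
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.jars", "cudf-0.17-cuda10-1.jar,rapids-4-spark_2.12-0.2.0.jar")
# One GPU per executor -- the plugin does not support more.
conf.set("spark.executor.resource.gpu.amount", "1")
conf.set("spark.executor.resource.gpu.discoveryScript", "getGpuResources.sh")
# A fraction lets multiple tasks share that one GPU; 0.125 allows ~8 tasks per executor.
conf.set("spark.task.resource.gpu.amount", "0.125")
conf.set("spark.rapids.sql.concurrentGpuTasks", "2")
sc = pyspark.SparkContext(appName="rapids-cluster", conf=conf)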

Feel free to file an issue in the spark-rapids repo if you have further problems.

  • Yes, I am running local mode. About the "getting started": that config does not work for me, and the config you wrote also does not work; it ends with "Answer from Java side is empty". Regular Spark works completely fine. Thanks for the reply, I think I will try to file an issue. Also, the problem is that I cannot set the log level because I cannot create a Spark context. – Tomasz Feb 03 '21 at 06:33