Java sometimes fails when I run PySpark in a Jupyter Notebook on Ubuntu. What I want is to see the error from the Java side, because all I get is usually a very long, generic Python traceback that can be summarized as:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1207, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
This error can mean a lot of things, so it does not help much on its own. Usually it means that the Java side crashed, but I want to know exactly why.
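After a cell dies this way, the only clue I currently know how to look for is a HotSpot fatal-error file, assuming the JVM really crashed rather than just exiting with an error. The snippet below is only a sketch of that idea, and the working-directory assumption may well be wrong under Jupyter:

import glob
import os

# Assumption: on a hard JVM crash, HotSpot writes an hs_err_pid<pid>.log file,
# by default into the process's working directory.
for path in sorted(glob.glob(os.path.join(os.getcwd(), "hs_err_pid*.log"))):
    print("Possible JVM crash log:", path)

This only covers hard crashes, though, not cases where the JVM exits because of a configuration or plugin error, which is why I want the actual Java output.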
As an example of why I need these logs: I tried to run RAPIDS on PySpark on a DGX-1 machine and it ends in Java crashes like the one above while initializing the SparkContext. This is not the only trigger for these errors, but the code below reproduces them reliably on my side.
import pyspark
import os

# RAPIDS Accelerator jars and the GPU discovery script
cudf = "cudf-0.17-cuda10-1.jar"
rapids = "rapids-4-spark_2.12-0.2.0.jar"
script = "getGpuResources.sh"
separator = ","

conf = pyspark.SparkConf()
conf.set("spark.jars", separator.join([cudf, rapids]))
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.driver.memory", "48g")
conf.set("spark.executor.memory", "48g")
conf.set("spark.driver.cores", "80")
conf.set("spark.executor.cores", "80")
conf.set("spark.task.cpus", "80")
conf.set("spark.dynamicAllocation.enabled", "false")
conf.set("spark.rapids.sql.concurrentGpuTasks", "8")
conf.set("spark.sql.extensions", "ai.rapids.spark.Plugin")
# GPU resources (the DGX-1 has 8 GPUs)
conf.set("spark.driver.resource.gpu.amount", "8")
conf.set("spark.driver.resource.gpu.discoveryScript", script)
conf.set("spark.executor.resource.gpu.amount", "8")
conf.set("spark.executor.resource.gpu.discoveryScript", script)
conf.set("spark.task.resource.gpu.amount", "8")

# Creating the context is where the crash happens
sc = pyspark.SparkContext(appName="rapids", conf=conf)
My question: Is there a way to capture the stdout/stderr of the Java process that PySpark launches (using PySpark/Jupyter/Ubuntu), so that I can see the real reason for the Java crash?
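What I have in mind is something like the sketch below: pointing the driver JVM at a file-based log4j configuration so that its output ends up somewhere I can read after the crash. The file paths and the properties content are my own assumptions, and I am not sure this is even the right mechanism, which is part of why I am asking:

import pyspark

# Hypothetical /tmp/log4j-driver.properties (log4j 1.x syntax, which Spark 3.0 ships with):
#   log4j.rootCategory=INFO, file
#   log4j.appender.file=org.apache.log4j.FileAppender
#   log4j.appender.file.File=/tmp/spark-driver.log
#   log4j.appender.file.layout=org.apache.log4j.PatternLayout
#   log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

conf = pyspark.SparkConf()
# Assumption: this should make the driver JVM log to /tmp/spark-driver.log.
# I am not sure options set here even reach the driver JVM in client mode.
conf.set("spark.driver.extraJavaOptions",
         "-Dlog4j.configuration=file:/tmp/log4j-driver.properties")

If there is a more direct way to see what the JVM prints before it dies, that would be even better.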