I'm trying to implement a simple Apache Spark RDD pipeline, but it seems I'm not able to use that session properly.
I started by doing:
./start-all.sh
in /usr/local/spark/sbin
then I created a new session by doing this:
from pyspark.sql import SparkSession
import shutil

spark = (SparkSession.builder
         .appName("Oncofinder -- Preprocessing")
         .getOrCreate())

dirname = "oncofinder"
zipname = dirname + ".zip"
shutil.make_archive(dirname, 'zip', dirname + "/..", dirname)
spark.sparkContext.addPyFile(zipname)
This ships a fresh copy of my app package to the Spark workers. I'm using the Python library pyspark.
Then, I'm using my spark session on a function called preprocess:
train_rdd = preprocess(spark, [1, 2], tile_size=tile_size, sample_size=sample_size,
                       grayscale=grayscale, num_partitions=num_partitions, folder=folder)
and my function:
def preprocess(spark, slide_nums, folder="data", training=True, tile_size=1024, overlap=0,
               tissue_threshold=0.9, sample_size=256, grayscale=False, normalize_stains=True,
               num_partitions=20000):
    print("===PREPROCESSING===")
    slides = (spark.sparkContext
              .parallelize(slide_nums)
              .filter(lambda slide: open_slide(slide, folder, training) is not None))
and when I run this piece of code, I get:
2018-11-27 00:36:30 WARN Utils:66 - Your hostname, luiscosta-GT62VR-6RD resolves to a loopback address: 127.0.1.1; using 192.168.1.67 instead (on interface wlp2s0)
2018-11-27 00:36:30 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/luiscosta/PycharmProjects/wsi_preprocessing/oncofinder/lib/python3.6/site-packages/pyspark/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2018-11-27 00:36:30 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
===PREPROCESSING===
It reaches my ===PREPROCESSING=== checkpoint, but it never runs my open_slide function.
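From what I've read, RDD transformations like filter are evaluated lazily and only run when an action (e.g. count or collect) is called, so I suspect that's related. Here is a plain-Python analogy of what I think is happening; the built-in filter stands in for the RDD transformation, and open_slide_stub is a hypothetical stand-in for my open_slide:

```python
calls = []

def open_slide_stub(n):
    # Hypothetical stand-in for open_slide; records that it was called.
    calls.append(n)
    return n  # pretend every slide opens successfully

# Python's built-in filter is lazy, like an RDD transformation.
slides = filter(lambda n: open_slide_stub(n) is not None, [1, 2])
print(calls)           # [] -- nothing has run yet, just like in my pipeline

result = list(slides)  # consuming the iterator (like a Spark action)
print(calls)           # [1, 2] -- the predicate finally ran
```

Is that the behavior I'm seeing, i.e. do I need to call an action on the RDD before open_slide will execute?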
I'm fairly new to Apache Spark, and I apologize if this is a silly question, but when I read the docs it looked really straightforward.
Kind regards