I'm building standalone Python programs that will use PySpark (and the elasticsearch-hadoop connector). I'm also addicted to the Python debugger (pdb) and want to be able to step through my code.
It appears I can't run PySpark under pdb the way I normally run a script:
./pyspark -m pdb testCode.py
This fails with the error "pyspark does not support any application options".
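For context, testCode.py is just a small standalone job. A trimmed-down sketch of it (the app name and the toy data are placeholders, not my real job):

    from pyspark import SparkConf, SparkContext

    if __name__ == "__main__":
        # Standalone script: build the context ourselves instead of
        # relying on the pyspark shell to inject `sc`.
        conf = SparkConf().setAppName("testCode").setMaster("local[*]")
        sc = SparkContext(conf=conf)

        # Trivial action, just enough to confirm Spark is actually wired up.
        print(sc.parallelize(range(10)).sum())

        sc.stop()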
Is it possible to run PySpark code from the standard Python interpreter, or do I need to give up pdb?
I also saw online that I need to add py4j-0.9-src.zip to my PYTHONPATH. When I do that, I can use the Python interpreter and step through my code, but any of the PySpark calls then fails with "Py4JJavaError: Py4JJava...t id=o18)". That error seemed to indicate that I wasn't really interacting with Spark.
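Concretely, here is roughly what I set up (a sketch; the SPARK_HOME fallback path and the local[*] master are assumptions about my local install):

    import os
    import sys

    # Assumption: SPARK_HOME points at the unpacked Spark distribution;
    # /opt/spark is just a guess at a default location.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("pdb-test").setMaster("local[*]"))
    print(sc.parallelize([1, 2, 3]).count())  # roughly where the Py4JJavaError shows up
    sc.stop()

With that in place I can run `python -m pdb testCode.py` and step through, right up until the first Spark action blows up.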
How do I approach this?