I'm building standalone Python programs that will use PySpark (and the elasticsearch-hadoop connector). I'm also addicted to the Python debugger (pdb) and want to be able to step through my code.
It appears I can't run PySpark under pdb the way I normally run a script:
./pyspark -m pdb testCode.py
This fails with the error "pyspark does not support any application options".
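For context, testCode.py is just a small standalone job. A trimmed-down sketch of it (the app name and the toy data are placeholders, not my real job):

    from pyspark import SparkConf, SparkContext

    if __name__ == "__main__":
        # Standalone script: build the context ourselves instead of
        # relying on the pyspark shell to inject `sc`.
        conf = SparkConf().setAppName("testCode").setMaster("local[*]")
        sc = SparkContext(conf=conf)

        # Trivial action, just enough to confirm Spark is actually wired up.
        print(sc.parallelize(range(10)).sum())

        sc.stop()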
Is it possible to run PySpark code from the standard Python interpreter, or do I need to give up pdb?
I also saw online that I need to add py4j-0.9-src.zip to my PYTHONPATH. When I do that, I can use the Python interpreter and step through my code, but any of the PySpark calls then fails with "Py4JJavaError: Py4JJava...t id=o18)". That error seemed to indicate that I wasn't really interacting with Spark.
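Concretely, here is roughly what I set up (a sketch; the SPARK_HOME fallback path and the local[*] master are assumptions about my local install):

    import os
    import sys

    # Assumption: SPARK_HOME points at the unpacked Spark distribution;
    # /opt/spark is just a guess at a default location.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python", "lib", "py4j-0.9-src.zip"))

    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("pdb-test").setMaster("local[*]"))
    print(sc.parallelize([1, 2, 3]).count())  # roughly where the Py4JJavaError shows up
    sc.stop()

With that in place I can run `python -m pdb testCode.py` and step through, right up until the first Spark action blows up.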
How do I approach this?