
I'm building standalone Python programs that will use PySpark (and the elasticsearch-hadoop connector). I'm also addicted to the Python Debugger (pdb) and want to be able to step through my code.

It appears I can't run PySpark under pdb the way I normally would:

    ./pyspark -m pdb testCode.py

That fails with the error "pyspark does not support any application options".
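For context, here's a stripped-down sketch of the kind of script I'm trying to step through (the Elasticsearch host, index, and app name are placeholders, not my real config):

    # testCode.py -- minimal sketch of what I'm debugging
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("esTest"))

    # Read from Elasticsearch via the elasticsearch-hadoop connector
    es_conf = {
        "es.nodes": "localhost",          # placeholder host
        "es.resource": "myindex/mytype",  # placeholder index/type
    }
    rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_conf,
    )
    print(rdd.take(1))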

Is it possible to run PySpark code from the standard Python interpreter, or do I need to give up pdb?
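Ideally I'd just drop a breakpoint into the script itself, something like this (a sketch; I'd expect it to work at least for driver-side code when launched with spark-submit):

    import pdb
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("pdbTest"))
    rdd = sc.parallelize(range(100))

    pdb.set_trace()  # driver-side breakpoint: inspect rdd, call rdd.take(5), etc.
    print(rdd.sum())

and then run it with something like ./bin/spark-submit testCode.py. But that still doesn't let me launch the whole program under pdb the way I'm used to.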

I also saw online that I need to include py4j-0.9-src.zip in my PYTHONPATH. When I do that, I can use the Python interpreter and step through my code, but I get an error "Py4JJavaError: Py4JJava...t id=o18)" whenever it runs any of the PySpark code. That error seemed to indicate that I wasn't really interacting with Spark.
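For reference, this is roughly how I wired up the plain interpreter (the SPARK_HOME path and py4j zip name are from my install; adjust as needed):

    import os
    import sys

    SPARK_HOME = "/opt/spark-1.6.0"  # wherever Spark is unpacked
    os.environ["SPARK_HOME"] = SPARK_HOME

    # Make pyspark and the bundled py4j importable
    sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
    sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib", "py4j-0.9-src.zip"))

    from pyspark import SparkConf, SparkContext

    # SparkContext launches the JVM gateway itself; if that handshake
    # fails, later RDD calls surface as Py4JJavaError
    sc = SparkContext(conf=SparkConf().setAppName("pdbTest").setMaster("local[*]"))
    print(sc.parallelize(range(10)).sum())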

How do I approach this?

cybergoof
  • You can use PySpark directly from a standard Python session, but I doubt it will work as you expect. See my answer [here](http://stackoverflow.com/a/33328543/1560062). – zero323 Mar 30 '16 at 16:04

0 Answers