
When I run PySpark, executing

sc._gateway.help(sc._jsc)

successfully gives me some nice output like

JavaSparkContext extends org.apache.spark.api.java.JavaSparkContextVarargsWorkaround implements java.io.Closeable {
|  
|  Methods defined here:
|  
|  accumulable(Object, String, AccumulableParam) : Accumulable
|  
|  accumulable(Object, AccumulableParam) : Accumulable
|  
|  accumulator(double, String) : Accumulator
|  
|  accumulator(Object, AccumulatorParam) : Accumulator
|  
|  accumulator(Object, String, AccumulatorParam) : Accumulator
|  
|  accumulator(double) : Accumulator
...

while running

sc._gateway.help(sc._jsc.sc())

fails with a Py4J error caused by a Java NullPointerException:

Py4JError: An error occurred while calling None.None. Trace:
java.lang.NullPointerException
at py4j.model.Py4JMember.compareTo(Py4JMember.java:54)
at py4j.model.Py4JMember.compareTo(Py4JMember.java:39)
at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:290)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:157)
at java.util.ComparableTimSort.sort(ComparableTimSort.java:146)
at java.util.Arrays.sort(Arrays.java:472)
at java.util.Collections.sort(Collections.java:155)
at py4j.model.Py4JClass.buildClass(Py4JClass.java:88)
at py4j.commands.HelpPageCommand.getHelpObject(HelpPageCommand.java:118)
at py4j.commands.HelpPageCommand.execute(HelpPageCommand.java:74)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)

Why can't I get at the SparkContext contained in the JavaSparkContext that Py4J gives me access to?

Uri Laserson

1 Answer


sc._jsc.sc() is the right way to access the underlying SparkContext. To illustrate:

>>> sc._jsc.sc()
JavaObject id=o27
>>> sc._jsc.sc().version()
u'1.1.0'
>>> sc._jsc.sc().defaultMinSplits()
2

The problem you're seeing is that Py4J's help command has trouble building the help page for this class: the trace shows a NullPointerException thrown from Py4JMember.compareTo while the class's members are being sorted, which suggests one of the generated member descriptions ends up with a null name (possibly a Py4J bug).
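
Until that's fixed, you can still enumerate the methods yourself with plain Java reflection through the gateway. A minimal sketch (assuming a live PySpark shell where sc is defined; getClass() and getMethods() are the standard java.lang reflection calls, which Py4J proxies like any other Java method):

java_sc = sc._jsc.sc()                     # the underlying org.apache.spark.SparkContext
methods = java_sc.getClass().getMethods()  # java.lang.reflect.Method[], iterable from Python
for name in sorted(set(m.getName() for m in methods)):
    print(name)

This sidesteps the help page entirely, so the member sort in Py4JClass.buildClass that triggers the NPE never runs.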

Josh Rosen
  • Hi, this really looks like a bug with the help command and the way the SparkContext class is declared. Do not hesitate to open a bug report https://github.com/bartdag/py4j/issues/new :-) – Barthelemy Jan 23 '15 at 01:17