
I am trying to use sparkmeasure to check the performance of my PySpark code. I am using PyCharm Community Edition on Windows 10, with PySpark properly configured. I ran "pip install sparkmeasure" and sparkmeasure was successfully installed. Now, when I run this snippet of code:

from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from sparkmeasure import StageMetrics


sc = SparkContext(master="local", appName="sparkdemo")
spark = SparkSession(sc)
sm = StageMetrics(spark)

I get this error:

File "C:/Users/nj123/PycharmProjects/pythonProject/sparkdemo.py", line 9, in <module>
sm = StageMetrics(spark)
File "C:\Users\nj123\PycharmProjects\pythonProject\venv\lib\site-packages\sparkmeasure\stagemetrics.py", line 15, in __init__
self.stagemetrics = self.sc._jvm.ch.cern.sparkmeasure.StageMetrics(self.sparksession._jsparkSession)
TypeError: 'JavaPackage' object is not callable

How can I resolve this error and configure sparkmeasure correctly in PyCharm?

supernova
  • I have faced a similar issue in the past. Put the sparkmeasure jar in your `spark-2.4.4-bin-hadoop2.7/jars` directory (use the Spark version you downloaded) and invoke the Python script through spark-submit on the command line. That's the only way I could get it to work. Don't use PyCharm to start a Spark job. – user238607 Dec 19 '20 at 15:07
  • @user238607 Thanks. My Spark jars folder is "C:\Spark\spark-3.0.1-bin-hadoop2.7\jars", which contains all the jar files. Now, what command should I type on cmd to execute this jar, and how do I invoke the Python script? – supernova Dec 20 '20 at 16:53
  • @user238607 I did as you suggested, moved the jar file to that location, and re-ran the code in the PyCharm terminal. It worked perfectly without any issues. Thanks a lot. Just curious: how did you resolve this issue when you first faced it? – supernova Dec 20 '20 at 17:10
  • You don't have to invoke the functions in the jar directly. Your Python code will do that. Here's how you can submit a PySpark job through spark-submit: https://stackoverflow.com/questions/38120011/using-spark-submit-with-python-main – user238607 Dec 20 '20 at 17:36
  • 1
    Okay. But I did it without spark - submit. Just ran the program in pycharm and it worked properly. – supernova Dec 20 '20 at 18:13
  • Please make sure to write your solution in the answer so that others will have a solution for future reference. – user238607 Dec 20 '20 at 18:17
  • @user238607 Thanks. I posted the solution. – supernova Dec 22 '20 at 10:20
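For reference, the spark-submit route suggested in the comments can be sketched as follows. The paths are taken from this thread; the `--packages` Maven coordinate is an assumption (pick the spark-measure version matching your Spark/Scala build) and lets spark-submit download the jar itself instead of you copying it into the jars folder:

```shell
REM Sketch, assuming Spark is installed at C:\Spark\spark-3.0.1-bin-hadoop2.7.
REM Running through spark-submit puts the sparkmeasure jar on the JVM classpath,
REM which is what the "'JavaPackage' object is not callable" error is about.
C:\Spark\spark-3.0.1-bin-hadoop2.7\bin\spark-submit ^
  --master local ^
  --packages ch.cern.sparkmeasure:spark-measure_2.12:0.17 ^
  C:\Users\nj123\PycharmProjects\pythonProject\sparkdemo.py
```

The `^` characters are Windows cmd line continuations; on Linux/macOS use `\` instead.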

1 Answer


Thanks to @user238607. Here are the steps I performed to resolve this issue.

1. First, download the sparkmeasure jar file from Maven Central.

2. Then move this jar file to the Spark jars folder. In my case, the location was C:\Spark\spark-3.0.1-bin-hadoop2.7\jars

3. Go back to PyCharm and rerun the same code.

Link to download the jar file.
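An alternative to copying the jar by hand is to let Spark fetch it at startup via the `spark.jars.packages` config. This is an untested sketch, and the Maven coordinate/version is an assumption; it uses the sparkmeasure `begin()`/`end()`/`print_report()` API to verify that StageMetrics now works:

```python
from pyspark.sql import SparkSession
from sparkmeasure import StageMetrics

# Ask Spark to download the sparkmeasure jar from Maven Central at startup.
# The coordinate below is an assumption; match the version to your Spark/Scala build.
spark = (SparkSession.builder
         .master("local")
         .appName("sparkdemo")
         .config("spark.jars.packages",
                 "ch.cern.sparkmeasure:spark-measure_2.12:0.17")
         .getOrCreate())

sm = StageMetrics(spark)
sm.begin()                                        # start collecting stage metrics
spark.sql("SELECT count(*) FROM range(1000)").show()
sm.end()                                          # stop collecting
sm.print_report()                                 # print aggregated stage metrics
```

If the jar is missing from the classpath, the `StageMetrics(spark)` line is exactly where the "'JavaPackage' object is not callable" error appears.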

supernova