7

In the current setup I use a Jupyter notebook server that has a pyspark profile to use Spark. This all works great. I'm however working on a pretty big project and the notebook environment is lacking a bit for me. I found out that PyCharm allows you to run notebooks inside the IDE, giving you more of the advantages of a full IDE as opposed to Jupyter.

In the best case scenario I would run PyCharm locally as opposed to remote desktop on the gateway but using the gateway would be an acceptable alternative.

I'm trying first to get it to work on the gateway. If I have my (spark) Jupyter server running, the IP address set correctly 127.0.0.1:8888 and I create an .ipynb file, after I enter a line and press enter (not running it, just add a newline) I get the following error in the terminal I started pycharm from:

ERROR - pplication.impl.LaterInvocator - Not a stub type: Py:IPNB_TARGET in class org.jetbrains.plugins.ipnb.psi.IpnbPyTargetExpression

Googling doesn't get me anywhere.

Jan van der Vegt
  • 1,471
  • 12
  • 34

1 Answers1

1

I was able to get all three working by installing spark via terminal on OS X. Then I added the following packages to PyCharm project interpreter: findspark, pyspark.

Tested it out with

import findspark
findspark.init()
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):     
  x, y = random.random(), random.random()
  return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()

outputting: 3.14160028