
I am trying to run NuPIC on PySpark but I am getting an ImportError. Does anyone have any ideas for how I can fix it?

The code runs fine when I don't use PySpark, but I am trying to run it from a Spark Dataset now.

I am trying to run it from the NuPIC source code in my local directory, since installing the NuPIC package and running it that way causes other errors.
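In case it helps, this is roughly how I imagine making the local source tree visible to the Spark executors instead of relying on an installed package (just a sketch: the zip helper and the NUPIC_SRC path are placeholders for my setup, not what I actually run):

import os
import zipfile

from pyspark.sql import SparkSession

def zip_source_tree(src_dir, zip_path):
    # Zip the whole source tree so Spark can ship it to every executor.
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path

# Placeholder: directory that contains the top-level nupic/ package of my checkout.
NUPIC_SRC = "C:/Users/rakshit.trn/Documents/Nupic/nupic-master/src"

spark = SparkSession.builder.appName("nupic-anomaly").getOrCreate()
# Executors add the shipped zip to their sys.path, so "import nupic...." should resolve there.
spark.sparkContext.addPyFile(zip_source_tree(NUPIC_SRC, "nupic_src.zip"))

(The same thing could presumably be done with spark-submit --py-files instead of addPyFile.)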

Thank you for your help!!

This is the code I am trying to run:

import datetime

def iterateRDD(record, model):
    # Convert the Spark Row to a dict and coerce the fields NuPIC expects.
    modelInput = record.asDict(False)
    modelInput["value"] = float(modelInput["value"])
    modelInput["timestamp"] = datetime.datetime.strptime(modelInput["timestamp"], "%Y-%m-%d %H:%M:%S")
    print "modelInput", modelInput
    result = model.run(modelInput)
    anomalyScore = result.inferences['anomalyScore']
    print "Anomaly score is", anomalyScore

input_data.rdd.foreach(lambda row: iterateRDD(row, model))

However, I get this error and don't understand it.

File "C:/Users/rakshit.trn/Documents/Nupic/nupic-master/examples/anomaly.py", line 100, in runAnomaly input_data.rdd.foreach(lambda row: iterateRDD(row, model)) File "C:\Python\Python27\lib\site-packages\pyspark\rdd.py", line 789, in foreach self.mapPartitions(processPartition).count() # Force evaluation File "C:\Python\Python27\lib\site-packages\pyspark\rdd.py", line 1055, in count return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "C:\Python\Python27\lib\site-packages\pyspark\rdd.py", line 1046, in sum return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add) File "C:\Python\Python27\lib\site-packages\pyspark\rdd.py", line 917, in fold vals = self.mapPartitions(func).collect() File "C:\Python\Python27\lib\site-packages\pyspark\rdd.py", line 816, in collect sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) File "C:\Python\Python27\lib\site-packages\py4j\java_gateway.py", line 1257, in call answer, self.gateway_client, self.target_id, self.name) File "C:\Python\Python27\lib\site-packages\pyspark\sql\utils.py", line 63, in deco return f(*a, **kw) File "C:\Python\Python27\lib\site-packages\py4j\protocol.py", line 328, in get_return_value format(target_id, ".", name), value) py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 364, in main File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 69, in read_command File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 172, in _read_with_length return self.loads(obj) File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 583, in loads return pickle.loads(obj) ImportError: No module named frameworks.opf.htm_prediction_model

My guess is that NuPIC is not able to access the frameworks/opf/htm_prediction_model.py file on the Spark workers.
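One check I thought of (just a sketch; the full dotted name nupic.frameworks.opf.htm_prediction_model is my guess from the traceback) is to try the failing import directly on an executor and print what the worker's sys.path looks like:

import importlib
import sys

def probe(_rows):
    # Attempt the failing import on the executor and report what it sees.
    try:
        importlib.import_module("nupic.frameworks.opf.htm_prediction_model")
        yield "import OK"
    except ImportError as exc:
        yield "import FAILED: %s | sys.path=%r" % (exc, sys.path)

print input_data.rdd.mapPartitions(probe).take(1)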

asked by Rak

1 Answer


You might be running an old version of NuPIC. See https://discourse.numenta.org/t/warning-0-7-0-breaking-changes/2200 and check which version you are using (https://discourse.numenta.org/t/how-to-check-what-version-of-nupic-is-installed/1045).
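For example, something along these lines should print the installed version, assuming NuPIC was installed with pip:

import pkg_resources

# Prints the version of the nupic distribution that pip installed.
print pkg_resources.get_distribution("nupic").version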

answered by Matthew Taylor