
I have a test.py file

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.externals import joblib
import tqdm
import time

print("Successful import")

I followed this method to create a self-contained zip of all the dependencies:

pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .

which creates this tree structure (dependencies.zip)

dependencies.zip
     ->pandas
     ->numpy
     ->........
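
For reference, the archive does contain numpy's compiled extension modules (.so files), which, as far as I understand, Python's zip importer cannot load directly from an archive. A minimal check, assuming dependencies.zip is in the current directory:

import zipfile

# list the native extension modules packaged inside the archive
with zipfile.ZipFile("dependencies.zip") as zf:
    native = [name for name in zf.namelist() if name.endswith(".so")]

print(len(native), "compiled .so modules, e.g.", native[:3])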

When I run

spark-submit --py-files /home/ion/Documents/dependencies.zip /home/ion/Documents/sentiment_analysis/test.py

I get the following error:

2018-05-16 07:36:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/home/ion/Documents/sentiment_analysis/test.py", line 2, in <module>
    from encoder import Model
  File "/home/ion/Documents/sentiment_analysis/encoder.py", line 2, in <module>
    import numpy as np
  File "/home/ion/Documents/dependencies.zip/numpy/__init__.py", line 142, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/add_newdocs.py", line 13, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/__init__.py", line 8, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/type_check.py", line 11, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/core/__init__.py", line 26, in <module>
ImportError: 
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

Original error was: cannot import name multiarray

2018-05-16 07:36:21 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-05-16 07:36:21 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-a3c2ec75-6c12-4ac2-ae2c-b36412209889

Is there any way to run this Python script as a Spark job without changing the code for PySpark, or with only minimal changes?
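
If a small code change were acceptable, my understanding is that the programmatic equivalent of --py-files would look roughly like the sketch below (the app name is assumed), though presumably it would hit the same import problem, since the zip is still loaded as an archive:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sentiment_analysis").getOrCreate()
# ship the dependency archive to the executors (same effect as --py-files)
spark.sparkContext.addPyFile("/home/ion/Documents/dependencies.zip")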

  • Have you checked here: https://stackoverflow.com/questions/47905546/how-to-pass-python-package-to-spark-job-and-invoke-main-file-from-package-with-a ? – Ala Tarighati Feb 13 '20 at 13:30

0 Answers