Background: I have built a hybrid recommender that uses scraped Yelp data to recommend coffee-drinking outlets in Singapore to coffee lovers. It combines an XGBClassifier model for content-based filtering with an ALS model (imported from pyspark.ml) for collaborative filtering. The final rating predictions are a weighted sum of the two models' rating predictions. These are sorted in descending order, and the top 5 rows are shown to the user as their top 5 recommendations.
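For readers unfamiliar with this kind of hybrid, the weighted-sum step can be sketched in plain Python. This is a minimal illustration, not my actual code. It assumes each model's predictions for one user are available as a dict of outlet id to predicted rating. The function name `hybrid_top_n` and the weight `w_cb` are hypothetical, and in practice the scores would come from the XGBClassifier and ALS outputs, probably as DataFrames.

```python
def hybrid_top_n(cb_scores, cf_scores, w_cb=0.5, n=5):
    """Blend two per-outlet rating predictions and return the n best outlets.

    cb_scores: dict outlet_id -> rating predicted by the content-based model
    cf_scores: dict outlet_id -> rating predicted by the collaborative (ALS) model
    w_cb: hypothetical weight for the content-based score; ALS gets 1 - w_cb.
    Only outlets scored by both models are blended.
    """
    if not 0.0 <= w_cb <= 1.0:
        raise ValueError("w_cb must be between 0 and 1")
    w_cf = 1.0 - w_cb
    common = cb_scores.keys() & cf_scores.keys()
    blended = {oid: w_cb * cb_scores[oid] + w_cf * cf_scores[oid] for oid in common}
    # Sort by blended rating, highest first; ties broken by outlet id for determinism.
    ranked = sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    cb = {"A": 4.0, "B": 3.0, "C": 5.0}
    cf = {"A": 3.0, "B": 5.0, "C": 4.0, "D": 5.0}
    print(hybrid_top_n(cb, cf, w_cb=0.5, n=2))
```

With equal weights, outlet C scores 0.5 * 5.0 + 0.5 * 4.0 = 4.5 and B scores 4.0, so those two are returned first. Outlet D is skipped because only one model scored it.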
I have built and run it successfully both in a local Jupyter notebook and as a Flask app inside a virtual environment. For the Flask app, I copied the code from the notebook into flaskr.py, which, together with its static stylesheets and HTML templates, makes up the app.
To prepare for deployment on Heroku, I also created a requirements.txt (using pip freeze) and a Procfile that runs gunicorn with various arguments, such as --timeout 1800. My Flask app takes about 20 minutes to produce the recommendations, so I raised the worker timeout to 1800 s (30 minutes) to cover that. I even copied my .bash_profile into the flaskr folder. (Inside this outer flaskr folder there is another flaskr folder containing flaskr.py, requirements.txt, the Procfile, and the datasets used.)
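The same gunicorn settings can also be kept in a Python config file instead of on the Procfile command line. The sketch below assumes the file is named gunicorn.conf.py and is passed explicitly, e.g. a Procfile line of `web: gunicorn -c gunicorn.conf.py flaskr:app`, where `flaskr:app` assumes the Flask object in flaskr.py is called `app`. The single-worker choice is an assumption, not something from my setup.

```python
# gunicorn.conf.py: gunicorn reads these module-level names as its settings.
import os

# Heroku assigns the listening port at runtime through the PORT environment variable.
bind = "0.0.0.0:" + os.environ.get("PORT", "8000")

# Worker timeout in seconds. 20 minutes would be 20 * 60 = 1200 s;
# 30 * 60 = 1800 s (30 minutes) leaves some headroom above that.
timeout = 30 * 60

# Assumption: one worker, because each worker process would start its own
# SparkSession (and JVM) if the app creates one, which is memory-hungry.
workers = 1
```

One caveat: Heroku's router ends any HTTP request that has not started returning a response within 30 seconds (error code H12). A longer gunicorn worker timeout alone may therefore not keep a 20-minute request alive.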
In flaskr.py I use only SparkSession, not SparkContext or spark-submit. The Flask app works both in a local virtual environment and in my local Jupyter notebook. However, when I deploy it on Heroku with gunicorn in the Procfile, a FileNotFoundError: [Errno 2] is raised, saying that spark-submit cannot be found.
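Even when code only calls SparkSession, PySpark 2.4 starts its JVM by running bin/spark-submit under SPARK_HOME. If that environment variable is unset, it falls back to the directory of the pip-installed pyspark package. So spark-submit is being called even though flaskr.py never mentions it. A small check like the one below, run on the dyno (e.g. via heroku run python), would show which SPARK_HOME, JAVA_HOME and java the dyno actually sees. The function name `check_spark_launch` is hypothetical, and this is a diagnostic sketch rather than a fix.

```python
import os
import shutil


def check_spark_launch(env=None):
    """Report whether the pieces PySpark needs to launch its JVM are reachable.

    env: mapping of environment variables (defaults to the current process env).
    Note: if SPARK_HOME is unset, PySpark itself falls back to the pyspark
    package directory; this check only inspects an explicitly set SPARK_HOME.
    """
    env = dict(os.environ) if env is None else env
    spark_home = env.get("SPARK_HOME")
    report = {"SPARK_HOME": spark_home, "JAVA_HOME": env.get("JAVA_HOME")}
    if spark_home:
        submit = os.path.join(spark_home, "bin", "spark-submit")
        report["spark_submit_exists"] = os.path.isfile(submit)
        report["spark_submit_executable"] = os.access(submit, os.X_OK)
    else:
        report["spark_submit_exists"] = False
        report["spark_submit_executable"] = False
    # spark-submit needs a Java runtime; look for `java` on the given PATH only.
    report["java_on_path"] = shutil.which("java", path=env.get("PATH", "")) is not None
    return report


if __name__ == "__main__":
    for key, value in check_spark_launch().items():
        print(f"{key}: {value}")
```

A SPARK_HOME that points at a path which only exists on the local machine (for example, one exported in a copied .bash_profile) would show up here as `spark_submit_exists: False`.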
I tried running heroku run .bin/pyspark -a (and the same with spark-shell) in Terminal with the virtual environment activated, and the pyspark command generated the following output:
For the spark-shell command, only spark-submit was reported as not found. The puzzling part is that both files are definitely present at their respective paths when I check.
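For context on what "not found" means here: a Python FileNotFoundError with [Errno 2] (ENOENT) is raised by the operating system when a process cannot be started. The sketch below reproduces that with subprocess and a deliberately nonexistent path. The function name `launch_errno` and the path are hypothetical stand-ins for "$SPARK_HOME/bin/spark-submit".

```python
import errno
import subprocess


def launch_errno(cmd):
    """Try to start cmd; return the OS errno if it could not be started, else None."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        return exc.errno
    return None


if __name__ == "__main__":
    code = launch_errno(["/no/such/spark/bin/spark-submit", "--version"])
    print(code, errno.errorcode.get(code))
```

On Linux, the same ENOENT can also be reported when the script itself exists but the interpreter named in its #! line does not. So "the file is present" and "Errno 2" are not necessarily contradictory.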
The following is the error log I get when I click "submit" in the deployed app at coffee-recsys.herokuapp.com. I think the main problem is the part inside the red box.
I would really appreciate it if anyone could suggest how to resolve this issue. I have been searching online and trying different combinations of search terms for the past few days without success. Or should I try other search engines such as Bing or Yahoo instead?
Any help is appreciated, even if it does not lead to a successful deployment of my app on Heroku (e.g. because of possible incompatibilities between spark-2.4.5 and Heroku).