I am working on a text classification problem in Python using sklearn. I have built the model and saved it as a pickle.
Below is the sklearn code I used:
import pickle
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

vectorizerPipe = Pipeline([
    ('tfidf', TfidfVectorizer(lowercase=True, stop_words='english')),
    ('classification', OneVsRestClassifier(LinearSVC(penalty='l2', loss='hinge'))),
])
prd = vectorizerPipe.fit(features_used, labels_used)
with open(file_path, 'wb') as f:
    pickle.dump(prd, f)
Is there any way to use this same pickle to get predictions in DataFrame-based Apache Spark, rather than RDD-based? I have gone through the following articles but didn't find a proper way to implement it:
what-is-the-recommended-way-to-distribute-a-scikit-learn-classifier-in-spark
how-to-do-prediction-with-sklearn-model-inside-spark -> I found both of these questions on StackOverflow and found them useful.
deploy-a-python-model-more-efficiently-over-spark
I am a beginner in machine learning, so pardon me if the explanation is naive. Any related example or implementation would be helpful.