
I trained a model using pandas and Keras (with the TensorFlow backend) and now need to migrate it to a distributed environment (AWS EMR).

I am trying to use Apache Spark (to read the input) and MXNet as the backend for Keras. I was able to load the model architecture and weights, but when I call model.predict(input) I get:

AttributeError: 'DataFrame' object has no attribute 'values'

Here is the code sample:

from keras.models import model_from_json, load_model
from pyspark.sql import SparkSession
import mxnet as mx

spark = SparkSession.builder.appName(app_name).getOrCreate()
data = spark.read.csv(input_file, sep="|", header=True)

print('Loading model...')
with open(model_architecture, 'r') as f:
    model = model_from_json(f.read())
model.load_weights(model_weights)

predictions = model.predict(data)

Is there anything that I am missing, or is there another approach I can use to get this working?

shreyansh
  • You are using a Spark DataFrame, which doesn't have the attribute .values (which basically converts the df to a numpy array). So use model.predict(data.toPandas()) – venkata krishnan Dec 03 '19 at 06:11
  • Probably you can try using a pandas UDF, and inside the UDF you can predict with the model using a pandas DataFrame. – Bitswazsky Dec 03 '19 at 09:28
  • @venkatakrishnan Thanks, that worked. Do you know if there is a better alternative to what I am doing? The prediction doesn't seem to happen in a distributed manner. – shreyansh Dec 03 '19 at 11:38
  • Thanks @Bitswazsky, will try that too. – shreyansh Dec 03 '19 at 11:40
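
For reference, the pandas UDF approach suggested in the comments could look roughly like the sketch below. The feature column names, file paths, and the single-output model are assumptions, not part of the original question; the weights file also has to be readable from every executor node (for example, copied there by an EMR bootstrap action or fetched from S3).

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf, PandasUDFType
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("distributed-predict").getOrCreate()

# Hypothetical feature columns -- replace with the real column names from the CSV
feature_cols = ["f1", "f2", "f3"]

data = spark.read.csv("s3://bucket/input.csv", sep="|", header=True)
# spark.read.csv yields string columns, so cast the features to numeric types first
data = data.select(*[col(c).cast("double") for c in feature_cols])

# Ship the architecture JSON via a broadcast variable; the weights path is assumed
# to be readable on every executor node
with open("model.json") as f:
    bc_json = spark.sparkContext.broadcast(f.read())
weights_path = "/mnt/model_weights.h5"

@pandas_udf(DoubleType(), PandasUDFType.SCALAR)
def predict_udf(*cols):
    from keras.models import model_from_json
    # Rebuild the Keras model lazily, once per Python worker process
    global _model
    if "_model" not in globals():
        _model = model_from_json(bc_json.value)
        _model.load_weights(weights_path)
    # Each element of `cols` is a pandas Series holding one feature column
    batch = pd.concat(list(cols), axis=1).values
    preds = _model.predict(batch)  # assumes a single-output model
    return pd.Series(preds.ravel().astype(float))

result = data.withColumn("prediction", predict_udf(*[col(c) for c in feature_cols]))
result.show()

Unlike model.predict(data.toPandas()), which pulls the whole dataset onto the driver, this keeps inference on the executors, one pandas batch at a time.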

0 Answers