
I trained a model using pandas and Keras (with the TensorFlow backend) and now need to migrate it to a distributed environment (AWS EMR).

I am trying to use Apache Spark (to read the input) and MXNet as the backend for Keras. I was able to load the model architecture and weights, but when I call model.predict(input) I get:

AttributeError: 'DataFrame' object has no attribute 'values'

Here is the code sample:

from keras.models import model_from_json, load_model
from pyspark.sql import SparkSession
import mxnet as mx

spark = SparkSession.builder.appName(app_name).getOrCreate()
data = spark.read.csv(input_file, sep="|", header=True)

print('Loading model...')
with open(model_architecture, 'r') as f:
    model = model_from_json(f.read())
model.load_weights(model_weights)

predictions = model.predict(data)

Is there anything that I am missing, or is there another approach I can use to get this working?

shreyansh
  • You are using a Spark DataFrame, which doesn't have the attribute .values (which basically converts the df to a numpy array). So use model.predict(data.toPandas()) – venkata krishnan Dec 03 '19 at 06:11
  • Probably you can try using a pandas UDF, and inside the UDF you can predict with the model using a pandas DataFrame. – Bitswazsky Dec 03 '19 at 09:28
  • @venkatakrishnan Thanks, that worked. Do you know if there is a better alternative to what I am doing? The prediction doesn't seem to happen in a distributed manner. – shreyansh Dec 03 '19 at 11:38
  • Thanks @Bitswazsky, will try that too. – shreyansh Dec 03 '19 at 11:40
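
For reference, the pandas UDF approach suggested in the comments could look roughly like the sketch below. The feature column names, file paths, and the single-output model are assumptions, not part of the original question; the weights file also has to be readable from every executor node (for example, copied there by an EMR bootstrap action or fetched from S3).

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf, PandasUDFType
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("distributed-predict").getOrCreate()

# Hypothetical feature columns -- replace with the real column names from the CSV
feature_cols = ["f1", "f2", "f3"]

data = spark.read.csv("s3://bucket/input.csv", sep="|", header=True)
# spark.read.csv yields string columns, so cast the features to numeric types first
data = data.select(*[col(c).cast("double") for c in feature_cols])

# Ship the architecture JSON via a broadcast variable; the weights path is assumed
# to be readable on every executor node
with open("model.json") as f:
    bc_json = spark.sparkContext.broadcast(f.read())
weights_path = "/mnt/model_weights.h5"

@pandas_udf(DoubleType(), PandasUDFType.SCALAR)
def predict_udf(*cols):
    from keras.models import model_from_json
    # Rebuild the Keras model lazily, once per Python worker process
    global _model
    if "_model" not in globals():
        _model = model_from_json(bc_json.value)
        _model.load_weights(weights_path)
    # Each element of `cols` is a pandas Series holding one feature column
    batch = pd.concat(list(cols), axis=1).values
    preds = _model.predict(batch)  # assumes a single-output model
    return pd.Series(preds.ravel().astype(float))

result = data.withColumn("prediction", predict_udf(*[col(c) for c in feature_cols]))
result.show()

Unlike model.predict(data.toPandas()), which pulls the whole dataset onto the driver, this keeps inference on the executors, one pandas batch at a time.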

0 Answers