
I am running into problems importing models trained with TensorFlow and Keras in Python. Serving them from Flask is easy, but with deeplearning4j or org.tensorflow I hit problems; the Java libraries still seem immature.

In this case, tokenization gives different results when the tokenizer is loaded from a pickle in Flask than when it is loaded from JSON in deeplearning4j.
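To illustrate why the two runtimes can disagree, here is a minimal pure-Python sketch of what Keras' `Tokenizer.texts_to_sequences` does (my own stdlib approximation, not the real implementation, with made-up `word_index` maps). If the pickle and the JSON carry different `word_index` maps, the same sentence produces different sequences:

```python
# Hypothetical stand-in for Keras' default text preprocessing: lowercase,
# replace filter characters with spaces, split on whitespace.
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def preprocess(text, filters=DEFAULT_FILTERS, lower=True):
    if lower:
        text = text.lower()
    text = text.translate(str.maketrans(filters, ' ' * len(filters)))
    return [w for w in text.split(' ') if w]

def texts_to_sequences(texts, word_index):
    # Words missing from word_index are silently dropped, as in Keras.
    return [[word_index[w] for w in preprocess(t) if w in word_index]
            for t in texts]

# A word_index built from the full training corpus maps every word...
word_index_full = {'i': 1, 'am': 2, 'disputing': 3, 'with': 4, 'my': 5,
                   'mortgage': 6, 'confusing': 7, 'misleading': 8, 'term': 9}
# ...while a tokenizer whose vocabulary does not match (e.g. loaded from a
# JSON file that was not exported from the same fitted tokenizer) knows
# almost none of them:
word_index_tiny = {'i': 1}

sentence = ["I am disputing with my mortgage confusing misleading term"]
print(texts_to_sequences(sentence, word_index_full))  # all nine words mapped
print(texts_to_sequences(sentence, word_index_tiny))  # only [[1]]
```

The second case, where almost every word falls outside the vocabulary, would produce a one-element sequence very much like the deeplearning4j output shown further down.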

This is the Flask code, followed by its output for the string "I am disputing with my mortgage confusing misleading term":

import pickle

import flask
import numpy as np
import tensorflow as tf
from flask import request
from tensorflow.keras.preprocessing.sequence import pad_sequences

app = flask.Flask(__name__)


@app.route('/')
def home():
    return flask.render_template('complaints_classify.html')


@app.route('/predict', methods=['POST'])
def predict():
    print('predict\n')
    a = request.form['complaint']
    print(a)

    # Load the trained model
    loaded_model = tf.keras.models.load_model('/home/ean/anaconda3/envs/my_env/modelLstm.h5')
    loaded_model.summary()

    # Load the tokenizer that was fitted on the training corpus
    with open('/home/ean/anaconda3/envs/my_env/tokenizer.pickle', 'rb') as handle:
        tokenizer = pickle.load(handle)
    print(tokenizer.get_config())
    print("Loaded model from disk")

    labels = ['Credit card', 'Debt collection', 'Credit Reporting',
              'Mortgage', 'Payday loan', 'Student loan']
    text = np.array([a])
    print("text as numpy:\n")
    print(text)

    X = tokenizer.texts_to_sequences(text)
    print(X)
    X = pad_sequences(X, maxlen=50)
    print(X)

    pred = loaded_model.predict(X)
    print(pred)
    print(np.argmax(pred))
    print(text, labels[np.argmax(pred)])

    data = labels[np.argmax(pred)]
    return flask.render_template('complaints_classify.html',
                                 complaint=format(text),
                                 prediction_text='{}'.format(data))

And the output is:

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]]

It predicts correctly:

['I am disputing with my mortgage confusing misleading term'] Mortgage
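For reference, the padding step between `texts_to_sequences` and `predict` can be sketched in pure Python (a hypothetical stand-in for `keras.preprocessing.sequence.pad_sequences` with its default 'pre' padding and truncation, not the real implementation):

```python
def pad_sequences(seqs, maxlen):
    # Keras-style 'pre' padding/truncation: zeros on the left,
    # keep only the last maxlen token ids.
    return [[0] * max(0, maxlen - len(s)) + s[-maxlen:] for s in seqs]

X = pad_sequences([[9]], maxlen=50)
print(len(X[0]))  # 50
print(X[0][-1])   # 9 -- the token ids sit at the right edge,
                  # as in the Flask output above
```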

And this is the code using Deeplearning4j beta6:

public class GuestbookController {

    @RequestMapping(value = "/micro-service")
    public String hello() throws Exception {
        String modeloStr = "";
        System.out.println("Running on: " + InetAddress.getLocalHost().getHostAddress());
        String[] labels = {"Credit card", "Debt collection", "Credit Reporting",
                           "Mortgage", "Payday loan", "Student loan"};

        // Load the model exported from Keras
        String simpleMlp = new ClassPathResource("modelLstm.h5").getFile().getPath();
        System.out.println("Model file read\n");
        MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp, false);
        for (int i = 0; i < model.getLayers().length; i++) {
            modeloStr = modeloStr + model.getLayers()[i].getConfig() + "\n";
        }
        System.out.println(modeloStr);

        // Tokenize the same input string, with the tokenizer loaded from JSON
        String[] texts = new String[] {"I am disputing with my mortgage confusing misleading term"};
        String path = "tokenizer.json";
        KerasTokenizer tokenizer = KerasTokenizer.fromJson(Resources.asFile(path).getAbsolutePath());
        Integer[][] sequences = tokenizer.textsToSequences(texts);

        StringBuilder seqStr = new StringBuilder();
        for (Integer[] seq : sequences) {
            for (Integer id : seq) {
                seqStr.append(id).append("\n");
            }
        }
        System.out.println("-------Sequences:" + seqStr);
        System.out.println(tokenizer.textsToMatrix(texts, TokenizerMode.FREQ));

And the output is:

-------Sequences:1

And the output of tokenizer.textsToMatrix(texts, TokenizerMode.FREQ) is:

[[ 0, 1.0000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
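Note that textsToMatrix in FREQ mode returns a bag-of-words frequency row, not a padded sequence, so it is not directly comparable to the pad_sequences output from Flask. A minimal stdlib sketch (my own approximation, not dl4j's code) of what FREQ mode computes:

```python
def texts_to_matrix_freq(sequences, num_words):
    # Keras/dl4j-style 'freq' mode: column j holds
    # count(word j in sequence) / len(sequence).
    rows = []
    for seq in sequences:
        row = [0.0] * num_words
        for idx in seq:
            row[idx] += 1.0 / len(seq)
        rows.append(row)
    return rows

# With a tokenizer that recognizes only one word (index 1), the whole
# one-token sequence is that word, so column 1 gets frequency 1.0 --
# the same shape as the dl4j matrix above.
print(texts_to_matrix_freq([[1]], num_words=50))
```

So a sequence of [[1]] plus a FREQ matrix with a lone 1.0 in column 1 both point at the same cause: the JSON tokenizer maps almost none of the input words, i.e. its vocabulary does not match the pickled one.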

I would really appreciate any help with solving this.

Many Thanks!!!

Eva Andres
  • Your code doesn't actually show how you use the model in dl4j. But in any case, make sure that the input to the model is actually equivalent. Esp. when considering the difference between NWC and NCW (channels last and channels first) between those two frameworks. – Paul Dubs May 02 '20 at 10:56
  • Thanks Paul for your answer. You can find the code in the second example. In summary: – Eva Andres May 02 '20 at 17:27
  • String simpleMlp = new ClassPathResource("modelLstm.h5").getFile().getPath(); System.out.println("Fichero leido\n"); MultiLayerNetwork model = KerasModelImport.importKerasSequentialModelAndWeights(simpleMlp,false);String[] texts = new String[] {"I am disputing with my mortgage confusing misleading term"}; String path = "tokenizer.json"; KerasTokenizer tokenizer = KerasTokenizer.fromJson(Resources.asFile(path).getAbsolutePath());Integer[][] sequences = tokenizer.textsToSequences(texts);tokenizer.textsToMatrix(texts, TokenizerMode.FREQ) – Eva Andres May 02 '20 at 17:31
  • I see you loading the model, and printing it out. But I don't see it (the variable model) being used anywhere after that. – Paul Dubs May 03 '20 at 18:48
  • The problem is in the tokenizer. – Eva Andres May 03 '20 at 19:26
  • The problem is in tokenizer.textsToMatrix or textsToSequences (the output is completely different from the same string in the Flask code): String[] texts = new String[] {"I am disputing with my mortgage confusing misleading term"}; String path = "tokenizer.json"; KerasTokenizer tokenizer = KerasTokenizer.fromJson(Resources.asFile(path).getAbsolutePath()); Integer[][] sequences = tokenizer.textsToSequences(texts); System.out.println(tokenizer.textsToMatrix(texts, TokenizerMode.FREQ)); – Eva Andres May 03 '20 at 19:40
  • Output in Flask executing tokenizer.texts_to_sequences(text): [[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]] Output in deeplearning4j executing tokenizer.textsToMatrix(texts): [[ 0, 1.0000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]] – Eva Andres May 03 '20 at 19:47
  • If your problem is the tokenizer, why did you add all the unnecessary code around it to the question? Anyway, I can't see any way to reproduce your problem without also getting your tokenizer. It may be an actual bug, so please open an issue at https://github.com/eclipse/deeplearning4j/issues and include both your pickle as well as your json there please. – Paul Dubs May 04 '20 at 13:44
