I am pretty new to data science and neural networks. I have a dataset of Unicode sentences that have been labeled 0 or 1 for 'spam' or 'not_spam'. The model I used is the code below (excluding the data preprocessing):
```python
from keras.models import Model
from keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding
from keras.optimizers import RMSprop

# max_words and max_len are defined during preprocessing (not shown)
def RNN():
    inputs = Input(name='inputs', shape=[max_len])
    layer = Embedding(max_words, 50, input_length=max_len)(inputs)
    layer = LSTM(64)(layer)
    layer = Dense(256, name='FC1')(layer)
    layer = Activation('relu')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(1, name='out_layer')(layer)
    layer = Activation('sigmoid')(layer)  # sigmoid, i.e. a 0 to 1 output
    model = Model(inputs=inputs, outputs=layer)
    return model

model = RNN()
model.compile(loss='binary_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
```
So far the predictions are good.
But now I have modified my dataset: I replaced the binary 'spam' column with 7 categories, so each label is now an integer from 1 to 7. The dataset looks like this (case #1):
| sentence | category |
|---|---|
| sent 1 | 1 |
| sent 2 | 3 |
| sent 3 | 2 |
| sent 4 | 7 |
| ... | ... |
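If case #1 is used as-is, note that Keras's `sparse_categorical_crossentropy` loss expects integer class indices starting at 0, so labels 1 to 7 would likely need shifting to 0 to 6 first. A minimal NumPy sketch, assuming the `category` column has been loaded into an integer array (the `categories` values here are just the four sample rows above):

```python
import numpy as np

# Labels from the sample rows above (case #1): integers in the range 1..7
categories = np.array([1, 3, 2, 7])

# sparse_categorical_crossentropy expects class indices 0..num_classes-1,
# so shift every label down by one
y_sparse = categories - 1

print(y_sparse)  # [0 2 1 6]
```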
I know I could instead one-hot encode the labels with dummy variables, like this (case #2):
| sentence | category_1 | category_2 | category_3 | ... | category_7 |
|---|---|---|---|---|---|
| sent 1 | 1 | 0 | 0 | ... | 0 |
| sent 2 | 0 | 0 | 1 | ... | 0 |
| sent 3 | 0 | 1 | 0 | ... | 0 |
| sent 4 | 0 | 0 | 0 | ... | 1 |
| ... | | | | | |
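Case #2 is the one-hot layout that `categorical_crossentropy` works with. Starting from integer labels like case #1, one way to build that matrix is to index into an identity matrix; this is a NumPy sketch using the four sample rows (`keras.utils.to_categorical` does the same job for 0-based labels):

```python
import numpy as np

num_classes = 7
categories = np.array([1, 3, 2, 7])  # integer labels as in case #1

# Row i of np.eye(7) is the one-hot vector for class index i;
# subtract 1 because the categories are numbered from 1
y_onehot = np.eye(num_classes, dtype=int)[categories - 1]

print(y_onehot)
```

The four printed rows match the case #2 table: a single 1 in the column of each sentence's category.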
So I'm comfortable with the feature engineering side of the dataset. What I'm actually looking for is how to modify the code so that the model outputs a category such as 1, 2, 3, 4, ... (i.e., a prediction for each category).
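With a multi-class softmax output, the model would return one probability per category for each sentence rather than a single label. Turning that back into a category number from 1 to 7 could look like this; the `probs` array below is made-up data standing in for what `model.predict(...)` would return:

```python
import numpy as np

# Hypothetical softmax output for 2 sentences (each row sums to 1)
probs = np.array([
    [0.05, 0.10, 0.60, 0.05, 0.05, 0.10, 0.05],
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
])

# argmax gives the 0-based index of the most likely class;
# add 1 to get back to the 1..7 category numbering
predicted_category = np.argmax(probs, axis=1) + 1

print(predicted_category)  # [3 1]
```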
Does anybody know how I can modify the code (the Keras model) with as little editing as possible?
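A likely minimal change, sketched under assumptions: keep the hidden layers as they are, but give the output layer one unit per category with a `softmax` activation instead of a single `sigmoid` unit, and switch the loss to `sparse_categorical_crossentropy` (integer labels shifted to 0..6) or `categorical_crossentropy` (one-hot labels as in case #2). The `max_words` and `max_len` values below are placeholders for whatever the preprocessing produces, and `num_classes` is a name introduced for this sketch:

```python
import numpy as np
from keras.models import Model
from keras.layers import LSTM, Activation, Dense, Dropout, Input, Embedding
from keras.optimizers import RMSprop

# Placeholder values; in practice these come from the preprocessing step
max_words = 1000
max_len = 150
num_classes = 7

def RNN():
    inputs = Input(name='inputs', shape=(max_len,))
    # input_length is omitted; the Input layer already fixes the sequence length
    layer = Embedding(max_words, 50)(inputs)
    layer = LSTM(64)(layer)
    layer = Dense(256, name='FC1')(layer)
    layer = Activation('relu')(layer)
    layer = Dropout(0.5)(layer)
    # One output unit per category instead of a single unit
    layer = Dense(num_classes, name='out_layer')(layer)
    # softmax turns the 7 outputs into probabilities that sum to 1
    layer = Activation('softmax')(layer)
    return Model(inputs=inputs, outputs=layer)

model = RNN()
# Integer labels 0..6 -> sparse_categorical_crossentropy;
# one-hot labels (case #2) -> categorical_crossentropy
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

# Smoke test on dummy token indices (all zeros)
dummy = np.zeros((2, max_len), dtype='int32')
probs = model.predict(dummy, verbose=0)
print(probs.shape)  # (2, 7)
```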
Any other recommendations for improving accuracy, based on experience with NLP and neural networks, would also be appreciated.