-2

Problem statement

I have 3 classes (A, B, and C). I have 6 features:

train_x = [[ 6.442  6.338  7.027  8.789 10.009 12.566]
           [ 6.338  7.027  5.338 10.009  8.122 11.217]
           [ 7.027  5.338  5.335  8.122  5.537  6.408]
           [ 5.338  5.335  5.659  5.537  5.241  7.043]]

These features represent a 5-character string pattern built from the 3 classes (e.g. AABBC, etc.). Suppose such a 5-character string pattern is one-hot encoded, position by position, as follows:

train_z = [[0. 0. 1. 0. 0. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0.]    
           [0. 0. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 0.]
           [0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0.]    
           [0. 0. 1. 1. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1.]]
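(For reference, a minimal decoding sketch, not part of the original data pipeline: assuming the per-position encoding A = [1,0,0], B = [0,1,0], C = [0,0,1], each 15-element row maps back to its 5-character pattern like this.)

import numpy as np

def decode_row(row):
    # split the 15 values into 5 groups of 3 and map each group back to a letter
    groups = np.asarray(row).reshape(5, 3)
    return "".join("ABC"[i] for i in groups.argmax(axis=1))

print(decode_row([0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 1., 0., 0.]))  # -> CCCCA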

My implementation

I have implemented the above problem using a sequential model as follows:

import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import sys
import time
import random
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
import numpy as np

# <editor-fold desc="handle GPU">
# resolve GPU related issues.
try:
    physical_devices = tf.config.list_physical_devices("GPU")
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except Exception as e:
    print("GPU not found!")
# END of try
# </editor-fold>

# Directories and files
CLASS_INDEX = 4
FEATURE_START_INDEX = 6
OUTPUT_PATH = r"./"
INPUT_PATH = r"./"
INPUT_DATA_FILE = "dist-5.dat"
TRAINING_PROGRESS_FILE = "training.txt"
MODEL_FILE = "model.h5"

# classification size
CLASSES_COUNT = 3
FEATURES_COUNT = 6
OUTPUTS_COUNT = 15

# Network parameters.
LAYER_1_NEURON_COUNT = 128
LAYER_2_NEURON_COUNT = 128

# Training parameters.
LEARNING_RATE = 0.01
EPOCHS = 1000  # 500
BATCH_SIZE = 10
NO_OF_INPUT_LINES = 10000
VALIDATION_PART = 0.5
MODEL_SAVE_FREQUENCY = 10

# <editor-fold desc="encoding()">
# <editor-fold desc="def encode(letter)">
def encode(letter: str):
    if letter == 'A':
        return [1.0, 0.0, 0.0]
    elif letter == 'B':
        return [0.0, 1.0, 0.0]
    elif letter == 'C':
        return [0.0, 0.0, 1.0]
# </editor-fold>

# <editor-fold desc="encode_string()">
def encode_string_1(pattern_str: str):
    # Iterate over the string
    one_hot_binary_str = []
    for ch in pattern_str:
        one_hot_binary_str = one_hot_binary_str + encode(ch)
    # END of for loop
    return one_hot_binary_str
# END of function

def encode_string_2(pattern_str: str):
    # Iterate over the string
    one_hot_binary_str = []
    for ch in pattern_str:
        temp_encoded_vect = [encode(ch)]
        one_hot_binary_str = one_hot_binary_str + temp_encoded_vect
    # END of for loop
    return one_hot_binary_str
# END of function
# </editor-fold>

# <editor-fold desc="def load_data()">
def load_data_k(fname: str, class_index: int, feature_start_index: int, **selection):
    i = 0
    file = open(fname)
    if "top_n_lines" in selection:
        lines = [next(file) for _ in range(int(selection["top_n_lines"]))]
    elif "random_n_lines" in selection:
        tmp_lines = file.readlines()
        lines = random.sample(tmp_lines, int(selection["random_n_lines"]))
    else:
        lines = file.readlines()

    data_x, data_y, data_z = [], [], []
    for l in lines:
        row = l.strip().split()  # split the line into a list of words
        x = [float(ix) for ix in row[feature_start_index:]]  # convert the words from `feature_start_index` onward into a vector of floats
        y = encode(row[class_index])  # one-hot encode the single class character at `class_index`
        z = encode_string_1(row[class_index + 1])  # one-hot encode the 5-character pattern in the next column
        data_x.append(x)  # append the vector into 'data_x'
        data_y.append(y)  # append the vector into 'data_y'
        data_z.append(z)  # append the vector into 'data_z'
    # END for l in lines

    num_rows = len(data_x)
    given_fraction = selection.get("validation_part", 1.0)
    if given_fraction > 0.9999:
        valid_x, valid_y, valid_z = data_x, data_y, data_z
    else:
        n = int(num_rows * given_fraction)
        valid_x, valid_y, valid_z = data_x[n:], data_y[n:], data_z[n:]
        data_x, data_y, data_z = data_x[:n], data_y[:n], data_z[:n]
    # END of if-else block

    tx = tf.convert_to_tensor(data_x, np.float32)
    ty = tf.convert_to_tensor(data_y, np.float32)
    tz = tf.convert_to_tensor(data_z, np.float32)
    vx = tf.convert_to_tensor(valid_x, np.float32)
    vy = tf.convert_to_tensor(valid_y, np.float32)
    vz = tf.convert_to_tensor(valid_z, np.float32)

    return tx, ty, tz, vx, vy, vz
# END of the function
# </editor-fold>
# </editor-fold>

# <editor-fold desc="def create_model()">
def create_model(n_hidden_1, n_hidden_2, num_outputs, num_features):
    # a simple sequential model
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(num_features,)))
    model.add(tf.keras.layers.Dense(n_hidden_1, activation="relu"))
    model.add(tf.keras.layers.Dense(n_hidden_2, activation="relu"))
    model.add(tf.keras.layers.Dense(num_outputs))
    return model
# </editor-fold>

# custom loss to take into the dependency between the 3 bits
def loss(y_true, y_pred):
    l1 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, :3], y_pred[:, :3])
    l2 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 3:6], y_pred[:, 3:6])
    l3 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 6:9], y_pred[:, 6:9])
    l4 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 9:12], y_pred[:, 9:12])
    l5 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 12:], y_pred[:, 12:])
    return l1 + l2 + l3 + l4 + l5


if __name__ == "__main__":
    len_int = len(sys.argv)
    arg_str = None

    if len_int > 1:
        arg_str = sys.argv[1]
    else:
        arg_str = os.path.join(INPUT_PATH, INPUT_DATA_FILE)
    # END of if len_int > 1:

    # load training data from the disk
    train_x, train_y, train_z, validate_x, validate_y, validate_z = load_data_k(
        os.path.join(INPUT_PATH, INPUT_DATA_FILE),
        class_index=CLASS_INDEX,
        feature_start_index=FEATURE_START_INDEX,
        top_n_lines=NO_OF_INPUT_LINES,
        validation_part=VALIDATION_PART
    )

    #print(train_y)
    print("z = " + str(train_z))

    # create the Adam optimizer for the NN model
    opt_function = keras.optimizers.Adam(
        learning_rate=LEARNING_RATE
    )
    # create a sequential NN model
    model = create_model(
        LAYER_1_NEURON_COUNT,
        LAYER_2_NEURON_COUNT,
        OUTPUTS_COUNT,
        FEATURES_COUNT
    )
    #
    model.compile(optimizer=opt_function, loss=loss, metrics=['accuracy'])
    model.fit(train_x, train_z, epochs=EPOCHS, batch_size=BATCH_SIZE)

The problem

The problem with this code is that the model does not converge, i.e. the accuracy does not increase as the number of epochs increases.

The question

How can I implement this model?

user366312
  • Why are you trying to do this conversion? Sequential models only support a single output so they cannot be used for multiple tasks. – Dr. Snoopy Jul 25 '21 at 21:41
  • That would not be a multi-task problem. – Dr. Snoopy Jul 25 '21 at 23:42
  • @user366312 in your code, the model create_model has num_classes number output, right? It means that here it has one output layer with num_classes classes. If that so, then why are you setting `loss=['categorical_crossentropy'] * 5` (also in metrics)? If I'm not wrong, you should use one loss function and one metrics (removing 5). – Innat Jul 26 '21 at 07:03
  • Is there any specific reason you want to use a Sequential? I can update my answer with more details if I know why you do not want to use a Functional model – Swaroop Bhandary Jul 28 '21 at 11:35
  • @user366312 "GPU not found" is the final error message, the traceback throws Value Error in user code is C:\ProgramData\Miniconda3\envs\by_nn\lib\site-packages\tensorflow\python\framework\tensor_shape.py:1134 assert_is_compatible_with ValueError: Shapes (10, 3) and (10, 15) are incompatible Kindly update the question with the current error you are facing – Archana David Jul 29 '21 at 18:38
  • Could you try to provide a minimal reproducing example ? We should be able to copy-paste the code in a colab and run it to see the error, this will allow us to help you very efficiently (and might even help you understand your error). Right now we don't have access to the data (you could replace that with mock data), some functions or the imports. Also specifying the versions would help. Ideally, you would just implement all that in a colab yourself and share it, that would be amazing. – Zaccharie Ramzi Aug 06 '21 at 13:07
  • @user366312 do you also have mock data or can you share the data file you use? – Zaccharie Ramzi Aug 07 '21 at 08:01
  • @user366312 If this is the original problem, you can change the approach and look at it as a classification problem with 3^5 different labels. As the number of labels is not going to be too big, this approach avoids the problems of combining multiple losses, slicing tensors, etc. If I were you, I would first test this approach to see whether the model/data is good enough to learn from or has the capacity. – Cenk Bircanoglu Aug 12 '21 at 07:58
  • I mean, if there is no preprocessing step done for your labels or features. In any case, as I said, I prefer the easy approach of treating it as a classification problem with 243 (3^5) different labels, because adding a slicing operation is really hard to optimize. – Cenk Bircanoglu Aug 12 '21 at 10:53
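A minimal sketch of the 243-class reformulation suggested in the last two comments (the helper names here are hypothetical, not from the post): encode each 5-character pattern as a single integer in base 3 with A=0, B=1, C=2, and end the model in one 243-way softmax.

def pattern_to_class(pattern: str) -> int:
    # hypothetical helper: "AAAAA" -> 0, "AABBC" -> 14, "CCCCC" -> 242
    digits = {"A": 0, "B": 1, "C": 2}
    idx = 0
    for ch in pattern:
        idx = idx * 3 + digits[ch]
    return idx

def class_to_pattern(idx: int) -> str:
    # inverse mapping, to turn a predicted class index back into a string
    chars = []
    for _ in range(5):
        chars.append("ABC"[idx % 3])
        idx //= 3
    return "".join(reversed(chars))

# the model would then end in a single 243-way softmax, e.g.:
# model.add(tf.keras.layers.Dense(243, activation="softmax"))
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])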

4 Answers

1

The problem is with how Keras calculates the accuracy. For example, in the code below

import numpy as np
import tensorflow as tf

y_true = np.array([[1, 0, 0, 0, 1, 0, 0, 0, 1]])
y_pred = np.array([[.8, .1, .1, 1, 10, 2, 2, 3, 5.5]])

metric = tf.keras.metrics.Accuracy()
metric.update_state(y_true, y_pred)
print(metric.result().numpy())  # 0.0

The calculated accuracy is zero. However, by comparing

  1. [.8,.1,.1] with [1,0,0]
  2. [1,10,2] with [0,1,0]
  3. [2,3,5.5] with [0,0,1]

we know that y_pred is actually very accurate, and this might be the reason why your model just does not seem to work. To handle this problem under the current model, applying a sigmoid activation in the output layer might help; you can check this by running the following code:

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler


def dataset_gen(num_samples):
    # each data row consists of six floats, which is the feature vector of a 5-character
    # string pattern comprising 3 classes (e.g. AABBC, etc.)
    # in order to represent this 5-character string, a sequentially ordered one-hot encoding vector is used
    np.random.seed(0)
    output_classes = np.random.randint(0, 3, size=(num_samples, 5))
    transform_mat = np.arange(-15, 15).reshape(5, 6) + .1 * np.random.rand(5, 6)
    print(transform_mat)
    feature_vec = output_classes @ transform_mat
    output_classes += np.array([0, 3, 6, 9, 12])
    # convert output_classes to one-hot encoding
    output_vec = np.zeros((num_samples, 15))
    for ind, item in enumerate(output_classes):
        output_vec[ind][item] = 1.

    return feature_vec, output_vec


def create_model():
    # a simple sequential model with a sigmoid output layer
    n_hidden, num_features, num_outputs = 16, 6, 15
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(num_features,)))
    model.add(tf.keras.layers.Dense(n_hidden, activation="relu"))
    model.add(tf.keras.layers.Dense(num_outputs, activation="sigmoid"))
    return model


def loss(y_true, y_pred):
    # cross entropy applied separately to each group of 3 outputs
    l1 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, :3], y_pred[:, :3])
    l2 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 3:6], y_pred[:, 3:6])
    l3 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 6:9], y_pred[:, 6:9])
    l4 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 9:12], y_pred[:, 9:12])
    l5 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 12:], y_pred[:, 12:])
    return l1 + l2 + l3 + l4 + l5


# generate a mock dataset (the sample count here is arbitrary) and
# normalize the features; sklearn is used only for this normalization
test_x, test_z = dataset_gen(1000)
test_x = MinMaxScaler().fit_transform(test_x)

# create a sequential NN model and train it
model = create_model()
model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])

es = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', mode='max', verbose=1, patience=100)
history = model.fit(test_x, test_z, epochs=2000, batch_size=8,
                    callbacks=[es], validation_split=0.2,
                    verbose=0)
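As a complement to the accuracy discussion above, a hedged option (not part of the original answer) is a custom metric that scores each group of 3 outputs by argmax, so the reported number reflects the 5-position structure of the labels:

import tensorflow as tf

def grouped_accuracy(y_true, y_pred):
    # reshape (batch, 15) into (batch, 5, 3) and compare the per-position argmax
    true_cls = tf.argmax(tf.reshape(y_true, (-1, 5, 3)), axis=-1)
    pred_cls = tf.argmax(tf.reshape(y_pred, (-1, 5, 3)), axis=-1)
    return tf.reduce_mean(tf.cast(tf.equal(true_cls, pred_cls), tf.float32))

# model.compile(optimizer="adam", loss=loss, metrics=[grouped_accuracy])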
meTchaikovsky
  • the use of `sklearn` is not allowed in my project. i must confine myself to `keras` and `tensorflow` only. – user366312 Sep 05 '21 at 10:11
  • @user366312 `sklearn` is only used for normalizing the feature vectors in my code, it can be easily replaced by `numpy`. – meTchaikovsky Sep 05 '21 at 10:12
  • ok. i will test the code on my actual 1.4 million data set, and let you know. – user366312 Sep 05 '21 at 10:13
  • @user366312 Ok, but make sure you understand the first part of my post, the problem might lie within how `keras` calculates the accuracy given your way of one-hot encoding. – meTchaikovsky Sep 05 '21 at 10:15
0

Sequential is used when you have a single network input and a single output. In the current setup you would otherwise need multiple output layers to take into account that consecutive groups of 3 output values are linked; this constraint can be enforced through the loss function as well.

import numpy as np
import tensorflow as tf

# random input data with 6 features
inp = tf.random.uniform(shape=(1000, 6))

# output data taking into consideration that 3 consecutive bits are one class.
out1 = tf.one_hot(tf.random.uniform(shape=(1000,), dtype=tf.int32, maxval=3), depth=3)
out2 = tf.one_hot(tf.random.uniform(shape=(1000,), dtype=tf.int32, maxval=3), depth=3)
out3 = tf.one_hot(tf.random.uniform(shape=(1000,), dtype=tf.int32, maxval=3), depth=3)
out4 = tf.one_hot(tf.random.uniform(shape=(1000,), dtype=tf.int32, maxval=3), depth=3)
out5 = tf.one_hot(tf.random.uniform(shape=(1000,), dtype=tf.int32, maxval=3), depth=3)

out = tf.concat([out1, out2, out3, out4, out5], axis=1)

# a simple sequential model 
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(6,)))
model.add(tf.keras.layers.Dense(20, activation="relu"))
model.add(tf.keras.layers.Dense(20, activation="relu"))
model.add(tf.keras.layers.Dense(15))


# custom loss to take into the dependency between the 3 bits

def loss(y_true, y_pred):
    l1 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, :3], y_pred[:, :3])
    l2 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 3:6], y_pred[:, 3:6])
    l3 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 6:9], y_pred[:, 6:9])
    l4 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 9:12], y_pred[:, 9:12])
    l5 = tf.nn.softmax_cross_entropy_with_logits(y_true[:, 12:], y_pred[:, 12:])
    
    return l1 + l2 + l3 + l4 + l5

opt_function = tf.keras.optimizers.SGD()

model.compile(optimizer=opt_function, loss=loss)
model.fit(inp, out, batch_size=10)

The same idea needs to be used when evaluating the network as well: you need to take the argmax over each group of 3 bits separately (5 times) so that you get a sequence of 5 classes as output.
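A minimal sketch of that evaluation step, assuming the `model` and `inp` defined above:

logits = model.predict(inp)                                    # shape (batch, 15)
classes = tf.argmax(tf.reshape(logits, (-1, 5, 3)), axis=-1)   # shape (batch, 5), values in {0, 1, 2}
patterns = ["".join("ABC"[i] for i in row) for row in classes.numpy()]
print(patterns[:5])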

  • What if I input a 15 x 1-bit hot-encoded data, and want to output 15-bit data? – user366312 Jul 28 '21 at 12:01
  • @user366312 in the above code, the input layer shape would be (15,) and the network would work for 15 feature input – Swaroop Bhandary Jul 28 '21 at 14:26
  • You haven't considered the original problem statement. – user366312 Jul 28 '21 at 14:26
  • kindly, check the source code in Repl.it. – user366312 Jul 28 '21 at 14:26
  • You mean the error message? The error message is because of the output shape. The network predicts output of shape (batch_size, 3) and you have compiled the model as follows. ``` model.compile(loss=['categorical_crossentropy'] * 5, optimizer=opt_function, metrics=[['accuracy']] * 5) ``` Since loss is 'categorical_crossentropy'] * 5, tensorflow expects the network to have output of shape (batch_size, 15) and not (batch_size, 3). If you update the model and the loss to be the way I mentioned in the answer it will work fine. – Swaroop Bhandary Jul 28 '21 at 14:29
  • Here, the key of the problem is there are 3 classes (A, B, C), but we can't apply those classes directly as the features don't directly depend on classes, rather they depend on a string of 5 characters (e.g. AABBC, etc). – user366312 Jul 28 '21 at 14:33
  • In the answer given here https://stackoverflow.com/questions/68423157/how-can-implement-this-deep-learning-model-in-keras/68509541#68509541 that constraint has been taken into consideration by having 5 output layers and applying cross entropy separately to each of the layers. I have taken this into consideration in the loss function by applying the cross entropy to 3 bits 5 times. So it effectively enforces the same constraint as having a network with 5 output layers and applying crossentropy to each of them. – Swaroop Bhandary Jul 28 '21 at 14:42
0

I think this is where the problem arises.

 model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
...
loss=['categorical_crossentropy'] * 5

>>> Shapes (10, 3) and (10, 15) are incompatible

You don't really want to mess with your loss function like that. Try to fix your output instead. Models created with the Sequential API are the simpler ones that have a single input and a single output. If you want to change a Functional API model into this simpler layout, you should merge the inputs/outputs into a single input/output, which means that you should also merge the labels after one-hot encoding.

WARNING:tensorflow:AutoGraph could not transform <function loss at 0x000001F571B4F820> and will run it as-is. Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: module 'gast' has no attribute 'Index' To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

This warning won't stop your model from training, so you can ignore it. If the model still doesn't train, then you should probably start tweaking the hyperparameters!

Georgios Livanos
0

Before I mention my solution, I will warn you that it is not strictly correct, as the methodology is wrong, but it might work if you have a very large dataset. What you want is to treat each set of 3 values as a multi-class problem and the characters as a multi-label problem, which is not possible; you can't divide your problem like this for Sequential models. But if you have a large dataset, you can treat it as a multi-label problem as a whole, in which case there will be cases where you get 2 active labels within one of the groups of 3, and you have to apply post-processing in some manner. Say, set only that label active which has the highest sigmoid value within the group.
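A hedged sketch of that post-processing step (the helper name is illustrative, not from the answer): with a 15-unit sigmoid output, keep only the highest-scoring label inside each group of 3 positions.

import numpy as np

def pick_one_per_group(sigmoid_out):
    # sigmoid_out: array of shape (batch, 15) produced by a multi-label (sigmoid) model
    groups = sigmoid_out.reshape(-1, 5, 3)
    winners = groups.argmax(axis=-1)                       # most confident label per group of 3
    one_hot = np.zeros_like(groups)
    np.put_along_axis(one_hot, winners[..., None], 1.0, axis=-1)
    return one_hot.reshape(-1, 15)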

Abhishek Prajapat