I have built a model with different features. For the preprocessing I mainly used feature_columns, for instance for bucketizing GEO information or for embedding categorical data with a large number of distinct values. Additionally, I had to preprocess two of my features before using feature_columns:
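For illustration, the two approaches mentioned above might look roughly like this (a minimal sketch, not my actual code; the keys, boundaries and sizes are placeholders):

import numpy as np
import tensorflow as tf
from tensorflow import feature_column

# Sketch only: bucketize the latitude into a fixed number of GEO buckets
latitude = feature_column.numeric_column('LATITUDE')
latitude_buckets = feature_column.bucketized_column(
    latitude, boundaries=np.linspace(46.0, 49.0, 20).tolist())  # placeholder boundaries

# Sketch only: embed a categorical feature with many distinct values
sc_hashed = feature_column.categorical_column_with_hash_bucket('SC', hash_bucket_size=1000)
sc_embedding = feature_column.embedding_column(sc_hashed, dimension=8)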
Feature “STREET”
def __preProcessStreet(data, tokenizer=None):
    # Normalize the street names (strip common suffixes such as "gasse", "straße", ...)
    data['STREETPRO'] = data['STREET'].apply(
        lambda x: __getNormalizedString(x, ["gasse", "straße", "strasse", "str.", "g.", " "], False))
    # Fit a new tokenizer only if none was passed in (e.g. for the training data)
    if tokenizer is None:
        tokenizer = Tokenizer(split='XXX')
        tokenizer.fit_on_texts(data['STREETPRO'])
    street_tokenized = tokenizer.texts_to_sequences(data['STREETPRO'])
    data['STREETW'] = tf.keras.preprocessing.sequence.pad_sequences(street_tokenized, maxlen=1)
    return data, tokenizer
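A hypothetical usage (the variable names are assumptions, not my actual code) would be to fit the tokenizer on the training data and reuse it unchanged for the test data:

train_data, street_tokenizer = __preProcessStreet(train_data)
test_data, _ = __preProcessStreet(test_data, tokenizer=street_tokenizer)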
As you can see, I did the preprocessing steps directly on the loaded Pandas DataFrame. Afterwards I processed this new column with the help of the feature_columns mentioned above:
def __getFutureColumnStreet(street_num_words):
    street_voc = tf.feature_column.categorical_column_with_identity(
        key='STREETW', num_buckets=street_num_words)
    dim = __getNumberOfDimensions(street_num_words)
    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)
    return street_embedding
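__getNumberOfDimensions is not shown here; as an assumption, it could implement the common fourth-root heuristic for choosing the embedding size, e.g.:

def __getNumberOfDimensions(num_words):
    # Assumed heuristic (not necessarily my implementation):
    # embedding dimension ~ vocabulary size ** 0.25
    return int(num_words ** 0.25)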
Feature “NAME1”
The preprocessing steps for the NAME1 column are quite similar, except that I split the NAME1 field into two separate fields, “NAME1W1” and “NAME1W2”, which contain the two most common words of the vocabulary:
def __preProcessName(data, tokenizer=None):
    # Normalize the names (strip tokens such as "(asg)", "poasg", ...)
    data['NAME1PRO'] = data['NAME1'].apply(
        lambda x: __getNormalizedString(x, ["(asg)", "asg", "(poasg)", "poasg"]))
    # Fit a new tokenizer only if none was passed in (e.g. for the training data)
    if tokenizer is None:
        tokenizer = Tokenizer()
        tokenizer.fit_on_texts(data['NAME1PRO'])
    name1_tokenized = tokenizer.texts_to_sequences(data['NAME1PRO'])
    name1_tokenized_pad = tf.keras.preprocessing.sequence.pad_sequences(name1_tokenized, maxlen=2, truncating='pre')
    # Split the two padded token ids into the separate columns NAME1W1 and NAME1W2
    data = pd.concat([data, pd.DataFrame(name1_tokenized_pad, columns=['NAME1W1', 'NAME1W2'])], axis=1)
    return data, tokenizer
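To illustrate what this produces (made-up data, not my real dataset): tokenizing and padding to maxlen=2 yields two integer token ids per name, which then fill NAME1W1 and NAME1W2:

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

demo = pd.DataFrame({'NAME1PRO': ['max mustermann', 'erika musterfrau gmbh']})
tok = Tokenizer()
tok.fit_on_texts(demo['NAME1PRO'])
seqs = tok.texts_to_sequences(demo['NAME1PRO'])
# truncating='pre' drops leading tokens of sequences longer than maxlen
print(tf.keras.preprocessing.sequence.pad_sequences(seqs, maxlen=2, truncating='pre'))
# shape (2, 2): one row with two token ids per name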
Afterwards I also used feature_columns for the word embedding:
def __getFutureColumnsName(name_num_words):
    namew1_voc = tf.feature_column.categorical_column_with_identity(
        key='NAME1W1', num_buckets=name_num_words)
    namew2_voc = tf.feature_column.categorical_column_with_identity(
        key='NAME1W2', num_buckets=name_num_words)
    dim = __getNumberOfDimensions(name_num_words)
    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)
    return (namew1_embedding, namew2_embedding)
Model
I am using the TensorFlow (Keras) Functional API to construct my model:
print("start preprocessing...")
feature_columns = feature_selection.getFutureColumns(data, args.zip, args.sc, bucketSizeGEO, False)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=True)
print("preprocessing completed")
…
print("Step {}/{}".format(currentStep, stepNum))
feature_layer_inputs = feature_selection.getFeatureLayerInputs()
new_layer = feature_layer(feature_layer_inputs)
for _ in range(numLayers):
new_layer = tf.keras.layers.Dense(numNodes, activation=tf.nn.swish, kernel_regularizer=regularizers.l2(reg), bias_regularizer=regularizers.l2(reg))(new_layer)
new_layer = tf.keras.layers.Dropout(dropRate)(new_layer)
output_layer = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid, kernel_regularizer=regularizers.l2(reg), bias_regularizer=regularizers.l2(reg))(new_layer)
model = tf.keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=output_layer)
model.compile(optimizer=opt,
loss='binary_crossentropy',
metrics=['accuracy'])
paramString = "Arg-e{}-b{}-l{}-n{}-o{}-z{}-r{}-d{}".format(args.epoch, args.batchSize, numLayers, numNodes, opt, bucketSizeGEO, reg, dropRate)
log_dir = "logs\\neural\\" + paramString + "\\" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
print("Start training with the following parameters:", paramString)
model.fit(train_ds,
validation_data=val_ds,
epochs=args.epoch,
callbacks=[tensorboard_callback])
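getFeatureLayerInputs is not shown here; it essentially returns a dict of Keras Input tensors keyed by feature name, roughly like the following sketch (the exact names, shapes and dtypes are assumptions):

def getFeatureLayerInputs():
    # One Input per raw feature that the DenseFeatures layer expects
    return {
        'NAME1W1': tf.keras.Input(shape=(1,), name='NAME1W1', dtype=tf.int32),
        'NAME1W2': tf.keras.Input(shape=(1,), name='NAME1W2', dtype=tf.int32),
        'STREETW': tf.keras.Input(shape=(1,), name='STREETW', dtype=tf.int32),
        'ZIP': tf.keras.Input(shape=(1,), name='ZIP', dtype=tf.string),
        'LONGITUDE': tf.keras.Input(shape=(1,), name='LONGITUDE', dtype=tf.float32),
        'LATITUDE': tf.keras.Input(shape=(1,), name='LATITUDE', dtype=tf.float32),
        # ... the remaining features (SC, AVIS_TYPE, ASG, ...) analogously
    }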
TensorFlow Serving
Logically, the two preprocessing steps that use the Tokenizer are not part of the model and therefore cannot be executed during serving, so a POST request to the model server currently looks like this (on Windows):
curl -d "{"""instances""": [{"""NAME1W1""": [12], """NAME1W2""": [2032], """ZIP""": [""1120""], """STREETW""": [1180], """LONGITUDE""": 16.47, """LATITUDE""": 48.22, """AVIS_TYPE""": [""E""],"""ASG""": [0], """SC""": [""101""], """PREDICT""": [0]}]}" -X POST http://localhost:8501/v1/models/my_model:predict
So at the moment I am trying to find a way to include these two preprocessing steps in my model, so that the POST command would look like this:
curl -d "{"""instances""": [{"""NAME1""": [“”Max Mustermann””], """ZIP""": [""1120""], """STREET""": [Teststraße], """LONGITUDE""": 16.47, """LATITUDE""": 48.22, """AVIS_TYPE""": [""E""],"""ASG""": [0], """SC""": [""101""], """PREDICT""": [0]}]}" -X POST http://localhost:8501/v1/models/my_model:predict
but with the same pre-processing steps inside the model.
I tried to use map functions on the datasets as well as preprocessing layers, but without success, because I am not sure whether I can combine them with the feature_columns. I also tried something similar to what is described here: https://keras.io/examples/structured_data/structured_data_classification_from_scratch/
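For reference, the direction I was experimenting with looks roughly like this (a sketch assuming the experimental Keras preprocessing layers in TF 2.x; this is not working code, and the open question remains how to combine its output with the DenseFeatures layer built from the feature_columns):

import tensorflow as tf

# Assumed sketch: replace the external Tokenizer with an in-model TextVectorization layer
street_input = tf.keras.Input(shape=(1,), name='STREET', dtype=tf.string)
street_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
    output_mode='int', output_sequence_length=1)
street_vectorizer.adapt(data['STREETPRO'].values)  # fit on the normalized training text
street_tokens = street_vectorizer(street_input)    # integer token ids, now part of the graph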