I am trying to fit a TensorFlow model, and one of my features comes in as a comma-separated string of ints (possibly an empty string). The feature appears in the pre-transform schema as

feature {
  name: "csstring"
  type: BYTES
  presence {
    min_fraction: 1.0
  }
  shape {
    dim {
      size: 1
    }
  }
}

and in the preprocessing_fn function it is processed via

splitted = tf.squeeze(tf.strings.split(inputs["csstring"], sep=","), axis=1)
filled = tf.where(splitted=='', 'nan', splitted)
casted = tf.strings.to_number(filled)
meaned = tf.reduce_mean(casted, axis=1)
outputs["csstring"] = meaned

I have managed to load the pre-transformed examples in a notebook and apply these transformation steps to get the processed feature as the average of each list (nan if the list is empty).

However, when I run the pipeline as a whole on Kubeflow, the Transform component fails with this error:

ValueError: An error occured while trying to apply the transformation: "StringToNumberOp could not correctly convert string:
[[node transform/transform/StringToNumber_1 (defined at venv/lib/python3.8/site-packages/tensorflow_transform/saved/saved_transform_io.py:262) ]]

I can't see any particular string instance that would be problematic to cast, and would appreciate any ideas as to why the pipeline doesn't work.
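
For reference, here is a minimal plain-Python sketch of the per-row result I expect. It can also be used to scan the raw values for tokens that are not plain integers. It does not use TensorFlow, and the helper names `bad_tokens` and `expected_mean` are hypothetical. The regular expression assumes a valid token is an optionally signed base-10 integer with no surrounding whitespace, which may not match exactly what tf.strings.to_number accepts.

```python
import math
import re

# Assumption: a valid token is an optionally signed base-10 integer,
# with no whitespace or other characters around it.
_INT_TOKEN = re.compile(r"[+-]?\d+", re.ASCII)


def bad_tokens(raw):
    """Return the non-empty tokens of `raw` that are not plain integers."""
    return [t for t in raw.split(",") if t != "" and not _INT_TOKEN.fullmatch(t)]


def expected_mean(raw):
    """Mean of the comma-separated ints in `raw`.

    Empty tokens are treated as nan, mirroring the 'nan' fill in the
    preprocessing steps, so an empty string yields nan.
    """
    tokens = raw.split(",")
    values = [math.nan if t == "" else float(t) for t in tokens]
    return sum(values) / len(values)
```

Running `bad_tokens` over every raw "csstring" value should list any token containing whitespace or stray characters, which are candidates for the string the cast rejects.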

user1337