
Is there a way, as in pandas or NumPy, to delete the rows that contain a NaN inside a tf.data dataset, like this?

ds = ds[~np.isnan(ds).any(axis=1)]
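
For reference, here is a minimal sketch of what that idiom does on a plain NumPy array. The array `arr` is a made-up example, not my real data:

```python
import numpy as np

# Hypothetical 3x2 array with one NaN in the middle row.
arr = np.array([
    [1.0, 11.0],
    [2.0, np.nan],
    [3.0, 33.0],
])

# np.isnan(arr).any(axis=1) is True for every row holding at least one NaN;
# negating it keeps only the fully valid rows.
clean = arr[~np.isnan(arr).any(axis=1)]
print(clean.shape)  # (2, 2)
```

A `tf.data.Dataset` cannot be indexed with a boolean array this way, which is why I am asking for an equivalent.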

My test example is:

simple_data_samples = np.array([
    [1, 11, 111, -1, -11],
    [2, np.nan, 222, -2, -22],
    [3, 33, 333, -3, -33],
    [4, 44, 444, -4, -44],
    [5, 55, 555, -5, -55],
    [6, 66, 666, -6, -66],
    [7, 77, 777, -7, -77],
    [8, 88, 888, -8, -88],
    [9, 99, 999, -9, np.nan],
    [10, 100, 1000, -10, -100],
    [11, 111, 1111, -11, -111],
    [12, 122, 122, -12, -122]
])

ds = tf.data.Dataset.from_tensor_slices(simple_data_samples)

ds = ds.window(4, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda x: x).batch(4)
ds = ds.shuffle(dim_dataset)

# clear nan row here

This must be done after the shuffle.

###############EDIT UPDATE##############

The next step is to split off the labels with this short function:

def split_feature_label(x):
    # Features: the first input_sequence_length rows.
    # Labels: the remaining rows, columns from slice_size onward.
    return x[:input_sequence_length], x[input_sequence_length:, slice_size:]

and the final transform looks like this:

ds = ds.map(split_feature_label)

# split data train test set.
    
split = round(split_train_ratio * (dim_dataset - input_sequence_length - forecast_sequence_length))
ds_train = ds.take(split)
ds_valid = ds.skip(split)

ds_train = ds_train.batch(batch_size, drop_remainder=True)
ds_valid = ds_valid.batch(batch_size, drop_remainder=True)

ds_train = ds_train.prefetch(1)
ds_valid = ds_valid.prefetch(1)


return iter(ds_train), iter(ds_valid)

If I introduce this proposed solution:

ds = ds.map(lambda x: tf.boolean_mask(x, tf.reduce_all(~tf.math.is_nan(x), axis=-1)))

It seems to work until I call the next step, which splits my inputs and labels (last column == label). The code runs, but if I try to inspect my data after this split I get these messages:

2022-12-23 10:15:05.514989: W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at strided_slice_op.cc:111 : INVALID_ARGUMENT: slice index 3 of dimension 0 out of bounds.
and
raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} slice index 3 of dimension 0 out of bounds.
     [[{{node strided_slice_1}}]] [Op:IteratorGetNext]



Does something change in the shape or structure?
Jonathan Roy

1 Answer


You can use a map function to filter out the rows containing NaN with tf.math.is_nan():

ds = ds.map(lambda x: tf.boolean_mask(x, tf.reduce_all(~tf.math.is_nan(x), axis=-1)))

The boolean mask drops every row that contains a NaN in any of its elements.
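
As a rough illustration of what the mask does to a single window, here is a small sketch on a made-up 4x3 tensor (`window` and `row_is_clean` are names chosen for this example only):

```python
import numpy as np
import tensorflow as tf

# Hypothetical window of 4 rows; the second row holds a NaN.
window = tf.constant([
    [1.0, 11.0, -11.0],
    [2.0, np.nan, -22.0],
    [3.0, 33.0, -33.0],
    [4.0, 44.0, -44.0],
])

# True for rows where every element is a valid number.
row_is_clean = tf.reduce_all(~tf.math.is_nan(window), axis=-1)

# Keep only the clean rows; the result has fewer rows than the input.
masked = tf.boolean_mask(window, row_is_clean)

print(row_is_clean.numpy())  # [ True False  True  True]
print(masked.shape)          # (3, 3)
```

Note that the masked window is shorter than the original one, so any later step that assumes a fixed window length sees a different shape.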

Testing the complete code:

import numpy as np
import tensorflow as tf

simple_data_samples = np.array([
    [1, 11, 111, -1, -11],
    [2, np.nan, 222, -2, -22],
    [3, 33, 333, -3, -33],
    [4, 44, 444, -4, -44],
    [5, 55, 555, -5, -55],
    [6, 66, 666, -6, -66],
    [7, 77, 777, -7, -77],
    [8, 88, 888, -8, -88],
    [9, 99, 999, -9, np.nan],
    [10, 100, 1000, -10, -100],
    [11, 111, 1111, -11, -111],
    [12, 122, 122, -12, -122]
])

ds = tf.data.Dataset.from_tensor_slices(simple_data_samples)

ds = ds.window(4, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda x: x).batch(4)
ds = ds.shuffle(10)
ds = ds.map(lambda x: tf.boolean_mask(x, tf.reduce_all(~tf.math.is_nan(x), axis=-1)))

for data in ds.take(1):  # just print the first sample
    print(data)
# output
 tf.Tensor(
[[  1.  11. 111.  -1. -11.]
 [  3.  33. 333.  -3. -33.]
 [  4.  44. 444.  -4. -44.]]
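
Because the mask removes rows inside each window, windows that contained a NaN end up with fewer than 4 rows. That is a plausible cause of an out-of-bounds slice in a later step that expects a fixed window length. If it is acceptable to discard every window that contains a NaN, a sketch of an alternative is to use `Dataset.filter`, so the surviving windows keep their full length. The reduced 8x3 array `samples` below is a made-up example, and the approach is an assumption about what the pipeline needs:

```python
import numpy as np
import tensorflow as tf

# Hypothetical reduced data: 8 rows, 3 columns, one NaN in the second row.
samples = np.array([
    [1, 11, -11],
    [2, np.nan, -22],
    [3, 33, -33],
    [4, 44, -44],
    [5, 55, -55],
    [6, 66, -66],
    [7, 77, -77],
    [8, 88, -88],
], dtype=np.float32)

ds = tf.data.Dataset.from_tensor_slices(samples)
ds = ds.window(4, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda x: x).batch(4)

# Drop a whole window as soon as any of its elements is NaN.
ds = ds.filter(lambda w: tf.logical_not(tf.reduce_any(tf.math.is_nan(w))))

kept = [w.numpy() for w in ds]
for w in kept:
    print(w.shape)  # every surviving window is still (4, 3)
```

With this variant a function that slices each window at a fixed row index always receives 4 rows, at the cost of losing the valid rows that shared a window with a NaN.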
Vijay Mariappan
  • I got this warning, and I'm not sure whether the result is compromised: WARNING:tensorflow:AutoGraph could not transform <function <lambda> at 0x000001519A9ED9D0> and will run it as-is. Cause: could not parse the source code of <function <lambda> at 0x000001519A9ED9D0>: no matching AST found among candidates: – Jonathan Roy Dec 21 '22 at 22:58
  • I usually use this short function to validate my results: `def print_dataset(ds): for inputs, targets in ds: print("---Batch---") print("Feature:", inputs.numpy()) print("Label:", targets.numpy()) print("")` but I get an error with this solution. – Jonathan Roy Dec 21 '22 at 23:14
  • You even have labels? Update your question with the actual inputs. – Vijay Mariappan Dec 22 '22 at 04:02
  • I made some updates. Your code seems to work, but it breaks another part of mine, probably just because of a change in structure or shape that I don't see or understand. We are really near the solution, though. Thanks a lot for your patience; this is my first contact with tf.data, and the learning curve is not easy at the start. – Jonathan Roy Dec 23 '22 at 15:20