Is there a way, as in pandas, to delete the rows that contain a NaN inside a TensorFlow dataset, like this?
ds = ds[~np.isnan(ds).any(axis=1)]
My test example is:
simple_data_samples = np.array([
[1, 11, 111, -1, -11],
[2, np.nan, 222, -2, -22],
[3, 33, 333, -3, -33],
[4, 44, 444, -4, -44],
[5, 55, 555, -5, -55],
[6, 66, 666, -6, -66],
[7, 77, 777, -7, -77],
[8, 88, 888, -8, -88],
[9, 99, 999, -9, np.nan],
[10, 100, 1000, -10, -100],
[11, 111, 1111, -11, -111],
[12, 122, 122, -12, -122]
])
ds = tf.data.Dataset.from_tensor_slices(simple_data_samples)
ds = ds.window(4, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda x: x).batch(4)
ds = ds.shuffle(dim_dataset)
# remove the rows containing NaN here
This must be done after shuffle.
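For comparison, here is a minimal NumPy-only sketch of the row-dropping idiom applied to the test array before it becomes a dataset. The names `row_has_nan` and `clean` are introduced only for this illustration.

```python
import numpy as np

# Same test array as in the question: 12 rows, 5 columns, two rows contain NaN.
simple_data_samples = np.array([
    [1, 11, 111, -1, -11],
    [2, np.nan, 222, -2, -22],
    [3, 33, 333, -3, -33],
    [4, 44, 444, -4, -44],
    [5, 55, 555, -5, -55],
    [6, 66, 666, -6, -66],
    [7, 77, 777, -7, -77],
    [8, 88, 888, -8, -88],
    [9, 99, 999, -9, np.nan],
    [10, 100, 1000, -10, -100],
    [11, 111, 1111, -11, -111],
    [12, 122, 122, -12, -122],
])

# True for every row in which at least one column is NaN.
row_has_nan = np.isnan(simple_data_samples).any(axis=1)

# Keep only the rows without NaN.
clean = simple_data_samples[~row_has_nan]

print(clean.shape)  # (10, 5)
```

This is the behaviour I would like to reproduce on the windowed and shuffled dataset.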
###############EDIT UPDATE##############
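The next step is to split the features from the label with this short function:
def split_feature_label(x):
    # Features: the first input_sequence_length rows.
    # Label: the remaining rows, restricted to the columns from slice_size onward.
    return x[:input_sequence_length], x[input_sequence_length:, slice_size:]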
and the final transformation looks like this:
ds = ds.map(split_feature_label)
# Split the data into training and validation sets.
split = round(split_train_ratio * (dim_dataset - input_sequence_length - forecast_sequence_length))
ds_train = ds.take(split)
ds_valid = ds.skip(split)
ds_train = ds_train.batch(batch_size, drop_remainder=True)
ds_valid = ds_valid.batch(batch_size, drop_remainder=True)
ds_train = ds_train.prefetch(1)
ds_valid = ds_valid.prefetch(1)
return iter(ds_train), iter(ds_valid)
If I introduce this proposed solution:
ds = ds.map(lambda x: tf.boolean_mask(x, tf.reduce_all(~tf.math.is_nan(x), axis=-1)))
It seems to work until I call the next step, which splits my input and label (last column == label). The code runs, but if I try to inspect my data after this split I get these messages:
2022-12-23 10:15:05.514989: W tensorflow/core/framework/op_kernel.cc:1780] OP_REQUIRES failed at strided_slice_op.cc:111 : INVALID_ARGUMENT: slice index 3 of dimension 0 out of bounds.
and
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__IteratorGetNext_output_types_2_device_/job:localhost/replica:0/task:0/device:CPU:0}} slice index 3 of dimension 0 out of bounds.
[[{{node strided_slice_1}}]] [Op:IteratorGetNext]
Does something change in the shape or structure of the data?
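A minimal sketch of what might be going on, written in NumPy as an analogue and not as a confirmed diagnosis: `tf.boolean_mask` with a one-dimensional mask drops rows along the first axis, much like `x[mask]` in NumPy. If the mask is applied to each 4-row window separately, windows that contained a NaN row end up with fewer rows than the others. The 7-row array, `window_size`, and the list names below are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical 7-row series with a NaN in the third row (index 2).
data = np.array([
    [1.0, 11.0],
    [2.0, 22.0],
    [3.0, np.nan],
    [4.0, 44.0],
    [5.0, 55.0],
    [6.0, 66.0],
    [7.0, 77.0],
])

window_size = 4

# Sliding windows with shift 1, mimicking window(4, shift=1, drop_remainder=True).
windows = [data[i:i + window_size] for i in range(len(data) - window_size + 1)]

# NumPy analogue of the per-window boolean mask: drop NaN rows inside each window.
masked = [w[~np.isnan(w).any(axis=1)] for w in windows]

# Row count of each window after masking; windows are no longer all 4 rows long.
row_counts = [m.shape[0] for m in masked]
print(row_counts)  # [3, 3, 3, 4]

# Alternative: keep or drop each window as a whole, so every survivor has 4 rows.
kept = [w for w in windows if not np.isnan(w).any()]
print(len(kept), kept[0].shape)  # 1 (4, 2)
```

If that reading is right, any later step that expects row index 3 to exist would fail on the shortened windows. Dropping whole windows instead, possibly with `Dataset.filter` and a predicate built from `tf.math.is_nan`, might keep the shapes uniform, though I have not verified this against the pipeline above.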