
I'm trying to evaluate a NeuralGAMS LTR model on a test set, and this is how I build the test dataset:

import numpy as np
import tensorflow as tf
import tensorflow_ranking as tfr

# Build test dataset for model ingestion
features = preprocess_features(features_file)
feature_cols = np.array(features['cols'])

# Create specs for pipeline
context_spec_ = {}
example_spec_ = {feat: tf.io.FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=0.0) for feat in feature_cols}
label_spec_ = ('relevance_label', tf.io.FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=-1))

dataset_hparams = tfr.keras.pipeline.DatasetHparams(
    train_input_pattern='tfrecord_HJ/HJ/all_data/train.tfrecords',
    valid_input_pattern='tfrecord_GD/Golden_Data/LibSVM_format/test.tfrecords',
    train_batch_size=128,
    valid_batch_size=1,
    dataset_reader=tfr.keras.pipeline.DatasetHparams.dataset_reader)

# Define Dataset Builder
dataset_builder = tfr.keras.pipeline.SimpleDatasetBuilder(
    context_spec_,
    example_spec_,
    mask_feature_name="example_list_mask",
    label_spec=label_spec_,
    hparams=dataset_hparams,
    sample_weight_spec=None)

ds = dataset_builder.build_valid_dataset()
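To sanity-check, I confirmed the dataset really is infinite by inspecting its cardinality. A minimal sketch with a synthetic dataset standing in for the real pipeline (`cardinality()` is a standard tf.data API):

```python
import tensorflow as tf

# Synthetic stand-in for the dataset the builder returns.
finite = tf.data.Dataset.range(10).batch(2)
infinite = finite.repeat()  # repeat() with no count loops forever

# cardinality() reports INFINITE_CARDINALITY for repeated datasets,
# which is what triggers the "infinite dataset" error in predict().
print(finite.cardinality().numpy())                                    # 5
print(infinite.cardinality().numpy() == tf.data.INFINITE_CARDINALITY)  # True
```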

When I use this ds for prediction with predictions = model.predict(ds), I get the following error:

"When providing an infinite dataset, you must specify the number of steps to run (if you did not intend to create an infinite dataset, make sure to not call repeat() on the dataset)."
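For question 2, the workaround I'm considering is to count the records in the TFRecord file and derive steps from the batch size. A sketch, assuming counting by iterating the raw file is acceptable (count_records is a helper name I made up):

```python
import math
import tensorflow as tf

def count_records(paths):
    # One pass over the raw TFRecord file(s); no parsing needed.
    return sum(1 for _ in tf.data.TFRecordDataset(paths))

# num_examples = count_records(
#     ['tfrecord_GD/Golden_Data/LibSVM_format/test.tfrecords'])
num_examples = 7   # toy value for illustration
batch_size = 1     # matches valid_batch_size above
steps = math.ceil(num_examples / batch_size)
# predictions = model.predict(ds, steps=steps)
print(steps)  # 7
```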

My questions are:

  1. Why is an infinite dataset created when I'm just reading from an existing TFRecord file?
  2. When I add steps, how do I get predictions for all of the data?
  3. How do I map those predictions back to the inputs?
  4. Is there a better way to get predictions?
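To clarify question 3, this is the kind of pairing I'm after, sketched with a toy dataset and a stand-in scoring function in place of the real model:

```python
import tensorflow as tf

# Toy finite dataset of (features, label) batches.
ds = tf.data.Dataset.from_tensor_slices(
    ({'f0': [[1.0], [2.0], [3.0]]}, [[1], [0], [1]])).batch(1)

rows = []
for features, labels in ds:        # finite: the loop ends on its own
    scores = features['f0'] * 2.0  # stand-in for model(features)
    rows.append((float(features['f0'].numpy().ravel()[0]),
                 float(scores.numpy().ravel()[0])))

print(rows)  # each input value paired with its score
```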
