
I'm a beginner in TensorFlow and I want to train a 1-D conv model. I have a one-row CSV file for each row of my original data.

The CSV files look like this:

csv_file1: 1.1, 1.3, 1.5, 1.5, 1
csv_file2: 2.1, 2.3, 2.7, 2.9, 0

The last column (containing 1 and 0) holds the label for each single-row CSV file.

Following the link, I wrote the pieces of code below.

I converted the CSV files to TFRecords using the following code:

    import tensorflow as tf

    # df_values is the one-row CSV loaded as a NumPy array; the last column is the label
    with tf.python_io.TFRecordWriter(filename) as writer:
        features, label = df_values[:, 1:-1], df_values[:, -1:]
        example = tf.train.Example()
        example.features.feature["features"].float_list.value.extend(features[0])
        example.features.feature["label"].int64_list.value.append(int(label[0]))
        writer.write(example.SerializeToString())
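
For context, here is roughly how df_values and filename are produced for each file (the paths and the loop are placeholders for my actual script):

    import glob
    import pandas as pd

    # Placeholder loop: one one-row CSV in, one .tfrecords file out
    for csv_path in glob.glob("data/*.csv"):
        df_values = pd.read_csv(csv_path, header=None).values  # shape (1, n_columns)
        filename = csv_path.replace(".csv", ".tfrecords")
        # ... the TFRecordWriter block above runs here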

I now want to read the files back, and this is the code I'm using:

    import glob
    import tensorflow as tf

    def _parse_function(data_record):
        features = {
            'label': tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
            'features': tf.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
        }
        sample = tf.parse_single_example(data_record, features)
        return sample['features'], sample['label']

    filenames = glob.glob("*.tfrecords")
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(_parse_function)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(batch_len)

    # Create a one-shot iterator and grab the next (features, label) batch
    iterator = dataset.make_one_shot_iterator()
    X, y = iterator.get_next()
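
Based on the tf.data guide, I think pulling batches out would look roughly like this, but I'm not sure it's right:

    with tf.Session() as sess:
        try:
            while True:
                x_batch, y_batch = sess.run([X, y])
                # x_batch: (batch_len, n_points) floats, y_batch: (batch_len, 1) labels
        except tf.errors.OutOfRangeError:
            pass  # the one-shot iterator is exhausted after a single pass over the files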

From here the problem starts. From the documentation I understand what a session does, but I'm failing to put it into code. Assuming I'll later figure out how to use

    tf.Session.run()

I wrote the code below, but I don't know how to actually include it in my main script and then use it to train my model.

    x_train_batch, y_train_batch = tf.train.shuffle_batch(
        tensors=[X_train, y_train],
        batch_size=batch_size,
        capacity=capacity,
        min_after_dequeue=min_after_dequeue,
        enqueue_many=True,
        num_threads=8)

    x_train_batch = tf.cast(x_train_batch, tf.float32)
    x_train_batch = tf.reshape(x_train_batch, shape=(batch_size, 1, 65281))

    y_train_batch = tf.cast(y_train_batch, tf.int64)
    y_train_batch = tf.one_hot(y_train_batch, num_classes)
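
For reference, this is roughly the 1-D conv model I want to feed with these batches (the filter counts, kernel sizes and learning rate are placeholders, not a final architecture):

    # Assumes x_train_batch holds n_points floats per example and
    # y_train_batch is one-hot with shape (batch_size, num_classes)
    n_points = 65281
    net = tf.reshape(x_train_batch, (batch_size, n_points, 1))  # conv1d wants channels-last
    net = tf.layers.conv1d(net, filters=16, kernel_size=9, activation=tf.nn.relu)
    net = tf.layers.max_pooling1d(net, pool_size=4, strides=4)
    net = tf.layers.flatten(net)
    logits = tf.layers.dense(net, num_classes)

    loss = tf.losses.softmax_cross_entropy(onehot_labels=y_train_batch, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)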

Any help on how to proceed from here would be appreciated.

PS: If my data were loaded into a single np.array, its shape would be (6571, 65281). It's astronomical data, and each star has 65281 points.

  • Is there any reason why you want to use tf.train.shuffle_batch here? Your Dataset API pipeline is already transforming your input features and labels into the format you want to feed for training, batching them into your desired batch size, etc. You don't need to redo all of these steps. How are you defining your model? Are you using a Keras interface or TensorFlow? – kvish Oct 14 '18 at 13:49
