
I find a lot of documents and forum posts explaining how to convert a CSV to a TensorFlow dataset, but not a single one explaining how to convert a dataset to a CSV. For now I have a CSV with two columns (filename, weight; more columns may be added later). I read it into TensorFlow and create a dataset. At the end of the script the second column is modified, and I need to save these columns to a CSV. I need them in CSV (not a checkpoint) because I may need to process them in MATLAB.

I tried calling the dataset's map function and saving to the CSV inside the mapped function, but it doesn't work as expected.

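# reading the CSV into a dataset
import csv
import tensorflow as tf

def map_func1(line):
    # one string column (filename) and one float column (weight)
    FIELD_DEFAULTS = [[""], [0.0]]
    # tf.io.decode_csv is the TF 2.x name; older 1.x code used tf.decode_csv
    sample, weight = tf.io.decode_csv(line, FIELD_DEFAULTS)
    return sample, weight

ds = tf.data.TextLineDataset('sample_weights.csv')
ds_1 = ds.map(map_func1)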

# ds_1 is then transformed into ds_2 by another map function (code omitted)

# attempt to save ds_2 to a CSV


def map_func3(writer, x):
    x0, x1 = x
    writer.writerow([x0, x1])
    return x

with open('sample_weights_mod.csv', 'w') as file:
    writer = csv.writer(file)
    ds_3 = ds_2.map(lambda *x: map_func3(writer, x))

This doesn't work as expected. Instead of the values, it just writes the tensor descriptions to the CSV:
Tensor("arg0:0", shape=(), dtype=string) Tensor("arg1:0", shape=(), dtype=float32)

This approach is probably a bad one; I really need a clean way to do this.
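A likely explanation: Dataset.map runs the Python function only once, while tracing it into a graph, so map_func3 receives symbolic placeholders (hence the Tensor("arg0:0", ...) text), and ds_3 is never iterated afterwards. A minimal sketch of the usual alternative, assuming TensorFlow 2.x with eager execution, iterates the dataset in plain Python and writes each row with the csv module. The helper name dataset_to_csv and the contents of ds_2 are hypothetical.

```python
import csv
import tensorflow as tf

# Assumption: TensorFlow 2.x with eager execution (the default), so a
# dataset can be iterated directly from Python.

def dataset_to_csv(ds, path, header=None):
    """Write a dataset of (filename, weight) pairs to a CSV file.

    Rows are written from an ordinary Python loop, not from inside
    Dataset.map, whose function is only traced into a graph.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        for filename, weight in ds.as_numpy_iterator():
            # Strings arrive as bytes and floats as numpy scalars.
            writer.writerow([filename.decode("utf-8"), float(weight)])

# Hypothetical stand-in for ds_2: two samples with modified weights.
ds_2 = tf.data.Dataset.from_tensor_slices(
    (["img_001.png", "img_002.png"], [0.25, 0.75]))
dataset_to_csv(ds_2, "sample_weights_mod.csv")
```

Keeping the file writing outside the graph means the CSV is plain text that MATLAB can read with readtable. Under TF 1.x graph mode the same loop would instead have to pull values through an iterator and a session, which is not shown here.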

M.John
  • A dataset is part of the input pipeline and generally part of the graph. If you want to modify a CSV, why do you need TensorFlow? – Sharky Apr 04 '19 at 07:44
  • The CSV contains the sample weights for AdaBoost. Since our dataset (not the CSV) is large and used for CNN training in batches, I need to save the weights recalculated per batch back to the CSV and, at the end, normalize them over the whole dataset. Earlier I tried to read the CSV the normal Python way into a dict, but updating the dict from recalculated tensors proved difficult (https://stackoverflow.com/questions/55351537/how-to-create-a-dict-from-two-tensors-in-tensorflow). Hence I went with the recommended way of treating the CSV data as a dataset too, but ran into this problem. – M.John Apr 04 '19 at 23:42
  • Does [this](https://www.tensorflow.org/tutorials/load_data/csv#using_tfdata) solve your issue? Thanks! –  Apr 16 '21 at 05:58
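The normalization over the whole dataset mentioned above can also stay inside tf.data before anything is written. A minimal sketch, assuming TensorFlow 2.x and assuming "normalize" means rescaling so the weights sum to 1 (the usual step for AdaBoost sample weights); ds_2 and its values are hypothetical.

```python
import tensorflow as tf

# Hypothetical stand-in for ds_2: (filename, unnormalized weight) pairs.
ds_2 = tf.data.Dataset.from_tensor_slices(
    (["img_001.png", "img_002.png"], [1.0, 3.0]))

# One pass over the whole dataset to sum the weights.
total = ds_2.map(lambda filename, weight: weight).reduce(
    tf.constant(0.0), lambda acc, weight: acc + weight)

# Divide each weight by the total so the weights sum to 1.
ds_norm = ds_2.map(lambda filename, weight: (filename, weight / total))

# Pull the values back into Python, ready to be written to a CSV.
rows = [(f.decode("utf-8"), float(w)) for f, w in ds_norm.as_numpy_iterator()]
```

Dataset.reduce consumes the full dataset, so for a very large file this costs one extra pass before the write.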

1 Answer


Though it's not a good way of doing it, for now I did it as below:

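import pandas as pd

# movies is a tensorflow.python.data.ops.dataset_ops.MapDataset whose
# elements are scalar string tensors (requires eager execution, TF 2.x)
z = []
for example in movies:
    # each element is a tf.Tensor holding bytes; decode to str
    z.append(example.numpy().decode("utf-8"))

mv = {'movie_title': z}
pd.DataFrame(mv).to_csv('movie.csv')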
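The same loop extends to the question's two-column (filename, weight) dataset. A minimal sketch, assuming TensorFlow 2.x and pandas; ds_2 and its values are hypothetical. Passing index=False is a deliberate change from the snippet above: it drops pandas' row-index column so the file holds only the data columns.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the question's ds_2 (filename, weight) dataset.
ds_2 = tf.data.Dataset.from_tensor_slices(
    (["img_001.png", "img_002.png"], [0.25, 0.75]))

filenames, weights = [], []
for filename, weight in ds_2.as_numpy_iterator():
    filenames.append(filename.decode("utf-8"))  # bytes -> str
    weights.append(float(weight))               # np.float32 -> float

df = pd.DataFrame({"filename": filenames, "weight": weights})
# index=False: write only the data columns, with a header row.
df.to_csv("sample_weights_mod.csv", index=False)
```

Adding more columns later only means unpacking more fields in the loop and adding keys to the dict.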

sakeesh