
I have a dataset called preprocessed_sample in the following format:

preprocessed_sample.ftr.zstd 

and I open it with the following code:

import pandas as pd

# filepath points to preprocessed_sample.ftr.zstd
df = pd.read_feather(filepath)

The output looks something like this:

   index  text
0      0  i really dont come across how i actually am an...
1      1  music has become the only way i am staying san...
2      2  adults are contradicting
3      3  exo are breathing 553 miles away from me. they...
4      4  im missing people that i met when i was hospit...

Finally, I would like to save this dataset to a file called 'examples' that contains all these texts in txt format.

Update: @Tsingis I would like each of the lines above to be written to its own txt file. For example, the first line 'i really dont come across how i actually am an...' should become a file named 'line1.txt', and likewise every other line should become a txt file inside a folder called 'examples'.

  • It is somewhat unclear what the goal is. Can you provide an example of the expected output? – Tsingis Feb 01 '23 at 19:56
  • Then you pretty much need to loop through the rows of the `text` column and write each line to a separate file using a file handle. https://www.pythontutorial.net/python-basics/python-write-text-file/ – Tsingis Feb 01 '23 at 20:02

2 Answers


You can use the following code:

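import pathlib

# Create the output folder if it does not already exist
data_dir = pathlib.Path('./examples')
data_dir.mkdir(exist_ok=True)

# Write each text to its own file: line1.txt, line2.txt, ...
for i, text in enumerate(df['text'], 1):
    with open(data_dir / f'line{i}.txt', 'w', encoding='utf-8') as fp:
        fp.write(text)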

Output:

examples/
├── line1.txt
├── line2.txt
├── line3.txt
├── line4.txt
└── line5.txt

1 directory, 5 files

line1.txt:

i really dont come across how i actually am an...
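The same loop can also be packaged as a small reusable function built on `pathlib.Path.write_text`. This is a minimal sketch: the function name `write_texts` is hypothetical, and it assumes every entry is a string (a missing value such as `None` would have to be filtered out first). The explicit `encoding="utf-8"` avoids depending on the platform's default encoding.

```python
import pathlib
from typing import Iterable, List


def write_texts(texts: Iterable[str], out_dir: str = "examples") -> List[pathlib.Path]:
    """Hypothetical helper: write each text to out_dir/line1.txt, out_dir/line2.txt, ..."""
    folder = pathlib.Path(out_dir)
    # Create the folder (and any missing parents) if it does not exist yet
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, text in enumerate(texts, start=1):
        path = folder / f"line{i}.txt"
        # write_text opens, writes and closes the file; an existing file is overwritten
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

With the question's DataFrame, this would be called as `write_texts(df["text"])`, producing the same `examples/line1.txt` ... `examples/line5.txt` layout.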
Corralien

Another way is to use the pandas built-ins `itertuples` and `to_csv`:

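import pathlib
import pandas as pd

# to_csv does not create folders, so make sure 'examples' exists first
pathlib.Path("examples").mkdir(exist_ok=True)
# row.index is df's 'index' column; texts containing commas or quotes get quoted
for row in df.itertuples():
    pd.Series(row.text).to_csv(f"examples/line{row.index+1}.txt",
                               index=False, header=False)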
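One caveat with the `to_csv` route: it writes CSV, not raw text, so the default minimal quoting wraps any text containing a comma, a double quote or a newline in double quotes, and doubles any embedded quotes. The sketch below uses a made-up sample string to show what would end up in such a file; if the files must contain the texts exactly as they are, the plain `open(...).write(text)` loop from the first answer avoids this.

```python
import pandas as pd

# Hypothetical text containing a comma and double quotes
text = 'adults are contradicting, "really"'

# With no path, to_csv returns the CSV content as a string instead of writing a file
csv_version = pd.Series([text]).to_csv(index=False, header=False)

# The CSV version is quoted, with inner quotes doubled, unlike the original text
print(repr(text))
print(repr(csv_version))
```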
Corralien
Timeless