
Let's say I have text data in a pandas DataFrame where each row has multiple labels:

   Text              Label
0  I love you        A, B
1  Thank you         C, D
2  You are welcome   A, B, C

I want to convert it to a text file where each line contains the sentence, followed by the separator __label__, followed by the labels separated by single spaces (no commas).

Therefore, the text file will look like this:

I love you __label__ A B
Thank you __label__ C D
You are welcome __label__ A B C
Jaya A
  • Can you share the code you’ve written so far in an attempt to meet the requirements you’ve described as a [mre], along with a succinct explanation of where that attempt falls short of said requirements? We’re not going to write your code *for* you, but we can assist you if you can demonstrate a good-faith effort on your part before posting in accordance with [ask]. – esqew Jun 13 '22 at 08:21

2 Answers

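import pandas as pd

# Sample data matching the question: labels are comma-separated
df = {
'Text': ['I love you', 'Thank you', 'You are welcome'],
'Label': ['A, B', 'C, D', 'A, B, C']
}

data = pd.DataFrame(df, columns=['Text', 'Label'])
print(data)

with open('read1me.txt', 'w') as f:
    for index, row in data.iterrows():
        text = row['Text']
        # Dropping the commas leaves the labels separated by single spaces
        lbl = row['Label'].replace(',', '')
        # Spaces (not tabs) around __label__, as in the requested output
        f.write(f'{text} __label__ {lbl}\n')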
Olasimbo
  • Thanks. But the original labels have commas: A, B, C. I want to replace each comma with just a space. – Jaya A Jun 13 '22 at 09:11
  • @JayaA OK, I've edited my solution. We just need to use the `replace` function, i.e. `row['Label'].replace(',', '')` – Olasimbo Jun 13 '22 at 09:52
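
One caveat with `replace(',', '')`: it only gives space-separated labels when every comma is followed by exactly one space, so `'A,B'` would become `'AB'`. Below is a minimal sketch of a more forgiving cleanup that splits on commas and strips whitespace. The helper names `normalize_labels` and `to_label_line` are made up for illustration and are not part of the answer above.

```python
def normalize_labels(raw):
    """Turn 'A, B ,C' into 'A B C', whatever the spacing around the commas."""
    parts = (part.strip() for part in raw.split(","))
    # Skip empty pieces such as the one produced by a trailing comma.
    return " ".join(part for part in parts if part)


def to_label_line(text, raw_labels):
    """Build one output line in the format shown in the question."""
    return f"{text} __label__ {normalize_labels(raw_labels)}"


print(to_label_line("You are welcome", "A,B , C"))
```

Inside the loop this could stand in for the `replace` call, e.g. `f.write(to_label_line(row['Text'], row['Label']) + '\n')`.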

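to_csv() expects a single-character sep, so passing ' __label__ ' directly raises a TypeError. You can still use to_csv() by joining the two columns into one string per row first (this also drops the commas from the labels):

lines = df['Text'] + ' __label__ ' + df['Label'].str.replace(',', '', regex=False)
lines.to_csv('filename.txt', header=False, index=False)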
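
Keep in mind that CSV writers may wrap a field in quotes when it contains the delimiter or a quote character, which would change any sentence that itself contains a comma. A rough sketch that bypasses CSV formatting and writes the lines directly is shown below. The function name `write_labelled_lines` and the sample rows are assumptions for illustration only.

```python
import io

SEPARATOR = " __label__ "  # separator taken from the question


def write_labelled_lines(rows, handle):
    """Write one 'text __label__ labels' line per (text, labels) pair."""
    for text, labels in rows:
        # Turn commas into spaces, then collapse any repeated whitespace.
        cleaned = " ".join(labels.replace(",", " ").split())
        handle.write(f"{text}{SEPARATOR}{cleaned}\n")


# Demo with an in-memory buffer; the second sentence contains a comma
# and is written unchanged, without any quoting.
rows = [("I love you", "A, B"), ("Thanks, really", "C, D")]
buffer = io.StringIO()
write_labelled_lines(rows, buffer)
print(buffer.getvalue(), end="")
```

With a DataFrame, the rows could come from `zip(df['Text'], df['Label'])` and the handle from `open('filename.txt', 'w')`.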
RJ Adriaansen