
I'm a beginner in programming, but for a Dutch text categorization experiment I want to turn every instance (row) of a csv file into a separate .txt file, so that the texts can be analyzed by an NLP tool. My csv looks like this.

[image: screenshot of the csv file]

As you can see, each instance has text in either the column 'Taaloefening1' or the column 'Taaloefening2'. Now I need to save the text of each instance in a .txt file, and the name of the file needs to be the id and the label. I was hoping I could do this automatically with a Python script that uses the csv module. I have an idea of how to save the text into a .txt file, but I have no idea how to use the id and label that match the text as the file name. Any ideas?

Bambi
  • The [`csv`](https://docs.python.org/3/library/csv.html) module contains some useful tools. – Kendas Jun 09 '17 at 08:29
  • @Kendas, does `csv` module work for `xls` format too? – Ébe Isaac Jun 09 '17 at 08:33
  • @ÉbeIsaac I'm unsure, but to be sure, I'd export the file into a `csv` format. – Kendas Jun 09 '17 at 08:36
  • @Kendas, I tried to export it to a csv file (by saving it as), but when I opened it, the columns were gone and everything was just in rows. I'm a beginner in Python and all that comes with it, so maybe I did something wrong – Bambi Jun 09 '17 at 08:53
  • A `csv` file should have `id,Label,Taaloefening1,Taaloefening2` as its first line and something like `P642,PR,,Terwijl......` as its second (note the two consecutive commas). Excel should be able to save files in this format, though I don't have a copy handy to test it. – Kendas Jun 09 '17 at 09:16
  • @Kendas, based on your comments, I changed my question. I managed to create a csv from the excel – Bambi Jun 10 '17 at 14:36

1 Answer


`csv.DictReader` should be able to do what you need:

from csv import DictReader

INPUT_FILE = 'data.csv'

# Python 3: open in text mode with newline='' (as the csv docs recommend), not 'rb'
with open(INPUT_FILE, newline='') as csvfile:
    reader = DictReader(csvfile)
    for row in reader:
        file_name = "{}_{}.txt".format(row["id"], row["Label"])
        if row["Taaloefening1"]:     # if this field is not empty
            line = row["Taaloefening1"] + '\n'
        elif row["Taaloefening2"]:
            line = row["Taaloefening2"] + '\n'
        else:
            print("Both 'Taaloefening1' and 'Taaloefening2' empty on {}_{}. Skipping.".format(row["id"], row["Label"]))
            continue
        with open(file_name, 'w') as output:
            output.write(line)
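
The same loop can also be wrapped in a function, which makes it easier to try on a small test file first. The following is only a sketch under some assumptions: the function name `rows_to_text_files` and its `out_dir` and `delimiter` parameters are made up for illustration, and the `utf-8-sig` encoding is a guess about how the csv was exported (it reads plain UTF-8 and also drops the byte-order mark that some Excel exports add).

```python
import csv
from pathlib import Path


def rows_to_text_files(csv_path, out_dir, delimiter=","):
    """Write one .txt file per csv row, named <id>_<Label>.txt.

    Returns the list of paths that were written. Rows in which both
    text columns are empty are skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # newline="" is what the csv docs recommend for files given to a reader.
    # encoding="utf-8-sig" is an assumption about the export; adjust if needed.
    with open(csv_path, newline="", encoding="utf-8-sig") as csvfile:
        for row in csv.DictReader(csvfile, delimiter=delimiter):
            # Take whichever of the two text columns is filled in.
            text = row["Taaloefening1"] or row["Taaloefening2"]
            if not text:
                continue
            target = out_dir / "{}_{}.txt".format(row["id"], row["Label"])
            target.write_text(text + "\n", encoding="utf-8")
            written.append(target)
    return written
```

If opening the exported file shows everything in one column, the export may have used `;` instead of `,` as the separator; in that case calling `rows_to_text_files('data.csv', 'texts', delimiter=';')` might be worth a try.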
Kendas