Data csv file into different text files with Python

Question

I'm a beginner in programming, but for a Dutch text categorization experiment I want to turn every instance (row) of a csv file into a separate .txt file, so that the texts can be analyzed by an NLP tool. My csv looks like this.

As you can see, each instance has text in the column 'Taaloefening1' or in the column 'Taaloefening2'. Now I need to save the text of each instance in a .txt file, and the name of the file needs to be the id and the label. I was hoping I could do this automatically by writing a Python script that uses the csv module. I have an idea of how to save the text into a .txt file, but I have no idea how to use the id and label that match the text as the file name. Any ideas?

The [`csv`](https://docs.python.org/3/library/csv.html) module contains some useful tools. — Kendas, Jun 09 '17 at 08:29
@ÉbeIsaac I'm unsure, but to be sure, I'd export the file into a `csv` format. — Kendas, Jun 09 '17 at 08:36
@Kendas, I tried to export it to a csv file (by saving it as), but when I opened it, the columns were gone and everything was just in rows. I'm a beginner in Python and all that comes with it, so maybe I did something wrong — Bambi, Jun 09 '17 at 08:53
A `csv` file should have the first line as `id,Label,Taaloefening1,Taaloefening2` and the second as `P642,PR,,Terwijl......` (note the two consecutive commas, which mark the empty `Taaloefening1` field). Excel should be able to save files in this format, though I don't have a copy handy to test it. — Kendas, Jun 09 '17 at 09:16
@Kendas, based on your comments, I changed my question. I managed to create a csv from the excel — Bambi, Jun 10 '17 at 14:36

score 1 · Accepted Answer · answered Jun 10 '17 at 19:32

The csv.DictReader should be able to do what you need:

from csv import DictReader

INPUT_FILE = 'data.csv'

with open(INPUT_FILE, newline='') as csvfile:  # Python 3: text mode; 'rb' only works on Python 2
    reader = DictReader(csvfile)
    for row in reader:
        file_name = "{}_{}.txt".format(row["id"], row["Label"])
        if row["Taaloefening1"]:     # if this field is not empty
            line = row["Taaloefening1"] + '\n'
        elif row["Taaloefening2"]:
            line = row["Taaloefening2"] + '\n'
        else:
            print("Both 'Taaloefening1' and 'Taaloefening2' empty on {}_{}. Skipping.".format(row["id"], row["Label"]))
            continue
        with open(file_name, 'w') as output:
            output.write(line)
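
If you want something reusable, the same idea can be wrapped in a function. This is only a sketch under a few assumptions that are not in the question: the function name `split_csv_to_txt`, the `out_dir` and `text_columns` parameters, and the UTF-8 encoding are all my own choices, so adjust them to your data (Excel may export a different encoding).

```python
import csv
from pathlib import Path


def split_csv_to_txt(csv_path, out_dir,
                     text_columns=("Taaloefening1", "Taaloefening2")):
    """Write one .txt file per CSV row, named <id>_<Label>.txt.

    Returns the list of paths written; rows without any text are skipped.
    The function name and parameters are hypothetical, for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # newline='' is what the csv docs recommend; utf-8 is an assumption.
    with open(csv_path, newline="", encoding="utf-8") as csvfile:
        for row in csv.DictReader(csvfile):
            # Take the first non-empty text column, if there is one.
            text = next((row[col] for col in text_columns if row.get(col)), None)
            if text is None:
                continue
            target = out_dir / "{}_{}.txt".format(row["id"], row["Label"])
            target.write_text(text + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Calling `split_csv_to_txt('data.csv', 'texts')` would then put files such as `P642_PR.txt` into a `texts` folder, which keeps the generated files apart from the script and the csv.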
