
I am trying to delete particular columns from a CSV file.

CSV file:

Name,Age,YearofService,Department,Allocation
Birla,49,12,Welding,Production
Robin,38,10,Molding,Production

I am trying to delete the entire columns whose headers are "Department" and "Allocation".

My code:

import csv

with open('input.csv', 'r', newline='') as infile:
    with open('output.csv', 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            for i in range(len(row)):
                # Compare each cell with the names of the columns to drop
                if row[i] != "Department" and row[i] != "Allocation":
                    writer.writerow(row)

My output:

Name
Birla
Robin
Age
49
38
YearofService
12
10

Expected output:

Name,Age,YearofService
Birla,49,12
Robin,38,10

We cannot guarantee that Department and Allocation will be at positions 3 and 4 of the header row, which is why I am iterating over the length of the row.

Balaji R B
  • Read the first row (the one with the headers) and find the indexes of the columns you want to delete. Delete them from that row (largest index first) and write it to the output CSV. Then read the remaining rows one by one, delete the unwanted columns from each, and write it out. – Michael Butscher Aug 10 '22 at 12:24
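The approach in the comment above could be sketched roughly like this with plain csv.reader/csv.writer. It assumes the file has a header row and that every data row has as many fields as the header; drop_columns is a hypothetical helper name, not part of the csv module.

```python
import csv


def drop_columns(in_path, out_path, unwanted):
    """Copy in_path to out_path, leaving out the columns whose header is in unwanted."""
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)

        # Read the header row and find the indexes of the columns to delete,
        # largest first so each deletion does not shift the ones still pending
        header = next(reader)
        drop = sorted(
            (idx for idx, name in enumerate(header) if name in unwanted), reverse=True
        )

        def remove_unwanted(row):
            # Assumes the row is at least as long as the header
            for idx in drop:
                del row[idx]
            return row

        # Write the trimmed header, then trim and write the remaining rows one by one
        writer.writerow(remove_unwanted(header))
        for row in reader:
            writer.writerow(remove_unwanted(row))
```

Usage would be something like drop_columns("input.csv", "output.csv", {"Department", "Allocation"}). Because the indexes are looked up by header name, it does not matter where Department and Allocation sit in the header.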

3 Answers


In this case, the csv.DictReader and csv.DictWriter classes are very handy:

import csv

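with open("input.csv", newline="") as instream, open("output.csv", "w", newline="") as outstream:
    # Setup the input
    reader = csv.DictReader(instream)

    # Setup the output fields; copy the list first, because reader.fieldnames
    # is the same list the reader uses to build each row dictionary
    output_fields = list(reader.fieldnames)
    output_fields.remove("Department")
    output_fields.remove("Allocation")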

    # Setup the output
    writer = csv.DictWriter(
        outstream,
        fieldnames=output_fields,
        extrasaction="ignore",  # Ignore extra dictionary keys/values
    )

    # Write to the output
    writer.writeheader()
    writer.writerows(reader)

Notes

  • For input, each row will be a dictionary such as

    {'Name': 'Birla', 'Age': '49', 'YearofService': '12', 'Department': 'Welding', 'Allocation': 'Production'}

  • For output, we remove the columns (fields) we don't need; see output_fields

  • The extrasaction parameter tells DictWriter to ignore extra keys/values in the row dictionaries
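To see these notes in action without touching the disk, here is a small sketch that uses io.StringIO in place of real files. The sample deliberately moves Department and Allocation into the middle of the header to show that DictWriter picks columns by header name, not by position. SAMPLE and drop_columns_text are hypothetical names used only for illustration.

```python
import csv
import io

# Same people as in the question, but with the unwanted columns moved around
SAMPLE = (
    "Name,Department,Age,Allocation,YearofService\n"
    "Birla,Welding,49,Production,12\n"
    "Robin,Molding,38,Production,10\n"
)


def drop_columns_text(text, unwanted):
    """Return the CSV in `text` without the columns named in `unwanted`."""
    reader = csv.DictReader(io.StringIO(text))
    # Build a new list rather than mutating reader.fieldnames in place
    fields = [name for name in reader.fieldnames if name not in unwanted]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reader)  # each row is a dict keyed by header name
    return out.getvalue()


print(drop_columns_text(SAMPLE, {"Department", "Allocation"}))
```

The printed result has the header Name,Age,YearofService followed by the two trimmed rows; the Department and Allocation values are silently dropped because of extrasaction="ignore".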

Update

In order to remove columns from a CSV file in place (overwriting the original file), we need to

  1. Open the input file, read all the rows, and close it.
  2. Open it again for writing.

Here is the code, modified from the version above:

import csv

with open("input.csv", newline="") as instream:
    # Setup the input; read every row into memory before the file is reopened for writing
    reader = csv.DictReader(instream)
    rows = list(reader)

    # Setup the output fields
    output_fields = reader.fieldnames
    output_fields.remove("Department")
    output_fields.remove("Allocation")

with open("input.csv", "w", newline="") as outstream:
    # Setup the output
    writer = csv.DictWriter(
        outstream,
        fieldnames=output_fields,
        extrasaction="ignore",  # Ignore extra dictionary keys/values
    )

    # Write to the output
    writer.writeheader()
    writer.writerows(rows)
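A possible variant, assuming you would rather not hold every row in memory: stream the trimmed rows into a temporary file in the same directory, then swap it over the original with os.replace. This is a temp-file-and-rename technique swapped in for the read-then-rewrite steps above, not the answer's own method; drop_columns_in_place is a hypothetical name, and the file is assumed to have a header row.

```python
import csv
import os
import shutil
import tempfile


def drop_columns_in_place(path, unwanted):
    """Rewrite the CSV file at `path` without the columns named in `unwanted`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as outfile, open(path, newline="") as infile:
            reader = csv.DictReader(infile)
            fields = [name for name in reader.fieldnames if name not in unwanted]
            writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(reader)  # rows are streamed, never collected into a list
        shutil.copymode(path, tmp_path)  # keep the original file's permission bits
        os.replace(tmp_path, path)  # only swap once the new file is complete
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file, then re-raise
        raise
```

Usage would be drop_columns_in_place("input.csv", {"Department", "Allocation"}). If anything fails while writing, the original file is left untouched.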
Hai Vu

The quickest and easiest way is to just open the file in Excel and delete the columns you want. I know it isn't what you asked for, but it is the first workaround that comes to mind.

Aherontas

You can write something like this (though it's still better to use pandas):

import csv

def delete_cols(path: str, cols_to_delete: list) -> None:
    cols_to_delete = set(cols_to_delete)
    with open(path, newline='') as infile, open('output.csv', 'w', newline='') as output:
        rows = list(csv.reader(infile))
        headers = rows[0]

        # Positions of the columns whose header names should be dropped
        indexes_to_delete = {idx for idx, elem in enumerate(headers) if elem in cols_to_delete}
        # Rebuild every row (header included) without those positions
        result = [[cell for idx, cell in enumerate(row) if idx not in indexes_to_delete] for row in rows]

        writer = csv.writer(output)
        writer.writerows(result)


delete_cols('data.csv', ['Department', 'Allocation'])

Resulting output.csv:

Name,Age,YearofService
Birla,49,12
Robin,38,10
funnydman