4

I need guidance on code to write a CSV file that drops rows with specific numbers in the first column [0]. My script writes a file, but it contains the rows that I am working to delete. I suspect that I may have an issue with the spreadsheet being read as one long string rather than ~150 rows.

import csv

Property_ID_To_Delete = {4472738, 4905985, 4905998, 4678278, 4919702, 4472936, 2874431, 4949190, 4949189, 4472759, 4905977, 4905995, 4472934, 4905982, 4906002, 4472933, 4905985, 4472779, 4472767, 4472927, 4472782, 4472768, 4472750, 4472769, 4472752, 4472748, 4472751, 4905989, 4472929, 4472930, 4472753, 4933246, 4472754, 4472772, 4472739, 4472761, 4472778}

with open('2015v1.csv', 'rt') as infile:
    with open('2015v1_edit.csv', 'wt') as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if row[0] != Property_ID_To_Delete:
                writer.writerow(row)

Here is the data: https://docs.google.com/spreadsheets/d/19zEMRcir_Impfw3CuexDhj8PBcKPDP46URZ9OA3uV9w/edit?usp=sharing

Mike Müller
  • 82,630
  • 20
  • 166
  • 161
JCM
  • 365
  • 1
  • 2
  • 15

1 Answers1

3

You need to check if an id, converted into an integer as you set as integers, is contained in the ids to delete. Write the line only if its not contained. You compare the id in the first column with the whole set of ids to be deleted. A string is always not equal to a set:

>>> '1' != {1}
True

Therefore, you get all rows in your output.

Change:

if row[0] != Property_ID_To_Delete:

into:

if int(row[0]) not in Property_ID_To_Delete:

EDIT

You need tow write the header of your infile first before trying to convert the first column entry into an integer:

with open('2015v1.csv', 'rt') as infile:
    with open('2015v1_edit.csv', 'wt') as outfile:
        writer = csv.writer(outfile)
        reader = csv.reader(infile)
        writer.writerow(next(reader))
        for row in reader:
            if int(row[0]) not in Property_ID_To_Delete:
                writer.writerow(row)
Mike Müller
  • 82,630
  • 20
  • 166
  • 161
  • Traceback (most recent call last): File "", line 7, in if int(row[0]) not in Property_ID_To_Delete: ValueError: invalid literal for int() with base 10: 'Property Id' – JCM May 06 '16 at 18:37
  • After the error message, I have been trying various ways to change the Property ID column to integer unsuccessfully. – JCM May 06 '16 at 18:38
  • You need to treat the first line of your (the header) differently. I added some code to my answer that should work. – Mike Müller May 06 '16 at 19:47