How to remove all rows in a CSV file in which a value is greater than another?

Question

I have a CSV file with 300 lines:

ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
1,1.80,80,78,78,82,82
2,1.60,58,56,60,60,56
3,1.90,100,98,102,98,102

I want to delete all lines where the MEAN WEIGHT column is > 75 and save the result as a new file:

ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
1,1.80,80,78,78,82,82
3,1.90,100,98,102,98,102

Pandas is well suited to this kind of job, but it can also be done with the standard csv module. – The6thSense, Sep 08 '15 at 11:29
You can check the solution from: http://stackoverflow.com/questions/13651117/pandas-filter-lines-on-load-in-read-csv – Maciej Lach, Sep 08 '15 at 11:36

score 2 · Answer 1 · answered Sep 08 '15 at 12:56


If you're open to non-Python solutions and have access to a bash shell or awk:

$ awk -F, '$3>75' filename 

ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
1,1.80,80,78,78,82,82
3,1.90,100,98,102,98,102


karakfa


kikocorreoso · Answer 2 · 2015-09-08T13:11:38.740


Using plain python:

with open('original.csv', 'r') as orig, open('modified.csv', 'w') as modi:
    # Copy the header line unchanged
    modi.write(orig.readline())

    # Keep only the data lines whose third column (MEAN WEIGHT) is <= 75
    for line in orig:
        if float(line.split(',')[2]) <= 75:
            modi.write(line)

edited Sep 08 '15 at 13:11

answered Sep 08 '15 at 11:38


I tried it, but only the header appears in the new file. – Pedro Sousa Sep 08 '15 at 12:36

YOBA · Answer 3 · 2015-09-08T12:41:31.583


As @Vignesh Kalai suggested, use pandas:

import pandas as pd

df = pd.read_csv("yourfile.csv", sep=",")

df[ df["MEAN WEIGHT"] > 75 ].to_csv("yournewfile.csv", index=False)

And it's done.

P.S. You're asking to keep values of 75 or less, but your example output shows the opposite. If you want the first case, replace "> 75" with "<= 75".
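To make the two readings of the question concrete, here is a minimal sketch that runs both filters on the three sample rows from the question. The `io.StringIO` buffer only stands in for the real file, and the names `sample`, `heavy` and `light` are illustrative, not part of the original answer.

```python
import io
import pandas as pd

# Sample rows from the question; StringIO stands in for the real CSV file.
sample = io.StringIO(
    "ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003\n"
    "1,1.80,80,78,78,82,82\n"
    "2,1.60,58,56,60,60,56\n"
    "3,1.90,100,98,102,98,102\n"
)
df = pd.read_csv(sample)

# Reading 1: keep the rows whose MEAN WEIGHT is above 75 (matches the example output).
heavy = df[df["MEAN WEIGHT"] > 75]

# Reading 2: delete those rows, i.e. keep only MEAN WEIGHT <= 75.
light = df[df["MEAN WEIGHT"] <= 75]

print(heavy["ID"].tolist())  # [1, 3]
print(light["ID"].tolist())  # [2]
```

Either frame can then be written out with `to_csv("yournewfile.csv", index=False)` as in the answer.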

edited Sep 08 '15 at 12:41

answered Sep 08 '15 at 11:39


It works, but it adds a new column:
,ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
0,1,1.8,80,78,78,82,82
2,3,1.9,100,98,102,98,102
– Pedro Sousa Sep 08 '15 at 12:31
@PedroSousa Sure, use index = False (See Edit) – YOBA Sep 08 '15 at 12:41
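The extra column in that comment is the DataFrame's row index, which `to_csv` writes by default. A small sketch of the difference, using a hand-built two-row frame rather than the real file (the frame contents and variable names are assumptions for illustration):

```python
import pandas as pd

# A filtered frame keeps the original row labels (here 0 and 2).
df = pd.DataFrame({"ID": [1, 3], "MEAN WEIGHT": [80, 100]}, index=[0, 2])

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty column name
print(without_index.splitlines()[0])  # header contains only the data columns
```

With the default settings the header line begins with a comma and every row starts with its index label; passing `index=False` drops both.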

score 0 · Answer 4 · answered Sep 08 '15 at 12:42

You can use the Python csv library as follows:

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('input.csv', 'r', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    # Write the header
    csv_output.writerow(next(csv_input))

    for cols in csv_input:
        if float(cols[2]) <= 75:    # Keep weights <= 75 (float also accepts decimals)
            csv_output.writerow(cols)

So with the data you have given, you will get the following output.csv file:

ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003
2,1.60,58,56,60,60,56
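If the column position might change, the same filter can be written against the column name instead of index 2. The following is only a sketch: `filter_rows`, `column` and `limit` are hypothetical names introduced here, and `io.StringIO` buffers stand in for the real input and output files.

```python
import csv
import io


def filter_rows(f_input, f_output, column="MEAN WEIGHT", limit=75.0):
    """Copy rows whose `column` value is <= limit; return how many were kept."""
    reader = csv.DictReader(f_input)
    writer = csv.DictWriter(
        f_output, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        # Values are read as strings, so convert before comparing.
        if float(row[column]) <= limit:
            writer.writerow(row)
            kept += 1
    return kept


# Sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    "ID,HEIGHT,MEAN WEIGHT,20-Nov-2002,05-Mar-2003,09-Apr-2003,23-Jul-2003\n"
    "1,1.80,80,78,78,82,82\n"
    "2,1.60,58,56,60,60,56\n"
    "3,1.90,100,98,102,98,102\n"
)
out = io.StringIO()
kept = filter_rows(sample, out)
print(out.getvalue())
```

For real files, the two buffers would be replaced by file objects opened with `newline=''`.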

Chris Koknat · Answer 5 · 2015-10-07T16:51:21.073

A Perl solution that prints to the screen, similar to karakfa's awk solution:

perl -F, -ane 'print if $. == 1 or $F[2] > 75' filename

The @F autosplit array starts at index $F[0], while awk fields start at $1, so the third column (MEAN WEIGHT) is $F[2].

This variation edits the file in place:

perl -i -F, -ane 'print if $. == 1 or $F[2] > 75' filename

This variation edits the file in place and makes a backup, filename.bak:

perl -i.bak -F, -ane 'print if $. == 1 or $F[2] > 75' filename
