In python, how can you delete lines in a tabular text format that do NOT contain a specific word?

Question

What is the best way to delete lines from a tabular text file (while keeping the header) so that only the rows containing a specific word remain?

Say, for example, I have a tabular text file with animals and their names and ages. (The headers are Animals/Names/Ages.) How could I delete all lines that do not have 'Dog' in the 'Animals' column?

Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12
Cat Sauron 11
Bird Gandalf 10
Bird Mordor 12

and I only want:

Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12

I have my example code below with notes:

import os
headers = 1
field1 = 'ANIMALS'
sep = ' '

def getIndex(delimString, delimiter, name):
    '''Get position of item in a delimited string'''
    delimString = delimString.strip()
    lineList = delimString.split(delimiter)
    index = lineList.index(name)
    return index

infile = 'C:/example'
outfile = 'C:/folder/animals'

try:
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            for i in range(headers):
                line = fin.readline()
                fout.write(line)
            line = fin.readline()
            fout.write(line)

            # This is where I get confused, I try using the method below:
            for line in fin:
                lineList = line.split(sep)
                # But the code doesn't work as it only prints the header
                # I have a feeling it's the way I'm phrasing this area
                if field1 == 'DOG':
                    fout.write(line)
            print '{0} created.'.format(outfile)

except IOError:
    print "{0} doesn't exist- send help".format(infile)

What is the best way to selectively keep rows from a tabular .txt file?

It would be helpful to include the data in code instead of hard-coding paths that we don't have. Otherwise good job on including both data and code. With tabular format it looks like you need space separated values, and new line separated records. — Allan Wind, Nov 05 '21 at 01:43
It can be a txt or CSV file I guess, either works. And I'm only writing an example with the code above to the tabular text above because I've been creating my own prompts. — Ellie, Nov 05 '21 at 01:48
In your code, you split each line into `lineList`, but then proceed to check some variable `field1`, which you defined to be `'ANIMALS'` - since `'ANIMALS' == 'DOG'` is never `True`, no other lines are written. Instead `if lineList[0] == 'DOG':` would be what you're after. — Grismar, Nov 05 '21 at 01:53
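To make the point in the last comment concrete, here is a minimal Python 3 sketch. Note that the sample data spells the value `Dog`, so an exact comparison against `'DOG'` would still match nothing. The helper name `is_match` and its case-insensitive comparison are assumptions for illustration, not part of the original code.

```python
def is_match(line, wanted, sep=' '):
    """Return True if the first field of `line` equals `wanted`, ignoring case."""
    fields = line.strip().split(sep)
    return fields[0].lower() == wanted.lower()

field1 = 'ANIMALS'
# Comparing two fixed strings never depends on the line being read,
# so this condition is False for every row.
print(field1 == 'DOG')

# Comparing the first field of each line is what actually filters rows.
print(is_match('Dog Pippin 10\n', 'DOG'))
print(is_match('Cat Sauron 11\n', 'DOG'))
```

Whether the match should ignore case depends on the data; if the file always uses one spelling, a plain `==` against that spelling is enough.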

score 0 · Answer 1 · answered Nov 05 '21 at 01:56

Using stdin and stdout instead of files to keep it simple (you can replace them with open() calls if you want):

import sys

headers = 1
sep = ' '
fin = sys.stdin
fout = sys.stdout
for i in range(headers):
    line = fin.readline()
    fout.write(line)
for line in fin:
    lineList = line.split(sep)
    if lineList[0] == 'Dog':
        fout.write(line)

When you run this against the sample data, it prints only the header and the Dog rows:

$ python filter.py < input.txt
Animals Names Ages
Dog Pippin 10
Dog Merry 14
Dog Frodo 12

In other words, just don't print the stuff you don't want.
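As a variation on the same idea, the column position can be looked up from the header instead of assuming it is the first field, which is what the question's `getIndex` helper was aiming at. This is only a sketch under assumptions: the function name `filter_rows` is hypothetical, the file has exactly one header line, and `io.StringIO` stands in for real files so the example is self-contained.

```python
import io

def filter_rows(fin, fout, column, value, sep=' '):
    """Copy the header and every row whose `column` field equals `value`."""
    header = fin.readline()
    fout.write(header)
    # Find the column position from the header instead of hard-coding 0.
    # list.index raises ValueError if the column name is not in the header.
    index = header.strip().split(sep).index(column)
    for line in fin:
        fields = line.strip().split(sep)
        # Skip blank or short lines rather than raising IndexError.
        if len(fields) > index and fields[index] == value:
            fout.write(line)

# In-memory stand-ins for the input and output files.
sample = io.StringIO(
    "Animals Names Ages\n"
    "Dog Pippin 10\n"
    "Cat Sauron 11\n"
    "Dog Merry 14\n"
)
result = io.StringIO()
filter_rows(sample, result, 'Animals', 'Dog')
print(result.getvalue(), end='')
```

With real files, the same function can be called inside `with open(infile) as fin, open(outfile, 'w') as fout:`.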

score 0 · Answer 2 · answered Nov 05 '21 at 02:00

Let's suppose that it's a CSV file. With this code you can keep only the rows that have Dog as the Animals value:

import pandas as pd

# file_name is the path to your file; for the space-separated sample, pass sep=' '
df = pd.read_csv(file_name)

dogs = df.loc[df.Animals == 'Dog']

If you want to update the file, you can run dogs.to_csv(file_name, index=False), which replaces the CSV file of that name with only the filtered rows; if you pass a different file name, it creates a new CSV file instead. (Calling to_csv on df itself would write back the unfiltered table.)

I hope that helps.

Jhosef Matheus

Tiny task, huge dependency! – Klaus D. Nov 05 '21 at 02:14
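If pulling in pandas feels heavy for this, the standard library's `csv` module can do the same filtering by column name. The following is a minimal sketch, assuming a single-space delimiter as in the sample data; the function name `keep_rows` is hypothetical, and `io.StringIO` is used in place of real files.

```python
import csv
import io

def keep_rows(fin, fout, column, value, delimiter=' '):
    """Write the header plus the rows where row[column] == value."""
    reader = csv.DictReader(fin, delimiter=delimiter)
    writer = csv.DictWriter(
        fout,
        fieldnames=reader.fieldnames,  # reuse the header read from the input
        delimiter=delimiter,
        lineterminator='\n',
        extrasaction='ignore',         # ignore stray extra fields on a row
    )
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)

# In-memory stand-ins for the input and output files.
src = io.StringIO(
    "Animals Names Ages\n"
    "Dog Pippin 10\n"
    "Bird Gandalf 10\n"
    "Dog Frodo 12\n"
)
out = io.StringIO()
keep_rows(src, out, 'Animals', 'Dog')
print(out.getvalue(), end='')
```

For a comma-separated file, passing `delimiter=','` should be the only change needed; when opening real files for the `csv` module, the documentation recommends `newline=''`.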
