21

I want to write a program which filters the lines from my text file which contain the word "apple" and write those lines into a new text file.

What I have tried just writes the word "apple" in my new text file, whereas I want whole lines.

Kevin Panko
ahmad

5 Answers

36

You can get all lines containing 'apple' using a list comprehension:

[ line for line in open('textfile') if 'apple' in line]

So, also in one line of code, you can create the new text file:

open('newfile','w').writelines([ line for line in open('textfile') if 'apple' in line])

And eyquem is right: it's definitely more efficient to keep it as an iterator, since no intermediate list is built, and write:

open('newfile','w').writelines(line for line in open('textfile') if 'apple' in line)
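
The one-liners above rely on the interpreter to close both files. A minimal sketch of the same idea with explicit cleanup follows; the helper name filter_lines and the sample file contents are made up for illustration. Note that `'apple' in line` is a substring test, so a line containing "pineapple" also matches.

```python
import os
import tempfile


def filter_lines(src_path, dst_path, word):
    """Copy every line of src_path that contains word into dst_path."""
    # The with statement closes both files, even if an error occurs.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        # Generator expression: lines are streamed one at a time.
        dst.writelines(line for line in src if word in line)


# Small demonstration in a temporary directory (hypothetical sample data).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'textfile')
dst_path = os.path.join(workdir, 'newfile')

with open(src_path, 'w') as f:
    f.write('an apple a day\nno fruit here\npineapple juice\n')

filter_lines(src_path, dst_path, 'apple')

with open(dst_path) as f:
    result = f.read()
print(result)
```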
phynfo
  • 4
    A list comprehension creates a full list object in memory. Using a generator expression would be better. By the way, it can be written `writelines(line for line in open('textfile') if 'apple' in line)` – eyquem Mar 09 '11 at 12:03
  • 1
    @eyquem: OK, I totally agree that, for large files, it is better to use generators, since a generator behaves lazily and thus doesn't consume as much memory. But for small files, is the list comprehension perhaps the faster solution? – phynfo Mar 09 '11 at 12:23
  • @Phynfo: Nope... keeping things as generators/iterators is far more efficient. The list comprehension is still creating the iterator, which is then filling a list, and once complete passing that list to writelines which turns it back into an iterator. – Chris Cogdon Nov 04 '15 at 00:46
  • Can I match multiple strings here? E.g., I want to retain only lines containing the string 'apple' or 'orange'. – Gajendra D Ambi Apr 26 '17 at 08:13
  • You can replace `if 'apple' in line` with `if 'apple' in line or 'orange' in line` – phynfo May 03 '17 at 20:24
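
When the list of search strings grows, chaining `or` gets unwieldy. One possible alternative, sketched here with a hypothetical helper matching_lines and made-up sample data, is to test all of the words with `any()`:

```python
import io

words = ('apple', 'orange')  # hypothetical search terms


def matching_lines(lines, words):
    """Yield the lines that contain at least one of the given words."""
    for line in lines:
        if any(word in line for word in words):
            yield line


# io.StringIO stands in for a real file opened in text mode.
source = io.StringIO('apple pie\nbanana split\norange juice\n')
kept = list(matching_lines(source, words))
print(kept)
```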
12
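# Python 2 only: itertools.ifilter does not exist in Python 3,
# where the built-in filter() is already lazy.
from itertools import ifilter

# Binary mode on both sides, so lines are copied byte for byte.
with open('source.txt','rb') as f, open('new.txt','wb') as g:
    g.writelines(ifilter(lambda line: 'apple' in line, f))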
Hans Ginzel
eyquem
  • 4
    Be aware that `itertools.ifilter` was removed in Python 3 and replaced by the built-in `filter` (https://docs.python.org/3/library/functions.html#filter), which is equivalent to the generator expression in phynfo's answer. – LudvigH Jul 07 '20 at 09:45
11

Using generators, this is memory efficient and fast:

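def apple_finder(file):
    # Generator function: lazily yields only the lines containing 'apple'.
    for line in file:
        if 'apple' in line:
            yield line


# Text mode, so the str literal 'apple' can be compared on Python 3 as well.
source = open('forest')

# Nothing is read until apples is iterated over.
apples = apple_finder(source)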
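
The generator by itself does not write anything; it has to be consumed. A minimal sketch of that last step, using in-memory io.StringIO objects and made-up sample lines in place of the real 'forest' file and output file:

```python
import io


def apple_finder(file):
    for line in file:
        if 'apple' in line:
            yield line


# io.StringIO objects stand in for real files opened in text mode.
source = io.StringIO('green apple\noak tree\nred apple\n')
target = io.StringIO()

apples = apple_finder(source)   # nothing has been read yet
target.writelines(apples)       # lines are pulled through one at a time

written = target.getvalue()
print(written)
```

With real files, `target` would be the object returned by `open(..., 'w')`, ideally inside a `with` block.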

I love easy solutions with no brain damage for reading :-)

Mario César
  • 1
    The function **apple_finder(file)** is a generator function and **apples** is a generator. The latter does the same job as **ifilter(lambda line: 'apple' in line, f)**, which does it in two lines (import included) – eyquem Mar 09 '11 at 12:39
6

For Python 3, here is a working and fast example:

    text = b'line contains text'
    with open('input.txt', 'rb') as file_in:
        with open('output.txt', 'wb') as file_out:
            file_out.writelines(
                filter(lambda line: text in line, file_in)
            )

Tests:

input.txt:

Test line contains text
Not line not contains this text

HEY
Another line contains text

output.txt:

Test line contains text
Another line contains text

More about the code:

b'line contains text' - the b prefix makes this a bytes literal. Working with bytes throughout avoids encoding problems, because the file contents are never decoded.
Official docs: https://docs.python.org/3/library/stdtypes.html?highlight=binary#bytes-objects

rb, wb - open the files for reading and writing in binary mode, so each line is a bytes object.
Official docs: https://docs.python.org/3/library/io.html#binary-i-o

filter() - takes a function and an iterable. It returns an iterator over the items for which the function returns a true value. In our example, filter applies the lambda to each line of the file (the iterable) to decide whether that line should be kept.

lambda - has the form arguments: expression. In our example, the lambda checks whether the line contains the given text and returns True or False.

Example with lambda and filter: https://blog.finxter.com/how-to-filter-in-python-using-lambda-functions/
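
To make the filter/lambda explanation concrete, here is a small sketch that runs on the test data from above without touching the disk. The io.BytesIO object is an assumption made for illustration; it stands in for a file opened with 'rb'. The sketch also shows that a comprehension with the same condition keeps the same lines.

```python
import io

text = b'line contains text'
data = (b'Test line contains text\n'
        b'Not line not contains this text\n'
        b'\n'
        b'HEY\n'
        b'Another line contains text\n')

# filter() calls the lambda once per line and keeps the lines
# for which it returns True.
with_filter = list(filter(lambda line: text in line, io.BytesIO(data)))

# The same selection written as a list comprehension.
with_comprehension = [line for line in io.BytesIO(data) if text in line]

print(with_filter)
```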

pbaranski
2

if "apple" in line: should work.
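
In context, that condition might sit in a plain loop like the sketch below. The file names and sample contents are hypothetical, and a temporary directory is used so the example is self-contained. The key point for the original problem is to write `line`, not the search word.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'input.txt')    # hypothetical file names
out_path = os.path.join(workdir, 'output.txt')

with open(in_path, 'w') as f:
    f.write('I ate an apple\nI ate a pear\n')

with open(in_path) as infile, open(out_path, 'w') as outfile:
    for line in infile:
        if "apple" in line:
            outfile.write(line)   # write the whole line, not just "apple"

with open(out_path) as f:
    output = f.read()
print(output)
```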

neil