2

I would like to save portions of a text file, namely the lines between a 'startswith' string and an 'endswith' string, into a new text file.

Example: the input text file contains the following lines:

...abc…
...starts with string...
...def...
...ends with string...
...ghi...

...jkl...
...starts with string...
...mno...
...ends with string...
...pqr...

I would like to extract the following lines into the output text file:

starts with string...def...ends with string
starts with string...mno...ends with string

The following code returns an empty list []. Please help me correct it.

with open('file_in.txt','r') as fi:
    id = []
    for ln in fi:
        if ln.startswith("start with string"):
            if ln.endswith("ends with string"):
                id.append(ln[:])
                with open(file_out.txt, 'a', encoding='utf-8') as fo:
                    fo.write (",".join(id))
print(id)

I expect file_out.txt to contain all strings that start with the "start with string" and end with the "ends with string".

John Kugelman
anatta
  • Thanks for the update with test data. I've updated [my answer](https://stackoverflow.com/a/55544826/3767239) accordingly, please check whether it fits your needs. – a_guest Apr 07 '19 at 00:16

3 Answers

1

startswith and endswith return True or False rather than a position you can use to slice your string. Try find or index instead. For example:

start = 'starts with string'
end = 'ends with string'
s = '...abc… ...starts with string... ...def... ...ends with string... ...ghi...'

sub = s[s.find(start):s.find(end) + len(end)]
print(sub)
# starts with string... ...def... ...ends with string

You will need to add a bit of checking in your loop to see if the start and end strings exist because find will return -1 if there is no match and this would result in some unintended slicing.
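
The check described above could look like the following minimal sketch. The helper name extract_between is hypothetical (it is not part of the original answer), and it assumes the whole text is available as one string. It returns None when either marker is missing, and it only looks for the end marker after the start marker.

```python
def extract_between(text, start, end):
    """Return the slice from start through end, or None if a marker is missing."""
    i = text.find(start)
    if i == -1:
        return None  # start marker not found
    # Search for the end marker only after the start marker.
    j = text.find(end, i + len(start))
    if j == -1:
        return None  # end marker not found after the start marker
    return text[i:j + len(end)]


s = '...abc... ...starts with string... ...def... ...ends with string... ...ghi...'
print(extract_between(s, 'starts with string', 'ends with string'))
print(extract_between('no markers here', 'starts with string', 'ends with string'))
```

Returning None rather than an empty string lets the caller tell "no match" apart from a match that happens to be empty.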

benvc
1

You can use a separate variable to indicate whether the current line is part of an interesting section and toggle this variable based on start and stop markers. Then you can also turn this function into a generator:

def extract(fh, start, stop):
    sub = False
    for line in fh:
        sub |= start in line
        if sub:
            yield line
            sub ^= stop in line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
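
If each section should end up on a single output line, as in the question's expected output, the same flag idea can be adapted to collect the lines of a section and join them. This is only a sketch: the name extract_sections and the in-memory sample are assumptions for illustration. It keeps whole lines (including any text before the start marker and after the stop marker) and only strips surrounding whitespace.

```python
import io


def extract_sections(fh, start, stop):
    """Yield each start..stop section as one string without line breaks."""
    parts = None  # None means we are currently outside a section
    for line in fh:
        if parts is None and start in line:
            parts = []  # a new section begins on this line
        if parts is not None:
            parts.append(line.strip())
            if stop in line:
                yield ''.join(parts)
                parts = None  # section finished


# Hypothetical sample data standing in for the input file.
sample = io.StringIO(
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
)
for section in extract_sections(sample, 'starts with string', 'ends with string'):
    print(section)
```

A section that is opened but never closed is silently dropped here; whether that is the right behaviour depends on the data.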

In Python 3.8 and later you can use assignment expressions:

import itertools as it

def extract(fh, start, stop):
    while any(start in (line := x) for x in fh):
        yield line
        yield from it.takewhile(lambda x: stop not in x, ((line := y) for y in fh))
        yield line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))

Variation: Excluding start and stop markers

In case start and stop markers are to be excluded from the output, we can again use itertools.takewhile:

import itertools as it

def extract(fh, start, stop):
    while any(start in x for x in fh):
        yield from it.takewhile(lambda x: stop not in x, fh)

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
a_guest
  • @MadPhysicist I've updated my answer to meet the OP's requirements (include start and stop markers in the output), also with an example usage. – a_guest Apr 07 '19 at 00:22
  • @a_guest: Using 'assignment expressions' gives the following error: File "", line 4: while any(start in (line := x) for x in fh): ^ SyntaxError: invalid syntax – anatta Apr 07 '19 at 21:46
  • @anatta As mentioned, assignment expressions were introduced in Python 3.8 which is currently only available as [alpha version](https://www.python.org/downloads/release/python-380a3/). – a_guest Apr 07 '19 at 21:48
  • @a_guest: Missed it. I will update to 3.8 alpha, and try. Thanks. – anatta Apr 07 '19 at 21:51
1

Each line read from the file ends with a newline character, so ln.endswith("ends with string") is False even when the visible text matches. I am assuming here that "start with string" and "ends with string" are on the same line. If this is not the case, add id.append(ln[:]) directly below the first if statement.

Try

ln.endswith("ends with string"+'\n' )

or

ln.endswith("ends with string" + '\r\n')

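
As an alternative to appending the line ending to the search string, the line ending can be stripped before testing. The sketch below assumes both markers are on the same line; the helper name is_marked_line and its default arguments are hypothetical, not part of the original answer.

```python
def is_marked_line(ln, start="starts with string", end="ends with string"):
    """True if ln, ignoring its trailing line ending, starts with start and ends with end."""
    stripped = ln.rstrip('\r\n')  # drop '\n' or '\r\n' at the end of the line
    return stripped.startswith(start) and stripped.endswith(end)


print(is_marked_line("starts with string...def...ends with string\n"))
print(is_marked_line("...def...\n"))
```

Stripping works the same way whether the file uses '\n' or '\r\n' line endings.

If the markers are on different lines, a flag-based loop can be used instead:
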
with open('C:\\Py\\testing.txt', 'r') as fi:
    id = []
    copy_line = False  # True while we are inside a start/end section
    for ln in fi:
        if "starts with string" in ln:
            copy_line = True
        if copy_line:
            id.append(ln)
        if "ends with string" in ln:
            copy_line = False

    # Append the collected lines to the output file
    with open('C:\\Py\\testing_out.txt', 'a', encoding='utf-8') as fo:
        fo.write(",".join(id))

print(id)
Jortega