Read a file in Python from one particular word to another and put it in a list

Question

Let's say I'm reading a .txt file in Python that looks something like this:

.. Keywords- key1; key2, key3; key4 Abstract .. ..

Now I want to parse the file until I find the word "Keywords", and then put all the keywords into a list, so the list should look something like this: ["key1", "key2", "key3", "key4"]

So it's basically everything between "Keywords-" and the word "Abstract", and the keywords can be separated by a comma (,), a semicolon (;), or a combination of both.

How do I go about this?

score 1 · Answer 1 · answered Dec 30 '20 at 13:24

Here's one way using regex:

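import re

input_str = "this is a test Keywords- key1; key2, key3; key4 Abstract other stuff here"
# Non-greedy capture of everything between "Keywords- " and "Abstract"
p = re.compile(r'Keywords- (.+?)Abstract')
m = p.search(input_str)
# Split on ';' or ',' and trim whitespace; use an empty list when nothing matched
output = [v.strip() for v in re.split(';|,', m.group(1))] if m else []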

This gives an empty list if there is no match, and otherwise a list of the keywords with whitespace trimmed. In this example the resulting list is:

['key1', 'key2', 'key3', 'key4']

I use re.split because it supports splitting on multiple separators; if you have additional separators, you can add them as further pipe-separated alternatives.
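
The question mentions reading from a file, while the snippet above works on a string. Here is a minimal sketch of a file-based version, assuming the file is small enough to read in one go. The function name `extract_keywords` and the sample file are made up for illustration, and `re.DOTALL` is my addition so that the keyword block may span several lines:

```python
import re
import tempfile
from pathlib import Path

# Non-greedy capture between the two marker words; DOTALL lets "." cross newlines.
KEYWORDS_RE = re.compile(r'Keywords-\s*(.+?)\s*Abstract', re.DOTALL)


def extract_keywords(path):
    """Return the keywords between 'Keywords-' and 'Abstract' in the file at `path`."""
    text = Path(path).read_text(encoding='utf-8')
    match = KEYWORDS_RE.search(text)
    if match is None:
        return []
    # Split on either separator and drop empty pieces (e.g. from a trailing ';').
    parts = re.split(r'[;,]', match.group(1))
    return [part.strip() for part in parts if part.strip()]


if __name__ == '__main__':
    # Hypothetical sample file, written to a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / 'paper.txt'
        sample.write_text(
            'Title\nKeywords- key1; key2,\nkey3; key4\nAbstract\nBody',
            encoding='utf-8',
        )
        print(extract_keywords(sample))
```

For the sample file this prints ['key1', 'key2', 'key3', 'key4']. Only the first Keywords block is returned; for a very large file, reading line by line would probably be a better fit.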

score 0 · Answer 2 · answered Dec 30 '20 at 13:33

Here is another regex version. It is the same idea as Steve's, without the list comprehension.


import re

s = '''Keywords- key1; key2, key3; key4 Abstract stuff
 some of other text Keywords- key1; key2, key3; key4 Abstract
Keywords- key1; key2, key3; key4 Abstract
Keywords- key1; key2, key3; key4 Abstract'''

# Non-greedy, so each match stops at the nearest "Abstract"
extract = r'Keywords-\s(.*?)\sAbstract'
keywordList = re.findall(extract, s)

# Each keyword is a run of word characters
reg = r'\w+'

keywords = []
for block in keywordList:
    keywords += re.findall(reg, block)

print(keywords)


# ['key1', 'key2', 'key3', 'key4', 'key1', 'key2', 'key3', 'key4', 'key1', 'key2', 'key3', 'key4', 'key1', 'key2', 'key3', 'key4']
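
One thing to be aware of: `\w+` splits on any non-word character, so a multi-word keyword such as "machine learning" would come back as two entries, and all blocks end up in one flat list. A rough variant, assuming keywords may contain spaces and that one list per Keywords block is wanted; the name `keyword_blocks` and the sample text are made up for illustration:

```python
import re

# Hypothetical sample text with multi-word keywords.
text = '''Keywords- machine learning; nlp, text mining Abstract first paper
Keywords- graphs, trees Abstract second paper'''

# Non-greedy, so each match stops at the nearest "Abstract".
block_re = re.compile(r'Keywords-\s(.*?)\sAbstract')

keyword_blocks = []
for block in block_re.findall(text):
    # Split on ';' or ',' only, so spaces inside a keyword are kept.
    pieces = re.split(r'[;,]', block)
    keyword_blocks.append([p.strip() for p in pieces if p.strip()])

print(keyword_blocks)
# [['machine learning', 'nlp', 'text mining'], ['graphs', 'trees']]
```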
