I have multiple text files and want to extract the line that matches a specific pattern, then append it to a data frame together with the file name. In my case, the same pattern occurs several times within each text file.
sample.txt:
"government high school
Govt high school physics department
Employee Designation School Assistant"
What I am getting:
file | Org | Org2
sample.txt | government high school | Govt high school physics department
sample.txt | government high school | Employee Designation School Assistant
What I am looking for:
file | Org | Org2
sample.txt | government high school | Govt high school physics department
Here is the code I am using:
import os
import re

prs_path = "C://Users//subhr//scope_txt//"
df3 = []
for file in os.listdir(prs_path):
    Name = None
    with open(prs_path + file) as fd:
        for line in fd:
            line = line.lower()
            # First pattern: any line containing "government"
            match = re.search(r'(^.*government.*$)', line, re.I)
            Org = ""
            if match:
                Org = match.group()
                df3.append([file, Org])
            Org2 = ""
            Org3 = ""
            Org = ""
            if match is None:
                # Second pattern: any line containing "school" or "college"
                match2 = re.search(r'(^.*school.*$)|(^.*college.*$)', line, re.I)
                if match2:
                    Org2 = match2.group()
                    df3.append([file, Org, Org2])
                if match2 is None:
                    # Third pattern: any line containing "power"
                    match3 = re.search(r'(^.*power.*$)', line, re.I)
                    if match3:
                        Org3 = match3.group()
                        df3.append([file, Org, Org2, Org3])
                    if match3 is None:
                        continue
Where am I going wrong?
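
For reference, here is a minimal sketch of the behaviour I am after, assuming the rule is "keep only the first matching line per column, one row per file". The names PATTERNS and extract_row are made up for illustration and are not part of my original code. The main difference from the code above is that the row is built once per file rather than appended for every matching line, which seems to be why the extra row with "Employee Designation School Assistant" shows up.

```python
import re

# Hypothetical column -> pattern table, checked in this order.
# A line is claimed by the first pattern it matches.
PATTERNS = [
    ("Org", re.compile(r"government", re.I)),
    ("Org2", re.compile(r"school|college", re.I)),
    ("Org3", re.compile(r"power", re.I)),
]


def extract_row(file_name, lines):
    """Build one row per file, keeping only the first matching line per column."""
    row = {"file": file_name, "Org": "", "Org2": "", "Org3": ""}
    for line in lines:
        text = line.strip()
        for column, pattern in PATTERNS:
            if pattern.search(text):
                # Fill the column only if it is still empty for this file.
                if not row[column]:
                    row[column] = text
                break  # do not test this line against later patterns
    return row


# The three lines of sample.txt from the question, used as in-memory input.
sample = [
    "government high school\n",
    "Govt high school physics department\n",
    "Employee Designation School Assistant\n",
]
row = extract_row("sample.txt", sample)
print(row)
```

With real files, the same function could be called once per file inside the os.listdir loop (passing the open file object as lines), collecting the returned dicts in a list and passing that list to pandas.DataFrame at the end.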