I have multiple text files and want to extract the line that matches a specific pattern, then append it to a data frame together with the file name. In my case, several lines in the same file can match the same pattern.

sample.txt:
"government high school
Govt high school physics department
Employee Designation School Assistant"

What I am getting:
file       | Org                    | Org2
sample.txt | government high school | Govt high school physics department
sample.txt | government high school | Employee Designation School Assistant

What I am looking for:
file       | Org                    | Org2
sample.txt | government high school | Govt high school physics department

Here is the code I am using:

import os
import re

prs_path = "C://Users//subhr//scope_txt//"

df3 = []
for file in os.listdir(prs_path):
    with open(os.path.join(prs_path, file)) as fd:
        for line in fd:
            Org = Org2 = Org3 = ""
            # The r prefix belongs outside the quotes; re.I makes line.lower() unnecessary.
            match = re.search(r'^.*government.*$', line, re.I)
            if match:
                Org = match.group()
                df3.append([file, Org])
                continue
            match2 = re.search(r'^.*(school|college).*$', line, re.I)
            if match2:
                Org2 = match2.group()
                # A new row is appended for every matching line in the file.
                df3.append([file, Org, Org2])
                continue
            match3 = re.search(r'^.*power.*$', line, re.I)
            if match3:
                Org3 = match3.group()
                df3.append([file, Org, Org2, Org3])

Where am I going wrong?

SUBHRA SANKHA

1 Answer

Try this pattern: r"^(.*?):$\n\"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$"

Your input will be split into 6 groups. You can test it here:

https://regex101.com/r/UN9cjZ/1
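
For anyone who wants to try the pattern outside regex101, here is a minimal sketch in Python. It assumes the `re.MULTILINE` flag (so that `^` and `$` match at each line break) and assumes the input is exactly the snippet from the question, including the `sample.txt:` label line and the surrounding quotes. Both are assumptions, not something the answer states.

```python
import re

# Pattern from the answer; single quotes avoid escaping the double quote.
pattern = re.compile(
    r'^(.*?):$\n"(.*?) (.*?)$\n(.*?) (.*? .*?) (.*?)$',
    re.MULTILINE,  # assumption: lets ^ and $ match at each line break
)

# Assumption: the text is the snippet from the question, label line included.
text = (
    'sample.txt:\n'
    '"government high school\n'
    'Govt high school physics department\n'
    'Employee Designation School Assistant"'
)

m = pattern.search(text)
groups = m.groups() if m else ()
for i, g in enumerate(groups, start=1):
    print(i, repr(g))
```

Note that the pattern depends on the exact number of words per line, so it only fits input laid out like this sample.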

Nikolay B.
  • Thanks for the reply but that was just an example. My actual text files have different lines with different lengths in it :( – SUBHRA SANKHA Aug 21 '20 at 18:29
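
Since the real files vary in line count and length, one possible approach is to avoid a layout-specific pattern and instead build a single row per file, keeping only the first line found for each column. The sketch below is an illustration under assumptions: the helper name `extract_row` and the `PATTERNS` table are hypothetical, and it assumes a line should count only toward the first pattern it matches (government, then school/college, then power), as in the nested checks in the question.

```python
import re

# Patterns in priority order; column names follow the question.
PATTERNS = [
    ("Org", re.compile(r"government", re.I)),
    ("Org2", re.compile(r"school|college", re.I)),
    ("Org3", re.compile(r"power", re.I)),
]


def extract_row(file_name, lines):
    """Return one row per file, keeping the first line found for each column."""
    row = {"file": file_name, "Org": "", "Org2": "", "Org3": ""}
    for line in lines:
        line = line.strip()
        for column, pattern in PATTERNS:
            if pattern.search(line):
                # A line counts only toward the first pattern it matches;
                # later lines for an already filled column are ignored.
                if not row[column]:
                    row[column] = line
                break
    return row


sample = [
    "government high school\n",
    "Govt high school physics department\n",
    "Employee Designation School Assistant\n",
]
row = extract_row("sample.txt", sample)
print(row)
```

In the original loop, an open file handle can be passed as `lines`, and the collected rows (a list of dicts) can then be handed to `pandas.DataFrame`. The key difference from the code in the question is that nothing is appended inside the per-line loop, so a second "school" line no longer creates a second row.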