For my research I am trying to count, from a corpus, how often each of a series of compound terms (e.g. "Safety Hazard") co-occurs with a target keyword (e.g. "Facility"), meaning the term appears within a 16-word window around the keyword. The terms are stored in a file, one phrase per line.

I am not a programmer, so I have been trying to break the problem into two parts. First, extract from the corpus every match on my target keyword, together with the 8 words before and after it, and write these to a file. Second, match my 'vocabulary file' against that extract.

I am on part 1 and have tried the code below, but all I get in the output is the message <_sre.SRE_Match object at 0x028FFE78>, and I am struggling to get anywhere with repr. Any suggestions, or other ways to do this, would be appreciated. Ultimately I want an export file that lists my vocabulary terms, each followed by a count of how often it was found in that window around my target word. I used re.search because that is what I found in other posts on this message board:
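import re

matches = []
# 'corpus' instead of 'input', which would shadow the built-in function
with open("Corpus.txt", "r") as corpus:
    for line in corpus:
        # Up to 8 words before and after the keyword (first match per line only)
        m = re.search(r'(\S+\s+){0,8}facility(\s+\S+){0,8}', line)
        if m:
            # group(0) is the matched text; str(m) only gives the object's repr
            matches.append(m.group(0))

# "Extract.txt" is a placeholder name; the original never opened 'output'
with open("Extract.txt", "w") as output:
    for text in matches:
        output.write(text + "\n")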
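
To make the end goal concrete, here is a minimal sketch of the whole count done in one pass, without the intermediate extract file. It is only one possible approach and rests on several assumptions that are not in the code above: the corpus is treated as a single stream of lowercase word tokens (so a window can cross line breaks), the target keyword is a single word, and a phrase only counts if it lies entirely within 8 tokens either side of some occurrence of the target. The names `tokenize` and `count_cooccurrences` are hypothetical helpers made up for this illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a simple assumption about what a 'word' is)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_cooccurrences(text, phrases, target, window=8):
    """Count how often each phrase falls within `window` tokens of the target word.

    Each occurrence of a phrase is counted once, even if it is close to
    several occurrences of the target.
    """
    tokens = tokenize(text)
    target = target.lower()
    target_positions = [i for i, tok in enumerate(tokens) if tok == target]

    counts = {}
    for phrase in phrases:
        words = tokenize(phrase)
        if not words:
            continue  # skip blank lines from the vocabulary file
        n = len(words)
        counts[phrase] = 0
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] != words:
                continue
            end = start + n - 1
            # The whole phrase must sit inside the window around some target
            if any(t - window <= start and end <= t + window
                   for t in target_positions):
                counts[phrase] += 1
    return counts


# Made-up sample text: the second "safety hazard" is too far from "facility"
sample = ("A safety hazard was found at the facility yesterday. "
          + "filler " * 20
          + "Another safety hazard was logged far away.")
result = count_cooccurrences(sample, ["Safety Hazard", "Fire Risk"], "Facility")
print(result)  # {'Safety Hazard': 1, 'Fire Risk': 0}
```

With real data, `text` would be the contents of Corpus.txt and `phrases` the lines of the vocabulary file; each key and its count could then be written to the export file, one per line.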
Any help appreciated, Paul