I have hundreds of company report .txt files, and I want to extract some information from them. For example, one part of a file looks like this:
Mr. Davido will receive a base salary of $700,000 during the initial and any subsequent
term. The Chief Executive Officer of the Company (the CEO) and the Board (or a committee
thereof) shall review Mr. Davidos base salary at least annually, and may increase it at
any time in their sole discretion
I am trying to use pyparsing to extract the person's base salary value.
Here is my code:
from pyparsing import *
# define grammar
digits = "0123456789"
integer = Word( digits )
money = Group("$"+integer+','+integer + Optional(','+integer , ' '))
start = Word("base salary")
salary = start + money
#search
for t in text:
    result = salary.parseString( text )
    print result
This always gives the error:
pyparsing.ParseException: Expected W:(base...) (at char 0), (line:1, col:1)
After some simple tests, I found that this code only finds what I want when the text itself starts with the phrase, like this:
"base salary $700,000......"
and even then it only identifies the first occurrence in that text.
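Both symptoms look consistent with two details of the grammar: `Word("base salary")` matches any run of the characters in that string rather than the phrase itself, and `parseString` only matches at the very beginning of the input. The following is a minimal sketch rather than a definitive fix, assuming `Keyword` and `searchString` are acceptable replacements; the `Regex` for the amount and the results name `amount` are choices made for this illustration.

```python
from pyparsing import Keyword, Optional, ParseException, Regex, Suppress

text = """Mr. Davido will receive a base salary of $700,000 during the initial and any subsequent
term. The Chief Executive Officer of the Company (the CEO) and the Board (or a committee
thereof) shall review Mr. Davidos base salary at least annually, and may increase it at
any time in their sole discretion"""

# Digits with optional thousands separators, e.g. "700,000" or "1,250,000".
amount = Regex(r"\d{1,3}(?:,\d{3})*")
money = Suppress("$") + amount("amount")

# Keyword matches a whole word. Word("base salary") would instead match
# any run made of the characters b, a, s, e, space, l, r, y.
salary = Keyword("base") + Keyword("salary") + Optional(Keyword("of")) + money

# parseString has to match at the very start of the input, so it fails here.
try:
    salary.parseString(text)
    matched_at_start = True
except ParseException:
    matched_at_start = False

# searchString scans the whole text and collects every match.
amounts = [match["amount"] for match in salary.searchString(text)]
print(matched_at_start, amounts)
```

Note that the second mention of "base salary" in the sample is not followed by a dollar amount, so this grammar skips it.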
So I was wondering if someone could help me with this. If possible, I would also like to identify the person's name and store the name and salary in a dataframe.
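One possible way to pair a name with a salary, sketched under strong assumptions: the person is always written as a title plus a single surname (`Mr. Davido`), the name comes before the salary phrase, and no other name appears in between. The last sentence of the sample text is made up to show a second hit, and `person`, `phrase`, and `record` are illustrative names only.

```python
from pyparsing import Keyword, Optional, Regex, SkipTo, Suppress

text = (
    "Mr. Davido will receive a base salary of $700,000 during the initial and any subsequent\n"
    "term. The Chief Executive Officer of the Company (the CEO) and the Board (or a committee\n"
    "thereof) shall review Mr. Davidos base salary at least annually, and may increase it at\n"
    "any time in their sole discretion.\n"
    "Ms. Rivera will receive a base salary of $1,250,000."  # hypothetical extra sentence
)

# Assumption: a name is a title followed by one capitalised surname.
person = Regex(r"(?:Mr|Ms|Mrs|Dr)\.\s+[A-Z][a-z]+")
amount = Regex(r"\d{1,3}(?:,\d{3})*")
phrase = (Keyword("base") + Keyword("salary") + Optional(Keyword("of"))
          + Suppress("$") + amount("amount"))

# Skip the words between the name and the salary phrase, but give up if
# another name shows up first, so a salary is not paired with the wrong person.
record = person("person") + Suppress(SkipTo(phrase, failOn=person)) + phrase

rows = [
    (match["person"], int(match["amount"].replace(",", "")))
    for match in record.searchString(text)
]
print(rows)
```

The resulting list of tuples could then be passed to `pandas.DataFrame(rows, columns=["name", "base_salary"])` if pandas is available. Running the same grammar over each of the .txt files and extending `rows` would give one table for all reports.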
Thank you so much.