B>DAY</B>, Arbitrator: Under the jurisdiction of the United States
Federal Government and the Federal Aviation Administration, the above
grievance arbitration was submitted to **Joseph L. Daly, Arbitrator**,
on August 15, 2017, at the Federal Aviation Administration South West
Regional Office Central Service Center, Fort Worth, Texas. Prior to
the arbitration hearing, the parties motions were made by the FAA
and NATCA to exclude witnesses from testifying at the arbitration
hearing. The arbitrator denied the motions by a written **decision dated
August 6, 2017**.</P>
<P>The parties filed post-hearing briefs on October 20, 2017. The
Opinion and Award was rendered on October 30, 2017.</P>

Above is the data from which I want to extract the decision date and the corresponding arbitrator name (here, Joseph L. Daly).

My current code is:

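from bs4 import BeautifulSoup

with open("file.sgm", "r") as f:
    contents = f.read()

# html.parser lowercases tag names, so the <P> tags are found as "p"
soup = BeautifulSoup(contents, "html.parser")
s = soup.find_all("p")
for i in s:
    data = i.text
    print(data)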

I am able to extract the paragraph text, but how should I extract the corresponding values from it?
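The `**` markers in the sample may be emphasis added for this post rather than part of the .sgm file. If so, a pattern keyed to the surrounding wording can still find the values. Below is a minimal sketch that assumes every opinion uses the phrases "submitted to ..., Arbitrator" and "decision dated <Month> <day>, <year>"; `extract_fields`, `NAME_RE` and `DATE_RE` are hypothetical names, and the function works on plain text such as `i.text` from the loop above.

```python
import re
from datetime import datetime

# Hypothetical patterns: they assume the wording used in the sample paragraph,
# "submitted to <Name>, Arbitrator" and "decision dated <Month> <day>, <year>".
NAME_RE = re.compile(r"submitted to\s+(.+?),\s+Arbitrator")
DATE_RE = re.compile(r"decision dated\s+([A-Z][a-z]+\s+\d{1,2},\s+\d{4})")


def extract_fields(text):
    """Return (arbitrator, decision_date); either is None if not found."""
    # Drop any ** emphasis markers and collapse line breaks into single spaces.
    clean = " ".join(text.replace("*", "").split())
    name_match = NAME_RE.search(clean)
    date_match = DATE_RE.search(clean)
    arbitrator = name_match.group(1) if name_match else None
    decision_date = None
    if date_match:
        # Parse e.g. "August 6, 2017" into a datetime.date.
        decision_date = datetime.strptime(date_match.group(1), "%B %d, %Y").date()
    return arbitrator, decision_date


sample = """grievance arbitration was submitted to **Joseph L. Daly, Arbitrator**,
on August 15, 2017, at the Federal Aviation Administration South West
Regional Office Central Service Center, Fort Worth, Texas. Prior to
the arbitration hearing, the parties motions were made by the FAA
and NATCA to exclude witnesses from testifying at the arbitration
hearing. The arbitrator denied the motions by a written **decision dated
August 6, 2017**."""

print(extract_fields(sample))
# ('Joseph L. Daly', datetime.date(2017, 8, 6))
```

Inside the BeautifulSoup loop, calling `extract_fields(i.text)` on each paragraph and keeping only results where both values are not `None` would skip paragraphs such as the post-hearing-briefs one. The lazy `.+?` stops the name at the first ", Arbitrator" after "submitted to".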

Junaid
user190549

1 Answer

import re


data = """
B>DAY</B>, Arbitrator: Under the jurisdiction of the United States
Federal Government and the Federal Aviation Administration, the above
grievance arbitration was submitted to **Joseph L. Daly, Arbitrator**,
on August 15, 2017, at the Federal Aviation Administration South West
Regional Office Central Service Center, Fort Worth, Texas. Prior to
the arbitration hearing, the parties motions were made by the FAA
and NATCA to exclude witnesses from testifying at the arbitration
hearing. The arbitrator denied the motions by a written **decision dated
August 6, 2017**.</P>
<P>The parties filed post-hearing briefs on October 20, 2017. The
Opinion and Award was rendered on October 30, 2017.</P>
"""

# Capture every run of non-asterisk text wrapped in ** ... ** (newlines included)
match = re.findall(r"\*\*([^*]*)\*\*", data)

print(match)

Output:

['Joseph L. Daly, Arbitrator', 'decision dated\nAugust 6, 2017']
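The regex returns the raw bold spans, so the name still ends in ", Arbitrator" and the date still carries "decision dated" plus a line break. A minimal sketch of cleaning them up, assuming the name is always the first match and the date the second, as in this sample:

```python
import re
from datetime import datetime

# The list printed by the re.findall call above.
match = ["Joseph L. Daly, Arbitrator", "decision dated\nAugust 6, 2017"]

# Assumes the first match is "<Name>, Arbitrator" and the second is
# "decision dated <Month> <day>, <year>", as in the sample text.
arbitrator = re.sub(r",\s*Arbitrator$", "", match[0].strip())

# Collapse the embedded line break, then drop the leading "decision dated".
date_text = " ".join(match[1].split())
date_text = re.sub(r"^decision dated\s+", "", date_text)
decision_date = datetime.strptime(date_text, "%B %d, %Y").date()

print(arbitrator)     # Joseph L. Daly
print(decision_date)  # 2017-08-06
```

Indexing by position breaks if a paragraph has extra or missing bold spans, so checking `len(match)` before unpacking is worthwhile on real files.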