I am trying to pull the list of members from https://www.ourcommons.ca/Parliamentarians/en/members?view=List. Once I have the list, I follow each member's link and try to find their email address.
Some members don't have an email address, and the code fails for them. I tried adding a branch for the case where the match is None, but then I get duplicate results.
I am using the following logic for matching:
mat = re.search(r'mailto:\w*\.\w*@parl.gc.ca',ln1.get('href'))
if mat:
    email.append(mat.group())
else:
    email.append("No Email Found")
The if/else is where the issue is. When I add the else branch, "No Email Found" is appended once for every row.
import re
import requests
from bs4 import BeautifulSoup

weblinks=[]
email=[]
page = requests.get('https://www.ourcommons.ca/Parliamentarians/en/members?view=ListAll')
soup = BeautifulSoup(page.content, 'lxml')
# collect the profile links (first 10 only)
for ln in soup.select(".personName > a"):
    weblinks.append("https://www.ourcommons.ca" + ln.get('href'))
    if(len(weblinks)==10):
        break
# extracts emails
for elnk in weblinks:
    pagedet = requests.get(elnk)
    soupdet = BeautifulSoup(pagedet.content, 'lxml')
    for ln1 in soupdet.select(".caucus > a"):
        mat = re.search(r'mailto:\w*\.\w*@parl.gc.ca',ln1.get('href'))
        if mat:
            email.append(mat.group())
        else:
            email.append("No Email Found")
print("Len Email:",len(email))
Expected result: the email for each page that has one, and a blank for each page that doesn't.
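To make the expected result concrete, here is a minimal sketch of the behaviour described: exactly one entry per member page, with an empty string when no link on that page matches. It is only an illustration under assumptions. `first_email` is a hypothetical helper, and the two href lists are made-up stand-ins for what `[a.get('href') for a in soupdet.select(".caucus > a")]` might return on a real page. The pattern is the one used above, with the dots in the domain escaped.

```python
import re

# Pattern from the code above, with the dots in the domain escaped.
EMAIL_RE = re.compile(r'mailto:\w*\.\w*@parl\.gc\.ca')

def first_email(hrefs):
    """Return the first matching mailto link among hrefs, or '' if none match."""
    for href in hrefs:
        if href is None:  # an <a> tag with no href attribute
            continue
        mat = EMAIL_RE.search(href)
        if mat:
            return mat.group()
    return ""  # appended once per page, not once per link

# Hypothetical href lists standing in for two member pages.
page_with_email = ["/Members/en/caucus", "mailto:jane.doe@parl.gc.ca"]
page_without_email = ["/Members/en/caucus", None]

email = [first_email(page) for page in (page_with_email, page_without_email)]
print("Len Email:", len(email))
```

The difference from the loop above is that the "no match" value is decided once per page, after all of its links have been checked, instead of once per link.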