I am learning Python and trying to scrape a table from https://www.zaubacorp.com/company-list/city-DELHI/status-Active/p-1-company.html. The table has 4 columns: "CIN", "Company Name", "Roc" and "Status". Each "Company Name" cell is a hyperlink, so I need 5 columns: "CIN", "Company Name", "Company Link", "Roc" and "Status". I wrote some code for this, but I get only 4 columns, and instead of "Company Link" I get a different result. I am sharing a screenshot of my output CSV file.
Please help me scrape this table into 5 columns: "CIN", "Company Name", "Company Link", "Roc" and "Status". Here is my code, followed by an image of my output CSV file.

    import csv
    import requests
    from bs4 import BeautifulSoup
    import re
    import html5lib

    def find_between(s, first, last):
        # Return the text between two markers, or "" if either is missing
        try:
            start = s.index(first) + len(first)
            end = s.index(last, start)
            return s[start:end]
        except ValueError:
            return ""

    loop = 1
    while True:
        try:
            # Page number goes into the URL: p-1, p-2, ...
            URL = "https://www.zaubacorp.com/company-list/city-DELHI/status-Active/p-" + str(loop) + "-company.html"
            loop = loop + 1
            r = requests.get(URL)
            soup = BeautifulSoup(r.content, 'html5lib')
            tbody = soup.find('tbody')
            rows = tbody.find_all('tr')
            row_list = list()
            for tr in rows:
                row = []
                td = tr.find_all('td')
                for a in td:
                    href = a.find('a', href=True)
                    if href == None:
                        # Plain cell: keep its text
                        row.append(a.text.strip())
                        print(a.text.strip())
                    else:
                        # Cell with a hyperlink: this is where I expected the link
                        linktext = href.__getitem__
                        row.append(linktext)
                row_list.append(row)
            # Append this page's rows to the CSV file
            with open('zaubadata.csv', 'a') as csvFile:
                writer = csv.writer(csvFile)
                for r in row_list:
                    writer.writerow(r)
        except Exception as obj:
            print(obj)
            csvFile.close()
            break

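
To make the target output concrete, here is a minimal sketch of the row shape I am after, run on a hard-coded sample instead of the live site. The HTML snippet, the CIN value, the example.com link and the helper name `split_row` are all made up for illustration, and I am only assuming the real cells look similar. It uses the built-in `html.parser` rather than `html5lib` to keep the sample small.

```python
from bs4 import BeautifulSoup

# Hypothetical sample row; the markup on the real page may differ.
SAMPLE_HTML = """
<table><tbody>
<tr>
  <td>U00000DL2020PTC000000</td>
  <td><a href="https://example.com/company/EXAMPLE-PRIVATE-LIMITED">EXAMPLE PRIVATE LIMITED</a></td>
  <td>RoC-Delhi</td>
  <td>Active</td>
</tr>
</tbody></table>
"""


def split_row(tr):
    """Return the cell texts of one <tr>, adding the href right after any linked cell."""
    row = []
    for td in tr.find_all("td"):
        # Every cell contributes its visible text (the company name for the linked cell)
        row.append(td.get_text(strip=True))
        link = td.find("a", href=True)
        if link is not None:
            # A linked cell also contributes its href as an extra column
            row.append(link["href"])
    return row


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [split_row(tr) for tr in soup.find("tbody").find_all("tr")]
print(rows[0])
```

Each row in this sketch has five values in the order CIN, Company Name, Company Link, Roc, Status, which is what I would like to end up with in the CSV file.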
[![result of above code in 4 columns][1]][1]
[1]: https://i.stack.imgur.com/oUVLK.png