I am trying to extract the values of the SMILES String and Repeat_Unit fields from the table on the following web page: https://khazana.gatech.edu/module_search/material_detail.php?id=1&m=9
Although this might not be the most efficient way, I can successfully extract those values with the following code:
```python
from bs4 import BeautifulSoup
import requests

link = 'https://khazana.gatech.edu/module_search/material_detail.php?id=1&m=9'
link = requests.get(link)
soup = BeautifulSoup(link.text, 'html.parser')
data = []
tables = soup.find_all('table')
# The desired table is selected by list index because it has no other identifying attributes
table_body = tables[9].find('tbody')
rows = table_body.findAll('tr')
for row in rows:
    cols = row.findAll('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # empty cells are dropped
print(data[13][1])  # SMILES String
print(data[14][1])  # Repeat_Unit
```
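Hardcoding data[13] and data[14] only works as long as every page has exactly the same rows in the same order. One alternative is to look the value up by the label in the first cell of each row. This is a minimal sketch, assuming each row of data looks like [label, value, ...] and that the labels read exactly "SMILES String" and "Repeat_Unit" (not checked against every page); the function name find_value is hypothetical:

```python
def find_value(rows, label):
    """Return the value cell of the first row that has `label` in its first cell
    and also has a value cell.

    `rows` is a list of [label, value, ...] lists, like `data` above.
    Returns None if no such row exists.
    """
    for row in rows:
        # Rows with fewer than two cells (e.g. a blank value removed by the
        # `if ele` filter) cannot hold a value, so they are skipped.
        if len(row) >= 2 and row[0] == label:
            return row[1]
    return None


# Made-up rows for illustration; the real pages may label fields differently.
example_rows = [
    ["Name", "example-name"],
    ["SMILES String", "example-smiles"],
    ["Repeat_Unit", "example-unit"],
]
print(find_value(example_rows, "SMILES String"))  # prints example-smiles
print(find_value(example_rows, "Density"))        # prints None
```

With the code above, find_value(data, 'SMILES String') would replace data[13][1], and returning None lets the caller decide what to do with a page that lacks the field.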
In my application, I need to extract the SMILES String and Repeat_Unit values from thousands of similar web pages whose URLs differ only in the number after id= (1 in this example).
I have a pandas DataFrame in which one column ('#id') holds these ids. To get the SMILES String and Repeat_Unit for each id, I modified the above code to:
```python
data = []
SMILES = []
Repeat_Unit = []
for index, prow in df.iterrows():
    a = prow['#id']
    link = 'https://khazana.gatech.edu/module_search/material_detail.php?id=' + str(a) + '&m=9'
    link = requests.get(link)
    soup = BeautifulSoup(link.text)
    tables = soup.find_all('table')
    for table in tables:
        table_body = tables[9].find('tbody')
        rows = table_body.findAll('tr')
        for row in rows:
            cols = row.findAll('td')
            cols = [ele.text.strip() for ele in cols]
            data.append([ele for ele in cols if ele])
    SMILES.append(data[13][1])
    Repeat_Unit.append(data[14][1])
```
Now, when I run this loop to fill SMILES and Repeat_Unit, I get the following error:
```
IndexError Traceback (most recent call last)
<ipython-input-55-74f7ef016c59> in <module>()
36 cols=[ele.text.strip() for ele in cols]
37 data.append([ele for ele in cols if ele])
---> 38 SMILES.append(data[13][1])
39 Repeat_Unit.append(data[14][1])
IndexError: list index out of range
```
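Since the error comes from data[13][1], on the failing iteration either data has fewer than 14 rows or row 13 has fewer than two cells. The filter [ele for ele in cols if ele] drops empty cells, so a row whose value cell is blank would shrink to a single element. A minimal diagnostic sketch, assuming the rows of each page have been collected into a dict keyed by id (the function name short_pages and that dict layout are hypothetical, and the rows below are made up):

```python
def short_pages(rows_by_id, row_index=13, col_index=1):
    """Return (id, reason) pairs for pages where rows[row_index][col_index] would fail."""
    problems = []
    for material_id, rows in rows_by_id.items():
        if len(rows) <= row_index:
            # Not enough rows: rows[row_index] itself raises IndexError.
            problems.append((material_id, f"only {len(rows)} rows"))
        elif len(rows[row_index]) <= col_index:
            # The row exists but is too short, e.g. its blank value was filtered out.
            problems.append((material_id, f"row {row_index} has {len(rows[row_index])} cell(s)"))
    return problems


# Made-up rows: page 1 is complete, page 2 is short, page 3 has a blank value in row 13.
filler = [["label", "value"]] * 13
pages = {
    1: filler + [["SMILES String", "smiles-1"]],
    2: filler[:5],
    3: filler + [["SMILES String"]],
}
print(short_pages(pages))
# [(2, 'only 5 rows'), (3, 'row 13 has 1 cell(s)')]
```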
Even if I loop through data before appending to SMILES, I still get the same error.
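Two details of the loop above look suspicious: data is created once, before the loop over ids, so rows from every page accumulate in the same list and data[13] can only ever point at rows from the first page; and for table in tables re-reads tables[9] on every pass, so the rows of that one table are appended once per table on the page. A sketch of a loop that starts a fresh row list for each page and records None when a field is missing instead of raising IndexError. It assumes a hypothetical helper fetch_rows(material_id) that wraps the single-page BeautifulSoup code at the top and returns that page's list of rows; it is passed in as an argument so the loop can be tried offline:

```python
def collect_fields(ids, fetch_rows, labels=("SMILES String", "Repeat_Unit")):
    """Scrape each id once and return {label: [value for each id, in order]}.

    fetch_rows(material_id) is a hypothetical helper that returns the
    [label, value, ...] rows of one page. Missing pages or fields become None.
    """
    results = {label: [] for label in labels}
    for material_id in ids:
        rows = fetch_rows(material_id)  # fresh list for every page
        # Map each row's first cell to its second cell; rows shorter than 2 are skipped.
        fields = {row[0]: row[1] for row in rows if len(row) >= 2}
        for label in labels:
            results[label].append(fields.get(label))
    return results


# Stand-in fetcher with made-up rows, for trying the loop without network access.
fake_pages = {
    1: [["SMILES String", "smiles-1"], ["Repeat_Unit", "unit-1"]],
    2: [["SMILES String", "smiles-2"]],  # Repeat_Unit row missing
}
results = collect_fields([1, 2, 3], lambda i: fake_pages.get(i, []))
print(results)
# {'SMILES String': ['smiles-1', 'smiles-2', None], 'Repeat_Unit': ['unit-1', None, None]}
```

With the DataFrame from the question, calling collect_fields(df['#id'], fetch_rows) returns lists in the same order as df['#id'], so they could be assigned back as new columns, e.g. df['SMILES'] = results['SMILES String']. Looking rows up by label rather than by position is a design choice here; it assumes the label text is consistent across pages.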
Thank you in advance for your help!