Dec 2020 update:
I have achieved full automation. The collector now:
- Collects minute-level data for the entire FnO universe.
- Auto-adapts to the changing FnO universe (exits and new entries).
- Shuts down during non-market hours.
- Shuts down on holidays, including newly declared holidays.
- Starts automatically during the yearly Muhurat Trading session.
I am a bit new to web scraping and not used to the 'tr' and 'td' stuff, hence this question. I am trying to replicate, in Python 3, the Python 2.7 code from this post: https://www.quantinsti.com/blog/option-chain-extraction-for-nse-stocks-using-python.
The old code uses .ix for indexing, which I can easily correct with .iloc. However, the line <tr = tr.replace(',', '')> throws the error 'a bytes-like object is required, not 'str'', even if I put it before <tr = utf_string.encode('utf8')>.
I have checked this other link from Stack Overflow as well, but it didn't solve my problem.
I think I have spotted why this is happening: the variable tr is left over from the earlier for loop that defines it. If I omit this line, I get a DataFrame with the numbers, but with some text still attached to them. I could filter that out with a loop over the entire DataFrame, but a better way must be to use the replace() function properly, and I can't figure this bit out.
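To illustrate what I think is going wrong, here is a minimal interpreter session (not from my script, just my attempt to reproduce the error). In Python 3, bytes.replace() needs bytes arguments, while str.replace() needs str arguments:

b = '1,234'.encode('utf8')                 # bytes, like tr after tr = utf_string.encode('utf8')
b.replace(',', '')                         # TypeError: a bytes-like object is required, not 'str'
b.replace(b',', b'')                       # b'1234' -- bytes pattern and replacement work
'1,234'.replace(',', '').encode('utf8')    # b'1234' -- or strip the comma from the str before encoding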
Here is my full code. I have marked the critical sections I referred to above with ######################### on a line of its own, so they can be found quickly (even with Ctrl + F):
import requests
import pandas as pd
from bs4 import BeautifulSoup

Base_url = ("https://nseindia.com/live_market/dynaContent/" +
            "live_watch/option_chain/optionKeys.jsp?symbolCode=2772&symbol=UBL&" +
            "symbol=UBL&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17")

page = requests.get(Base_url)
#page.status_code
#page.content

soup = BeautifulSoup(page.content, 'html.parser')
#print(soup.prettify())

table_it = soup.find_all(class_="opttbldata")
table_cls_1 = soup.find_all(id="octable")

col_list = []

# Pulling the headings out of the Option Chain Table
#########################
for mytable in table_cls_1:
    table_head = mytable.find('thead')
    try:
        rows = table_head.find_all('tr')
        for tr in rows:
            cols = tr.find_all('th')
            for th in cols:
                er = th.text
                #########################
                ee = er.encode('utf8')
                col_list.append(ee)
    except:
        print('no thead')

# Drop the grouping headers ('\xc2\xa0' is a UTF-8 encoded non-breaking space)
col_list_fnl = [e for e in col_list if e not in ('CALLS', 'PUTS', 'Chart', '\xc2\xa0')]
#print(col_list_fnl)

table_cls_2 = soup.find(id="octable")
all_trs = table_cls_2.find_all('tr')
req_row = table_cls_2.find_all('tr')
new_table = pd.DataFrame(index=range(0, len(req_row) - 3), columns=col_list_fnl)

row_marker = 0
for row_number, tr_nos in enumerate(req_row):
    if row_number <= 1 or row_number == len(req_row) - 1:
        continue  # to ensure we only pick non-empty rows
    td_columns = tr_nos.find_all('td')
    # Removing the graph column
    select_cols = td_columns[1:22]
    cols_horizontal = range(0, len(select_cols))
    for nu, column in enumerate(select_cols):
        utf_string = column.get_text()
        utf_string = utf_string.strip('\n\r\t": ')
        #########################
        tr = tr.replace(',', '')  # commenting this line out makes the code partially work: the table fills, but the numbers have text attached to them
        tr = utf_string.encode('utf8')
        new_table.iloc[row_marker, [nu]] = tr
    row_marker += 1

print(new_table)