Dec 2020 update:
I have achieved full automation. The collector now:
- Collects minute-level data for the entire FnO universe.
- Auto-adapts to the changing FnO universe (exits and new entries).
- Shuts down during non-market hours.
- Shuts down on holidays, including newly declared holidays.
- Starts automatically during the yearly Muhurat Trading session.
I am a bit new to web scraping and not used to the 'tr' and 'td' stuff, hence this question. I am trying to replicate, in Python 3, the Python 2.7 code from this post: https://www.quantinsti.com/blog/option-chain-extraction-for-nse-stocks-using-python.
The old code uses .ix for indexing, which I can easily correct with .iloc. However, the line <tr = tr.replace(',', '')> throws the error 'a bytes-like object is required, not 'str'', even if I put it before <tr = utf_string.encode('utf8')>.
I have checked this other link from Stack Overflow as well, but it didn't solve my problem.
I think I have spotted why this is happening: the variable tr is left over from the earlier for loop that defines it. If I omit this line, I get a DataFrame with the numbers, but with some text still attached to them. I could filter that out with a loop over the entire DataFrame, but a better way must be to use the replace() function properly, and I can't figure this bit out.
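To illustrate what I think is going wrong, here is a minimal interpreter session (not from my script, just my attempt to reproduce the error). In Python 3, bytes.replace() needs bytes arguments, while str.replace() needs str arguments:

b = '1,234'.encode('utf8')                 # bytes, like tr after tr = utf_string.encode('utf8')
b.replace(',', '')                         # TypeError: a bytes-like object is required, not 'str'
b.replace(b',', b'')                       # b'1234' -- bytes pattern and replacement work
'1,234'.replace(',', '').encode('utf8')    # b'1234' -- or strip the comma from the str before encoding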
Here is my full code. I have marked the critical sections I referred to above with ######################### on a line of its own, so they can be found quickly (even with Ctrl + F):
import requests
import pandas as pd
from bs4 import BeautifulSoup

Base_url = ("https://nseindia.com/live_market/dynaContent/" +
            "live_watch/option_chain/optionKeys.jsp?symbolCode=2772&symbol=UBL&" +
            "symbol=UBL&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17")

page = requests.get(Base_url)
#page.status_code
#page.content

soup = BeautifulSoup(page.content, 'html.parser')
#print(soup.prettify())

table_it = soup.find_all(class_="opttbldata")
table_cls_1 = soup.find_all(id="octable")

col_list = []

# Pulling the headings out of the Option Chain Table
#########################
for mytable in table_cls_1:
    table_head = mytable.find('thead')
    try:
        rows = table_head.find_all('tr')
        for tr in rows:
            cols = tr.find_all('th')
            for th in cols:
                er = th.text
                #########################
                ee = er.encode('utf8')
                col_list.append(ee)
    except:
        print('no thead')

# Drop the grouping headers ('\xc2\xa0' is a UTF-8 encoded non-breaking space)
col_list_fnl = [e for e in col_list if e not in ('CALLS', 'PUTS', 'Chart', '\xc2\xa0')]
#print(col_list_fnl)

table_cls_2 = soup.find(id="octable")
all_trs = table_cls_2.find_all('tr')
req_row = table_cls_2.find_all('tr')
new_table = pd.DataFrame(index=range(0, len(req_row) - 3), columns=col_list_fnl)

row_marker = 0
for row_number, tr_nos in enumerate(req_row):
    if row_number <= 1 or row_number == len(req_row) - 1:
        continue  # to ensure we only pick non-empty rows
    td_columns = tr_nos.find_all('td')
    # Removing the graph column
    select_cols = td_columns[1:22]
    cols_horizontal = range(0, len(select_cols))
    for nu, column in enumerate(select_cols):
        utf_string = column.get_text()
        utf_string = utf_string.strip('\n\r\t": ')
        #########################
        tr = tr.replace(',', '')  # commenting this line out makes the code partially work: the table fills, but the numbers have text attached to them
        tr = utf_string.encode('utf8')
        new_table.iloc[row_marker, [nu]] = tr
    row_marker += 1

print(new_table)