I'm new to Python and am trying to extract data from one particular table on https://www.screener.in/company/ABB/consolidated/ (the last table on the page, Shareholding Pattern).

I'm using the BeautifulSoup library for this, but I don't know how to go about it.

Here is my code so far. I'm failing to pick the right table because the page has multiple tables that all share common classes and IDs, which makes it difficult to filter for the one table I want.

import requests
import urllib.request
from bs4 import BeautifulSoup

url = "https://www.screener.in/company/ABB/consolidated/"

r = requests.get(url)
print(r.status_code)
html_content = r.text
soup = BeautifulSoup(html_content, "html.parser")
# print(soup)
# data_table = soup.find('table', class_="data-table")
# print(data_table)
table_needed = soup.find("<h2>ShareholdingPattern</h2>")
# sub = table_needed.contents[0]
print(table_needed)
baduker
Manny

1 Answer

Just use requests and pandas. Grab the last table and dump it to a .csv file.

Here's how:

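from io import StringIO

import pandas as pd
import requests

url = "https://www.screener.in/company/ABB/consolidated/"

# read_html returns a list with one DataFrame per <table> on the page
df = pd.read_html(
    StringIO(requests.get(url).text),  # passing a raw HTML string is deprecated since pandas 2.1
    flavor="bs4",
)
# The Shareholding Pattern table is the last one on the page
df[-1].to_csv("last_table.csv", index=False)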

Output from the .csv file:

[Image: screenshot of last_table.csv]
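If you would rather stay with BeautifulSoup and pick the table by its heading instead of its position, here is a minimal sketch. It assumes the table you want is the first `<table>` following an `<h2>` whose text contains "Shareholding Pattern"; the real page's markup may differ. `SAMPLE_HTML`, its values, and the helper `table_after_heading` are made up for illustration so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two tables sharing the same class,
# each introduced by its own <h2> heading. The values are invented.
SAMPLE_HTML = """
<section><h2>Quarterly Results</h2>
<table class="data-table"><tr><th>Item</th><th>Dec 2020</th></tr>
<tr><td>Sales</td><td>100</td></tr></table></section>
<section><h2>Shareholding Pattern</h2>
<table class="data-table"><tr><th>Holder</th><th>Dec 2020</th></tr>
<tr><td>Promoters</td><td>75.00</td></tr></table></section>
"""


def table_after_heading(html, heading_text):
    """Return the rows (lists of cell strings) of the first table after a matching <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    # find() takes tag names or a filter function, not an HTML string
    heading = soup.find(
        lambda tag: tag.name == "h2" and heading_text in tag.get_text()
    )
    if heading is None:
        return []
    # find_next() walks forward in document order from the heading
    table = heading.find_next("table")
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


rows = table_after_heading(SAMPLE_HTML, "Shareholding Pattern")
for row in rows:
    print(row)
```

On the live site you would pass `requests.get(url).text` instead of `SAMPLE_HTML`. Matching on the heading keeps working if more tables are added after this one, whereas `df[-1]` depends on the table staying last.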

baduker
  • Dear baduker, thank you for the answer and the great idea to use pandas this way! Glad to see such a great application of pandas here. Many thanks for your work. Regards – malaga Feb 09 '21 at 12:07
  • Thank you, this solution works, but can you please explain how it works? I don't see the table specified anywhere, yet it pulls the exact table I want. @baduker – Manny Feb 09 '21 at 12:10
  • `pd.read_html` returns a list with one DataFrame per table on the page. Since you want the last table, `df[-1]` uses negative indexing to grab the first element from the end of that list. That's your table. – baduker Feb 09 '21 at 12:12
  • @baduker I need more help. How can I select specific columns (i.e. the last 2 columns)? I added index_col but it doesn't work: `df = pd.read_html(requests.get("https://www.screener.in/company/ABB/consolidated/").text, index_col=-2, flavor="bs4")` – Manny Feb 09 '21 at 14:07
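A note on the last two comments: `index_col` in `pd.read_html` only chooses which column becomes the row index; it does not filter columns. One way to keep the last two columns is positional slicing with `.iloc` after picking the table. The sketch below uses a hand-built list of DataFrames as a stand-in for what `pd.read_html` returns; the column names and values are invented for illustration and are not taken from the site.

```python
import pandas as pd

# Hypothetical stand-in for the list that pd.read_html returns:
# one DataFrame per <table> found on the page, in document order.
tables = [
    pd.DataFrame({"Item": ["Sales"], "Dec 2020": [100]}),
    pd.DataFrame(
        {
            "Holder": ["Promoters", "Public"],
            "Sep 2020": [75.0, 25.0],
            "Dec 2020": [74.5, 25.5],
        }
    ),
]

# Negative index: -1 is the first element counting from the end of the list
last_table = tables[-1]

# All rows, last two columns selected by position
last_two = last_table.iloc[:, -2:]

print(list(last_two.columns))
```

With the answer's code, the equivalent would be `df[-1].iloc[:, -2:].to_csv("last_table.csv", index=False)`.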