Please help this poor, struggling philosophy & economics major.
I'm trying to get the market cap of Samsung Electronics from the Korean website 'finance.naver.com'.
(It doesn't need to be Samsung; I just need to crawl market caps for my quant investment purposes.)
The page is https://finance.naver.com/item/main.nhn?code=005930
This is a screenshot of the page; the target number is in the red box.
This is my code:
from bs4 import BeautifulSoup
import pandas as pd  # needed for pd.DataFrame below
import requests
mkc_url = 'https://finance.naver.com/item/main.nhn?code=005930'
mkc_result = requests.get(mkc_url)
mkc_obj = BeautifulSoup(mkc_result.content, "html.parser")
I found that the target number is inside a 'div' tag with the class 'first':
mkc = mkc_obj.find("div",{"class": "first"})
Inside that 'div', the number is in an 'em' tag with the id '_market_sum':
em_id = mkc.find("em", {"id":"_market_sum"})
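A possibly simpler way to do the two `find()` calls, sketched under assumptions: the CSS selector `div.first em#_market_sum` is built from the structure described above, `market_sum_text` is a hypothetical helper name, and `SAMPLE_HTML` is a made-up stand-in for the real page (the live markup may differ). Calling `get_text()` and then `split()`/`join()` also removes the newlines and tabs without needing pandas.

```python
from typing import Optional, Union

from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<div class="first">
  <em id="_market_sum">
        475조
        2,418
  </em>억원
</div>
"""


def market_sum_text(html: Union[str, bytes]) -> Optional[str]:
    """Return the text of em#_market_sum inside div.first, whitespace collapsed."""
    soup = BeautifulSoup(html, "html.parser")
    # One CSS selector replaces the two find() calls.
    em = soup.select_one("div.first em#_market_sum")
    if em is None:
        return None  # element not found (layout changed, request blocked, ...)
    # split() with no argument breaks on any run of spaces, tabs and newlines;
    # joining with " " leaves single spaces between the pieces.
    return " ".join(em.get_text().split())


print(market_sum_text(SAMPLE_HTML))  # 475조 2,418
```

With the request from the question you would pass `mkc_result.content` (bytes), which lets BeautifulSoup work out the page encoding itself.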
Finally, I got a result like this:
'조' (jo) is the Korean number unit for one trillion (10^12), not a digit, so I wanted to delete everything but the numbers, but I didn't know how.
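Because '조' (10^12) and '억' (10^8) are multipliers, a minimal sketch could parse each part separately instead of deleting characters. It assumes the text looks like '475조 2,418' with the trailing number counted in 억 won (confirm this against the '억원' label on the page); `market_cap_won` is a hypothetical helper and the figures are made up.

```python
import re
from typing import Optional

JO = 10**12   # 조: one trillion
EOK = 10**8   # 억: one hundred million

# Optional "<digits>조", then an optional number assumed to be in 억,
# then an optional "억원" suffix; surrounding whitespace is ignored.
_MARKET_CAP = re.compile(r"^\s*(?:(\d[\d,]*)\s*조)?\s*(\d[\d,]*)?\s*(?:억\s*원?)?\s*$")


def market_cap_won(text: str) -> Optional[int]:
    """Convert text such as '475조 2,418' to an integer number of won."""
    m = _MARKET_CAP.match(text)
    if m is None or (m.group(1) is None and m.group(2) is None):
        return None  # not in the assumed format
    jo = int(m.group(1).replace(",", "")) if m.group(1) else 0
    eok = int(m.group(2).replace(",", "")) if m.group(2) else 0
    return jo * JO + eok * EOK


print(market_cap_won("475조 2,418"))  # 475241800000000
```

Simply deleting every non-digit happens to work for '475조 2,418' (giving 4752418, i.e. 4,752,418 억), but it breaks when the 억 part has fewer than four digits: '475조 18' would become 47518 instead of 4,750,018 억.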
What I did was put that result in a DataFrame and try to delete everything but the numbers using '.str.strip':
df_mkc = pd.DataFrame(em_id)
df_mkc[0] = df_mkc[0].str.strip('\n')
df_mkc[0] = df_mkc[0].str.strip('\t')
df_mkc[0] = df_mkc[0].str.strip()
df_mkc = df_mkc.replace({r'\$': '', ',': ''}, regex=True)
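For a quant screen you will probably collect this text for many stocks. A vectorized pandas sketch of the same conversion, using `Series.str.extract` instead of chained `strip` calls, could look like the following. It makes the same assumption that the second number is in 억, and the stock labels, raw strings and `to_won` helper are hypothetical.

```python
import pandas as pd

# Made-up raw strings standing in for em#_market_sum text from several pages.
raw = pd.Series(
    ["\n\t\t475조 2,418\n\t", "\n\t12조 345\n", "  2,418 ", None],
    index=["stock_a", "stock_b", "stock_c", "stock_d"],
)


def to_won(texts: pd.Series) -> pd.Series:
    """Vectorized '<조 part> <억 part>' -> won; the second number is assumed to be in 억."""
    # Column 0: digits before 조 (if any); column 1: the following number (if any).
    parts = texts.str.extract(r"^\s*(?:(\d[\d,]*)\s*조)?\s*(\d[\d,]*)?")
    # Drop thousands separators, then convert to nullable integers (<NA> if absent).
    jo = pd.to_numeric(parts[0].str.replace(",", "", regex=False)).astype("Int64")
    eok = pd.to_numeric(parts[1].str.replace(",", "", regex=False)).astype("Int64")
    won = jo.fillna(0) * 10**12 + eok.fillna(0) * 10**8
    # Rows where neither part was found (e.g. missing element) become <NA> instead of 0.
    return won.mask(jo.isna() & eok.isna())


print(to_won(raw))
```

Keeping the result as nullable `Int64` avoids float rounding and leaves failed scrapes visible as `<NA>` rather than silently turning them into 0.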
And it gets uglier and uglier.
I tapped out at this point.
Please help!!!
Thanks for all your kindness, wisdom and generosity.