I want to extract all option chain data from the Yahoo Finance webpage; for simplicity, take only the put option chain data. First, load all the packages used in the program:
import time
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
The function that writes one company's put option chain data to a CSV file:
def write_option_chain(code):
    browser = webdriver.Chrome()
    browser.maximize_window()
    url = "https://finance.yahoo.com/quote/{}/options?p={}".format(code,code)
    browser.get(url)
    WebDriverWait(browser,10).until(EC.visibility_of_element_located((By.XPATH, ".//select/option")))
    time.sleep(25)
    date_elem = browser.find_elements_by_xpath(".//select/option")
    time_span = len(date_elem)
    print('{} option chains exists in {}'.format(time_span,code))
    df_all = pd.DataFrame()
    for item in range(1,time_span):
        element_date = browser.find_element_by_xpath('.//select/option[{}]'.format(item))
        print("parsing {}'s put option chain on {} now".format(code,element_date.text))
        element_date.click()
        WebDriverWait(browser,10).until(EC.visibility_of_all_elements_located((By.XPATH, ".//table[@class='puts W(100%) Pos(r) list-options']//td")))
        time.sleep(11)
        put_table = browser.find_element_by_xpath(".//table[@class='puts W(100%) Pos(r) list-options']")
        put_table_string = put_table.get_attribute('outerHTML')
        df_put = pd.read_html(put_table_string)[0]
        df_all = df_all.append(df_put)
    browser.close()
    browser.quit()
    df_all.to_csv('/tmp/{}.csv'.format(code))
    print('{} otpion chain written into csv file'.format(code))
To test write_option_chain with a list:
nas_list = ['aapl','adbe','adi','adp','adsk']
for item in nas_list:
    try:
        write_option_chain(code=item)
    except:
        print("check what happens to {} ".format(item))
        continue
    time.sleep(5)
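The bare except in the loop above hides which exception was raised, so the "check what happens" lines say nothing about the cause. A minimal sketch of a more informative loop, assuming only the standard library; the helper name run_all and the logger name are hypothetical and not part of the original program:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("option_chain")


def run_all(codes, worker):
    """Call worker(code) for each code and collect failures instead of hiding them.

    Returns a dict mapping each failed code to a short description of its exception.
    """
    failures = {}
    for code in codes:
        try:
            worker(code)
        except Exception as exc:  # unlike a bare except, Ctrl-C still interrupts
            # log.exception records the full traceback, which shows the failing line
            log.exception("failed on %s", code)
            failures[code] = "{}: {}".format(type(exc).__name__, exc)
    return failures
```

It would be called as run_all(nas_list, write_option_chain); the returned dict then shows, for example, whether a ticker failed with a TimeoutException or with something else.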
The output shows:
# I omitted many lines for brevity
18 option chains exists in aapl
parsing aapl's put option chain on August 27, 2021 now
check what happens to aapl
check what happens to adbe
12 option chains exists in adi
parsing adi's put option chain on December 17, 2021 now
adi otpion chain written into csv file
11 option chains exists in adp
parsing adp's put option chain on August 27, 2021 now
adp otpion chain written into csv file
check what happens to adsk
To summarize the output above:
1. Only the put option chain data for adp and adi was written to the desired directory.
2. Only part of the option chain data for aapl and adp was retrieved.
3. The options webpage for adsk could not be opened.
4. The whole run took almost 20 minutes.
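As a rough check on the run time, the fixed time.sleep calls alone put a floor under it. A minimal sketch that counts only those sleeps, assuming every expiry date of a ticker is parsed without an exception (25 s after the page loads, 11 s per iteration of range(1, time_span), and 5 s after the ticker); the function name is hypothetical:

```python
def fixed_sleep_seconds(time_span, page_sleep=25, table_sleep=11, pause=5):
    """Seconds spent purely in time.sleep for one ticker.

    time_span is the number of <option> entries found; the loop in
    write_option_chain runs over range(1, time_span), i.e. time_span - 1 times.
    """
    iterations = max(time_span - 1, 0)
    return page_sleep + table_sleep * iterations + pause


# Counts reported in the output for the tickers whose page loaded.
reported = {"aapl": 18, "adi": 12, "adp": 11}
for code, time_span in reported.items():
    print(code, fixed_sleep_seconds(time_span), "seconds of sleeping")
```

For adi, which completed, this gives 25 + 11 × 11 + 5 = 151 seconds spent sleeping, before any page load or parsing time is counted.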
How can I make data extraction from the webpage with Selenium more robust and efficient?