I have downloaded one year's worth of S&P 500 stock data using the Python package yfinance
as follows, keeping only the opening price for each of the 500 firms:
import pandas as pd
import yfinance as yf
import numpy as np
source=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = pd.DataFrame(source[0])
tickers_symbols=df['Symbol'].values.tolist()
GICS_sectors = df['GICS Sector'].values.tolist()
data = pd.DataFrame()
for t,s in zip(tickers_symbols, GICS_sectors):
    tmp = yf.download(t, period='1y', progress=False)
    tmp.reset_index(inplace=True)
    tmp['Ticker'] = t
    tmp['GICS'] = s
    data = data.append(tmp, ignore_index=True)
##KEEP ONLY OPENING PRICE##
data=data.drop(["Close", "High", "Low", "Adj Close", "Volume"], axis=1)
Now I need to split this large dataset into smaller datasets according to each company's GICS sector. To do this, I put the (GICS, dataset) tuples in a dict object (as suggested here), so that I can then retrieve each smaller dataset by simply typing dataset_list['GICS sector'].
dataset_list = dict(tuple(data.groupby('GICS')))
print(dataset_list)
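To make the structure of that dict concrete, here is a minimal sketch on toy data. The tickers, sectors, and prices below are invented for illustration and the names toy and toy_by_sector are hypothetical; only the dict(tuple(groupby)) idiom comes from the code above.

```python
import pandas as pd

# Toy long-format data; tickers, sectors and prices are invented for illustration.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05"] * 3),
    "Open": [10.0, 11.0, 20.0, 21.0, 30.0, 31.0],
    "Ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "GICS": ["Energy", "Energy", "Energy", "Energy", "Utilities", "Utilities"],
})

# Iterating a groupby yields (key, sub-DataFrame) pairs,
# so dict() maps each sector name to the rows of that sector.
toy_by_sector = dict(tuple(toy.groupby("GICS")))

sector_names = sorted(toy_by_sector)
energy_tickers = sorted(toy_by_sector["Energy"]["Ticker"].unique())
print(sector_names)
print(energy_tickers)
```

Each value in the dict is an ordinary DataFrame that still contains the GICS column and only the rows of its own sector.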
##SPLIT DATASET BY GICS SECTOR AND REMOVE GICS COLUMN##
for sector, dataset in dataset_list.items():
    long_dataset=data.drop(columns='GICS', axis=1)
However, I am having trouble with the subsequent steps. When I run a loop to transform each dataset from long to wide and save it as a .csv file, it correctly creates 11 files (one per sector, as expected), but the data in each file is exactly the same.
##CONVERT EACH DATASET FROM LONG TO WIDE##
for sector, dataset in dataset_list.items():
    final_datasets=long_dataset.pivot_table(index="Date", columns="Ticker", values="Open")
    final_datasets.to_csv(str(sector)+' DataFrame.csv', index=True, sep=',')
I think there is a problem with the loop I wrote, but I am not sure how to fix it. Each loop above should modify all datasets in the dataset_list object, i.e. I should be able to retrieve a dataframe final_datasets['GICS sector'], but only one dataframe is produced.
Any help is much appreciated.
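For reference, the result described above (one wide dataframe per sector, retrievable by sector name) might look like the following sketch on toy data. The data and the names toy, toy_by_sector, and wide_by_sector are invented for illustration. This is one possible approach under the assumption that each group's own rows should be pivoted, not the full table.

```python
import pandas as pd

# Toy long-format data; tickers, sectors and prices are invented for illustration.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05"] * 3),
    "Open": [10.0, 11.0, 20.0, 21.0, 30.0, 31.0],
    "Ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "GICS": ["Energy", "Energy", "Energy", "Energy", "Utilities", "Utilities"],
})
toy_by_sector = dict(tuple(toy.groupby("GICS")))

# Pivot each group's own rows and keep the result under its sector name,
# so that no iteration overwrites the output of the previous one.
wide_by_sector = {
    sector: group.drop(columns="GICS").pivot_table(
        index="Date", columns="Ticker", values="Open"
    )
    for sector, group in toy_by_sector.items()
}

# Hypothetical file-writing step, commented out so the sketch has no side effects:
# for sector, wide in wide_by_sector.items():
#     wide.to_csv(f"{sector} DataFrame.csv")

print(wide_by_sector["Energy"])
```

Storing the results in a dict keyed by sector keeps every wide table available after the loop, which a single reassigned variable does not.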