I have downloaded stock data with yfinance and I'm trying to slice the DataFrame into one DataFrame per stock, but I really don't know how to do it. The data is a DataFrame with a MultiIndex, but the columns are labeled with tuples of the info and the ticker (example below), not with the tickers themselves as I want for my data analysis. Even when I call df.info(), it only lists the info columns, not the tickers. How can I slice this DataFrame so that the info is separated by ticker? The code right now is the following:

import pandas as pd
import yfinance as yf
import pandas_datareader.data as pdr
yf.pdr_override()
tickers = ['PETR4.SA', 'PFRM3.SA', 'BIOM3.SA', 'DASA3.SA']
acoes = pdr.get_data_yahoo(tickers)
print(acoes)
print(type(acoes))

The output of acoes.info() is:

DatetimeIndex: 5056 entries, 2000-01-03 to 2020-04-16
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----
0   (Adj Close, BIOM3.SA)  4448 non-null   float64
1   (Adj Close, DASA3.SA)  3794 non-null   float64
2   (Adj Close, PETR4.SA)  4990 non-null   float64
3   (Adj Close, PFRM3.SA)  3316 non-null   float64
4   (Close, BIOM3.SA)      4448 non-null   float64
...

My main goal is to write code that looks up the info for one or several stocks so I can run some analysis. I have no coding or programming experience whatsoever and I'm only doing this to make my life in the financial markets easier, haha. Thanks in advance!

Grego

1 Answer

You can pivot columns into rows with melt.

acoes.index.name = 'date'
long_form = acoes.reset_index().melt('date', var_name=['var', 'ticker'])
long_form
#              date        var    ticker     value
# 0      2000-01-03  Adj Close  BIOM3.SA       NaN
# 1      2000-01-04  Adj Close  BIOM3.SA       NaN
# 2      2000-01-05  Adj Close  BIOM3.SA       NaN
# 3      2000-01-06  Adj Close  BIOM3.SA       NaN
# 4      2000-01-07  Adj Close  BIOM3.SA       NaN
# ...           ...        ...       ...       ...

The original data had two column levels for the column names, so those ended up in two different columns in the long form. Then you can use pivot_table to widen back out to one column per variable, while keeping ticker as a column.

long_form.pivot_table(index=['date', 'ticker'], columns='var', values='value').reset_index()
# var         date    ticker  Adj Close   Close    High     Low    Open        Volume
# 0     2000-01-03  PETR4.SA   4.050402   5.875   5.875   5.875   5.875  3.538944e+10
# 1     2000-01-04  PETR4.SA   3.826338   5.550   5.550   5.550   5.550  2.886144e+10
# 2     2000-01-05  PETR4.SA   3.787730   5.494   5.494   5.494   5.494  4.303360e+10
# 3     2000-01-06  PETR4.SA   3.774631   5.475   5.475   5.475   5.475  3.405568e+10
# 4     2000-01-07  PETR4.SA   3.791866   5.500   5.500   5.500   5.500  2.091264e+10
# ...          ...       ...        ...     ...     ...     ...     ...           ...
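
As a possible shortcut to the same layout, stacking the ticker level of the columns also moves the tickers into the rows while keeping one column per variable. This is only a sketch on a small synthetic frame: the tickers AAA and BBB, the level names var and ticker, and the values are made up for illustration and stand in for the downloaded acoes data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yfinance result (hypothetical tickers and values).
dates = pd.date_range("2020-01-01", periods=3, name="date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA", "BBB"]], names=["var", "ticker"]
)
acoes_demo = pd.DataFrame(
    np.arange(12.0).reshape(3, 4), index=dates, columns=columns
)

# Move the ticker level from the columns into the row index,
# then turn the (date, ticker) index back into ordinary columns.
tidy = acoes_demo.stack(level="ticker").reset_index()
print(tidy)
```

Each row of tidy is then one date and ticker pair, with Close and Volume as regular columns.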

Finally, you can split by ticker using groupby and iterating.

for ticker, sub_df in long_form.groupby('ticker'):
    # sub_df has the data for a single ticker.
    # Do what you want with it; here we just print its size.
    print(ticker, sub_df.shape)
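
If the goal is simply one DataFrame per ticker, another option is to take a cross-section of the column MultiIndex with xs and collect the results in a dict. The sketch below uses a small synthetic frame: the tickers AAA and BBB and the values are hypothetical, and it assumes the ticker is the second column level, as in the info() output in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yfinance result (hypothetical tickers and values).
dates = pd.date_range("2020-01-01", periods=3, name="date")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
acoes_demo = pd.DataFrame(
    np.arange(12.0).reshape(3, 4), index=dates, columns=columns
)

# One DataFrame per ticker: select the ticker on column level 1.
# xs drops that level, leaving plain columns such as Close and Volume.
per_ticker = {
    ticker: acoes_demo.xs(ticker, axis=1, level=1)
    for ticker in acoes_demo.columns.get_level_values(1).unique()
}

print(per_ticker["AAA"])
```

With the real data, per_ticker['PETR4.SA'] would be looked up the same way, provided the column levels are ordered as shown in the question.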
mcskinner