I have a DataFrame that looks like this
In [10]: vlr
Out[10]:
data tipoOP qtd ... flxCx flxCx_USD pmAtual_USD
ticker ...
JBSS3.SA 2020-03-06 C 4.0 ... -86.390000 NaN NaN
WEGE3.SA 2020-08-27 C 3.0 ... -198.050000 NaN NaN
ITUB4.SA 2020-09-09 C 8.0 ... -195.180000 NaN NaN
WEGE3.SA 2020-09-09 V -3.0 ... 193.150000 NaN NaN
ITSA4.SA 2020-09-18 C 33.0 ... -297.430000 NaN NaN
... ... ... ... ... ... ...
ARZZ3.SA 2021-06-01 C 2.0 ... -174.610000 NaN NaN
CP 2021-06-02 C 1.0 ... -417.103785 -81.55 81.55
HSY 2021-06-02 C 1.0 ... -884.843100 -173.00 173.00
ARKQ 2021-06-02 V -2.0 ... 836.650400 163.60 NaN
JPM 2021-06-02 C 1.0 ... -853.745724 -166.92 166.92
Then I'm doing:
vlr_full = vlr.groupby('data')
vlr_full = vlr_full.apply(holding.ord_ativo).drop(columns=['data'])
vlr_full = vlr_full['qtdAtual']
To get vlr_full
(qtdAtual
column is the number of assets owned that day, cummulative):
In [12]: vlr_full
Out[12]:
data ticker
2020-03-06 JBSS3.SA 4.0
2020-08-27 WEGE3.SA 3.0
2020-09-09 ITUB4.SA 8.0
WEGE3.SA 0.0
2020-09-18 ITSA4.SA 33.0
2021-06-01 PRIO3.SA 50.0
2021-06-02 ARKQ 0.0
CP 1.0
HSY 1.0
JPM 1.0
Name: qtdAtual, Length: 125, dtype: float64
(qtdAtual means "owned shares" and its given by the cummulative sum by ticker of "qtd" column in vlr)
I have a DatetimeIndex
like this:
idx = pd.date_range(first_day,end=yesterday,freq='D')
It gives me the first day I ever bought something and goes until yesterday
My objective is to fill up the vlr_full
'data' level index with missing dates, and for each date have a index of all tickers present at vlr
. If the ticker wasn't traded at a given date, I'll keep the previous value (or 0 if it wasn't traded yet). With that, I was planning on using the yfinance
module to get historical quote data and insert that (multiplied to the quantity I own at that date) as a new column. After that I'll plot stuff.
I tried stuff like:
vlr_full.reindex(idx)
But, obviously it didn't work (ValueError: cannot handle a non-unique multi-index!) . I'm not sure on how to proceed, I tried some related questions but I wasn't able to apply them to my issue. I'm sure there's some one line code that solves this but I'm learning as I go.
As for the output I'm looking for, it would be something like:
In [12]: vlr_full
Out[12]:
data ticker
2020-03-06 ABEV3.SA 0.0
AMD 0.0
ARKQ 0.0
ARRZ3.SA 0.0
CP 0.0
HSY 0.0
ITUB4.SA 0.0
ITSA4.SA 0.0
JBSS3.SA 4.0
... ... ...
2020-03-07 ABEV3.SA 0.0
AMD 0.0
ARKQ 0.0
ARRZ3.SA 0.0
CP 0.0
HSY 0.0
ITUB4.SA 0.0
ITSA4.SA 0.0
JBSS3.SA 4.0
... ... ...
... ... ...
2021-06-17 ABEV3.SA 45
AMD 5
ARKQ 0
ARRZ3.SA 12
CP 1
HSY 1
ITUB4.SA 0.0
ITSA4.SA 139
JBSS3.SA 0.0
So, for each date, starting on 2020-03-06 (first date) until yesterday (2021-06-17 as of today), I want to have an index level that goes through all tickers I have traded at some point and place the owned shares at that date (zero if not previously traded). My problem is filling the missing dates on the first level, and then fillings the ticker symbols on the second level.
Note that in the output example I have only 4 shares of JBSS3.SA at 2020-03-06 and nothing more and this will be true until 2020-08-27, while still attributing all other tickers with 0 shares.
EDIT1:
Tried this:
vlr_full.reindex(pd.MultiIndex.from_product([idx,vlr_full.index.unique(level=1)], names=['data', 'ticker']), fill_value=0)
but it gives me:
ValueError: cannot handle a non-unique multi-index!
idx
is:
In [29]:idx
Out[29]:
DatetimeIndex(['2020-03-06', '2020-03-07', '2020-03-08', '2020-03-09',
'2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13',
'2020-03-14', '2020-03-15',
...
'2021-06-08', '2021-06-09', '2021-06-10', '2021-06-11',
'2021-06-12', '2021-06-13', '2021-06-14', '2021-06-15',
'2021-06-16', '2021-06-17'],
dtype='datetime64[ns]', length=469, freq='D')
and vlr_full.index.unique(level=1)
is:
In [30]:vlr_full.index.unique(level=1)
Out[30]:
Index(['JBSS3.SA', 'WEGE3.SA', 'ITUB4.SA', 'ITSA4.SA', 'CPFE3.SA', 'TAEE11.SA',
'VIVT4.SA', 'OIBR3.SA', 'TAEE4.SA', 'EGIE3.SA', 'SAPR11.SA', 'TIMS3.SA',
'EQTL3.SA', 'VIVT3.SA', 'ABEV3.SA', 'HGLG11.SA', 'LEVE3.SA', 'RAIL3.SA',
'XPLG11.SA', 'TAEE3.SA', 'ARKQ', 'ARZZ3.SA', 'GME', 'AMD', 'DIS',
'EZTC3.SA', 'FLRY3.SA', 'RIO', 'FTM', 'JALL3.SA', 'PRIO3.SA',
'BLAU3.SA', 'CP', 'HSY', 'JPM'],
dtype='object', name='ticker')