0

I have a DataFrame that looks like this

In [10]: vlr
Out[10]: 
               data tipoOP   qtd  ...       flxCx  flxCx_USD pmAtual_USD
ticker                            ...                                   
JBSS3.SA 2020-03-06      C   4.0  ...  -86.390000        NaN         NaN
WEGE3.SA 2020-08-27      C   3.0  ... -198.050000        NaN         NaN
ITUB4.SA 2020-09-09      C   8.0  ... -195.180000        NaN         NaN
WEGE3.SA 2020-09-09      V  -3.0  ...  193.150000        NaN         NaN
ITSA4.SA 2020-09-18      C  33.0  ... -297.430000        NaN         NaN
            ...    ...   ...  ...         ...        ...         ...
ARZZ3.SA 2021-06-01      C   2.0  ... -174.610000        NaN         NaN
CP       2021-06-02      C   1.0  ... -417.103785     -81.55       81.55
HSY      2021-06-02      C   1.0  ... -884.843100    -173.00      173.00
ARKQ     2021-06-02      V  -2.0  ...  836.650400     163.60         NaN
JPM      2021-06-02      C   1.0  ... -853.745724    -166.92      166.92

Then I'm doing:

vlr_full = vlr.groupby('data')
vlr_full = vlr_full.apply(holding.ord_ativo).drop(columns=['data'])
vlr_full = vlr_full['qtdAtual']

To get vlr_full (qtdAtual column is the number of assets owned that day, cummulative):

In [12]: vlr_full
Out[12]: 
data        ticker  
2020-03-06  JBSS3.SA     4.0
2020-08-27  WEGE3.SA     3.0
2020-09-09  ITUB4.SA     8.0
            WEGE3.SA     0.0
2020-09-18  ITSA4.SA    33.0

2021-06-01  PRIO3.SA    50.0
2021-06-02  ARKQ         0.0
            CP           1.0
            HSY          1.0
            JPM          1.0
Name: qtdAtual, Length: 125, dtype: float64

(qtdAtual means "owned shares" and its given by the cummulative sum by ticker of "qtd" column in vlr)

I have a DatetimeIndex like this:

idx = pd.date_range(first_day,end=yesterday,freq='D')

It gives me the first day I ever bought something and goes until yesterday

My objective is to fill up the vlr_full 'data' level index with missing dates, and for each date have a index of all tickers present at vlr. If the ticker wasn't traded at a given date, I'll keep the previous value (or 0 if it wasn't traded yet). With that, I was planning on using the yfinance module to get historical quote data and insert that (multiplied to the quantity I own at that date) as a new column. After that I'll plot stuff.

I tried stuff like:

vlr_full.reindex(idx)

But, obviously it didn't work (ValueError: cannot handle a non-unique multi-index!) . I'm not sure on how to proceed, I tried some related questions but I wasn't able to apply them to my issue. I'm sure there's some one line code that solves this but I'm learning as I go.

As for the output I'm looking for, it would be something like:

In [12]: vlr_full
Out[12]: 
data        ticker  
2020-03-06  ABEV3.SA     0.0
            AMD          0.0
            ARKQ         0.0
            ARRZ3.SA     0.0
            CP           0.0
            HSY          0.0
            ITUB4.SA     0.0
            ITSA4.SA     0.0
            JBSS3.SA     4.0
...         ...          ...
2020-03-07  ABEV3.SA     0.0
            AMD          0.0
            ARKQ         0.0
            ARRZ3.SA     0.0
            CP           0.0
            HSY          0.0
            ITUB4.SA     0.0
            ITSA4.SA     0.0
            JBSS3.SA     4.0
...           ...        ...
...           ...        ...
2021-06-17  ABEV3.SA     45
            AMD          5
            ARKQ         0
            ARRZ3.SA     12
            CP           1
            HSY          1
            ITUB4.SA     0.0
            ITSA4.SA     139
            JBSS3.SA     0.0

So, for each date, starting on 2020-03-06 (first date) until yesterday (2021-06-17 as of today), I want to have an index level that goes through all tickers I have traded at some point and place the owned shares at that date (zero if not previously traded). My problem is filling the missing dates on the first level, and then fillings the ticker symbols on the second level.

Note that in the output example I have only 4 shares of JBSS3.SA at 2020-03-06 and nothing more and this will be true until 2020-08-27, while still attributing all other tickers with 0 shares.

EDIT1:

Tried this:

vlr_full.reindex(pd.MultiIndex.from_product([idx,vlr_full.index.unique(level=1)], names=['data', 'ticker']), fill_value=0)

but it gives me:

ValueError: cannot handle a non-unique multi-index!

idx is:

In [29]:idx
Out[29]: 
DatetimeIndex(['2020-03-06', '2020-03-07', '2020-03-08', '2020-03-09',
               '2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13',
               '2020-03-14', '2020-03-15',
               ...
               '2021-06-08', '2021-06-09', '2021-06-10', '2021-06-11',
               '2021-06-12', '2021-06-13', '2021-06-14', '2021-06-15',
               '2021-06-16', '2021-06-17'],
              dtype='datetime64[ns]', length=469, freq='D')

and vlr_full.index.unique(level=1) is:

In [30]:vlr_full.index.unique(level=1)
Out[30]: 
Index(['JBSS3.SA', 'WEGE3.SA', 'ITUB4.SA', 'ITSA4.SA', 'CPFE3.SA', 'TAEE11.SA',
       'VIVT4.SA', 'OIBR3.SA', 'TAEE4.SA', 'EGIE3.SA', 'SAPR11.SA', 'TIMS3.SA',
       'EQTL3.SA', 'VIVT3.SA', 'ABEV3.SA', 'HGLG11.SA', 'LEVE3.SA', 'RAIL3.SA',
       'XPLG11.SA', 'TAEE3.SA', 'ARKQ', 'ARZZ3.SA', 'GME', 'AMD', 'DIS',
       'EZTC3.SA', 'FLRY3.SA', 'RIO', 'FTM', 'JALL3.SA', 'PRIO3.SA',
       'BLAU3.SA', 'CP', 'HSY', 'JPM'],
      dtype='object', name='ticker')
Simon
  • 13
  • 3
  • I can not understand what do you want to do. maybe you can give some output you expect. And you may want to use `vlr_full.reset_index()`? (I quess) – SCKU Jun 18 '21 at 15:07
  • I edited with some output example of what I want. I tried to keep if small because it'll have all dates until yesterday. I hope it became more clear. – Simon Jun 18 '21 at 15:35
  • Thanks, I got it (maybe...), try yo making a solution now... – SCKU Jun 18 '21 at 16:39

2 Answers2

2

Starting from vlr_full that looks like this:

vlr_full
                   qtdActual
data       ticker           
2021-01-02 STACK         3.0
           OVER          2.0
2021-01-04 OVER          5.0
           FLOW          4.0
2021-01-06 STACK         6.0

Do the following.

# Change this to your own idx
idx = pd.date_range('2021-01-01', '2021-01-07', freq='D')
(
    vlr_full
    .unstack('ticker')
    .reindex(idx)
    .ffill()
    .fillna(0.0)
    .stack('ticker')
)

which outputs:

                   qtdActual
           ticker           
2021-01-01 FLOW          0.0
           OVER          0.0
           STACK         0.0
2021-01-02 FLOW          0.0
           OVER          2.0
           STACK         3.0
2021-01-03 FLOW          0.0
           OVER          2.0
           STACK         3.0
2021-01-04 FLOW          4.0
           OVER          5.0
           STACK         3.0
2021-01-05 FLOW          4.0
           OVER          5.0
           STACK         3.0
2021-01-06 FLOW          4.0
           OVER          5.0
           STACK         6.0
2021-01-07 FLOW          4.0
           OVER          5.0
           STACK         6.0
Jun
  • 432
  • 4
  • 8
0

Have a look at here, use .reindex with MultiIndex.

Something like:

vlr_full.reindex( pd.MultiIndex.from_product([idx, 
"your_ticker_set"], names=['data', 'ticker']), fill_value=0)

get your_ticker_set by MultiIndex.unique(level=1)

SCKU
  • 783
  • 9
  • 14
  • Didn't work, unfortunately. Raises "ValueError: cannot handle a non-unique multi-index!". I'll keep trying to tweak with it. – Simon Jun 18 '21 at 17:08
  • @RatoF your `data` index (e.g. date) has duplicated date?? – SCKU Jun 18 '21 at 17:12
  • There are days when more than one asset (ticker) was traded. It can be seen in the vlr dataframe (see '2021-06-02'). But then I do vlr_full = vlr.groupby('data'). I though this was supposed to leave unique dates on level=0? Is this where I'm wrong? – Simon Jun 18 '21 at 17:17