0

I have two Pandas dataframes, df_stock_prices and df_sentiment_mean.

I would like to do the following:

  1. Left join/merge these two dataframes into one dataframe, joined by Date and by ticker. In df_stock_prices, ticker is the column name, for example AAPL.OQ and in df_sentiment_mean ticker is found within the rows of the column named ticker.

  2. If there is a Date and ticker from df_stock_prices that doesn't match df_sentiment_mean, keep the non-matching row of df_stock_prices as-is (hence the left join).

  3. When there is a match for both Date and ticker, multiply the fields together; for example in the dataframes listed below, if df_stock_prices Date is 2021-11-29 and column AAPL.OQ is a match for the df_sentiment_mean Date of 2021-11-29 and ticker AAPL.OQ, then multiply the values for the match, in this example: 160.24 * 0.163266.

If a Date and ticker from df_stock_prices doesn't match a Date and ticker value from df_sentiment_mean, keep the values from df_stock_prices.

Current dataframes:

df_stock_prices:


            AAPL.OQ  ABBV.N   ABT.N   ACN.N  ADBE.OQ  AIG.N  AMD.OQ  AMGN.OQ  
Date                                                                           
2021-11-29   160.24  116.89  128.03  365.82   687.49  54.95  161.91   203.47   
2021-11-30   165.30  115.28  125.77  357.40   669.85  52.60  158.37   198.88   
2021-12-01   164.77  115.91  126.74  360.14   657.41  51.72  149.11   200.80   
2021-12-02   163.76  116.87  128.38  365.30   671.88  53.96  150.68   201.17   
2021-12-03   161.84  118.85  130.27  361.42   616.53  53.32  144.01   202.44   
...

df_sentiment_mean:

            ticker      diff
Date                         
2021-11-29  AAPL.OQ  0.163266
2021-11-29   ABBV.N -0.165520
2021-11-29    ABT.N  0.149920
2021-11-29  ADBE.OQ -0.014639
2021-11-29    AIG.N -0.448595
...             ...       ...
2023-01-12    LOW.N  0.008863
2023-01-12    MDT.N  0.498884
2023-01-12     MO.N -0.013428
2023-01-12    NEE.N  0.255223
2023-01-12    NKE.N  0.072752

Desired dataframe, partial first row example:

df_new:

            AAPL.OQ         ABBV.N      ABT.N        ACN.N     ADBE.OQ       AIG.N  …
Date
2021-11-29  26.16174384    -19.3476328  19.1942576  365.82    -10.06416611  -24.65029525  …
...
billv1179
  • 323
  • 5
  • 15

1 Answers1

1

You have to reshape your second dataframe then multiply both dataframes. Finally, fill nan by original values of your first dataframe:

# df1 = df_stock_prices
# df2 = df_sentiment_mean
df_new = df1.mul(df2.set_index('ticker', append=True)['diff'].unstack('ticker')).fillna(df1)
>>> df1
            AAPL.OQ  ABBV.N   ABT.N   ACN.N  ADBE.OQ  AIG.N
Date                                                       
2021-11-29   160.24  116.89  128.03  365.82   687.49  54.95


>>> df2
             ticker      diff
Date                         
2021-11-29  AAPL.OQ  0.163266
2021-11-29   ABBV.N -0.165520
2021-11-29    ABT.N  0.149920
2021-11-29  ADBE.OQ -0.014639
2021-11-29    AIG.N -0.448595


>>> df_new
              AAPL.OQ     ABBV.N      ABT.N   ACN.N    ADBE.OQ      AIG.N
Date                                                                     
2021-11-29  26.161744 -19.347633  19.194258  365.82 -10.064166 -24.650295
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • So does an answer solve your problem? – Corralien Jan 25 '23 at 17:20
  • This came very close. It's giving one mul calculation by row. For example, AAPL.OQ is in row 1, ABBV.N is in row 2, etc. Trying to figure out how to get the calculations together in each row; do you have any ideas? Also, any way to bring the date back? – billv1179 Jan 25 '23 at 17:57
  • AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N AMD.OQ \ 0 26.161677 116.89 128.03 365.82 687.49 54.95 161.91 1 165.3 -19.081192 125.77 357.4 669.85 52.6 158.37 2 164.77 115.91 19.000849 360.14 657.41 51.72 149.11 3 163.76 116.87 128.38 365.3 -9.835502 53.96 150.68 4 161.84 118.85 130.27 361.42 616.53 -23.919105 144.01 – billv1179 Jan 25 '23 at 17:59
  • The code above multiply element-wise for each (Date, Ticker). What do you mean by bring the date back? – Corralien Jan 25 '23 at 18:02
  • To keep date in-line with the original prices or calculations. So for example, date would remain in the dataframe (like an index). ABBV.N would have a calculation and remain in-line with ABBV.N's calculation. >> df_new AAPL.OQ ABBV.N ABT.N ACN.N ADBE.OQ AIG.N Date 2021-11-29 26.161744 -19.347633 19.194258 365.82 -10.064166 -24.650295 2021-11-30 .... – billv1179 Jan 25 '23 at 18:09
  • Are you looking for `df_new.reset_index()`? – Corralien Jan 25 '23 at 18:10
  • Hi Corralien, yes, that's definitely the case. I've attempted to do this as is, "df_new.reset_index()" as well as "df_new = df1.mul(df2.set_index('ticker', append=True)['diff'].unstack('ticker')).fillna(df1).reset_index()" but it doesn't seem to retain the dates. – billv1179 Jan 25 '23 at 18:51
  • Is it possible for you to share the data? – Corralien Jan 25 '23 at 18:53