2

I am a newbie to pandas and I am struggling to get the result that I want after doing research / experiencing some trial and errors...appreciate any guidance from here, thanks in advance!

Supposed I have a dataframe which holds some trade data, to make it easier to explain what I want to do, I made up these data myself:

| time       | symbol   | market  | volume |
| ---------- | -------- | ------- | ------ |
| 2022-07-01 | btcusdt  | Binance | 300    |
| 2022-07-01 | btcusdt  | Gemini  | 200    |
| 2022-07-01 | btcusdt  | Huobi   | 100    |
| 2022-07-02 | btcusdt  | Binance | 400    |
| 2022-07-02 | btcusdt  | Gemini  | 350    |
| 2022-07-02 | btcusdt  | Huobi   | 100    |
| 2022-07-01 | ethusdt  | Binance | 1000   |
| 2022-07-01 | ethusdt  | Gemini  | 2000   |
| 2022-07-02 | ethusdt  | Binance | 1500   |
| 2022-07-02 | ethusdt  | Gemini  | 500    |
| 2022-07-01 | usdcusdt | Binance | 300    |
| 2022-07-01 | usdcusdt | Gemini  | 1000   |
| 2022-07-01 | usdcusdt | Huobi   | 2000   |
| 2022-07-02 | usdcusdt | Binance | 1500   |
| 2022-07-02 | usdcusdt | Gemini  | 1500   |
| 2022-07-02 | usdcusdt | Huobi   | 1500   |

I want to create a column "prod_vol" in the original dataframe which does the following calculation:

prod_vol = volume / total_volume_trade_on_that_exchange

to get the percentage EACH DAY, example shown as below:

| time       | symbol   | market  | volume | prod_vol                            |
| ---------- | -------- | ------- | ------ | ----------------------------------- |
| 2022-07-01 | btcusdt  | Binance | 300    | = 300/(300+1000+300) * 100 = 18.75% |
| 2022-07-01 | btcusdt  | Gemini  | 200    | = 200/(200+2000+1000) * 100 = 6.25% |
| 2022-07-01 | btcusdt  | Huobi   | 100    | = 100/(100+2000) = 8.33%            |          
| 2022-07-02 | btcusdt  | Binance | 400    | = 400/(400+1500+1500) = 11.76%      |
| 2022-07-02 | btcusdt  | Gemini  | 350    |                                     |
| 2022-07-02 | btcusdt  | Huobi   | 100    |                                     |
| 2022-07-01 | ethusdt  | Binance | 1000   |                                     |
| 2022-07-01 | ethusdt  | Gemini  | 2000   |                                     |
| 2022-07-02 | ethusdt  | Binance | 1500   |                                     |
| 2022-07-02 | ethusdt  | Gemini  | 500    |                                     |
| 2022-07-01 | usdcusdt | Binance | 300    |                                     |
| 2022-07-01 | usdcusdt | Gemini  | 1000   |                                     |
| 2022-07-01 | usdcusdt | Huobi   | 2000   |                                     |
| 2022-07-02 | usdcusdt | Binance | 1500   |                                     |
| 2022-07-02 | usdcusdt | Gemini  | 1500   |                                     |
| 2022-07-02 | usdcusdt | Huobi   | 1500   |                                     |

I am able to get the denominator (i.e. the total volume trade on different exchanges each day) by creating a new dataframe using GroupBy and Aggregate function as shown below:

df_new = df.groupby(['time', 'market']).agg(volume=('volume', 'sum'))

which gives me the aggregated table:

| time       | market  | volume                     |
| ---------- | -----   | -------------------------- |
| 2022-07-01 | Binance | = 300 + 1000 + 300 = 1600  |
|            | Gemini  | = 200 + 2000 + 1000 = 3200 |
|            | Huobi   | = 100 + 2000 = 2100        |
| 2022-07-02 | Binance | = 400 + 1500 + 1500 = 3400 |
|            | Gemini  | = 350 + 500 + 1500 = 2350  |
|            | Huobi   | = 100 + 1500 = 1600        |

I am having trouble to use these data from the new df_new and perform the calculation to the original df to get the expected output as shown on second table...

Does anyone know how I can achieve this? Thanks!

Edit:

print(df.to_dict())

output:

{'time': {0: '2022-07-01', 1: '2022-07-01', 2: '2022-07-01', 3: '2022-07-02', 4: '2022-07-02', 5: '2022-07-02', 6: '2022-07-01', 7: '2022-07-01', 8: '2022-07-02', 9: '2022-07-02', 10: '2022-07-01', 11: '2022-07-01', 12: '2022-07-01', 13: '2022-07-02', 14: '2022-07-02', 15: '2022-07-02'}, 'symbol': {0: 'btcusdt', 1: 'btcusdt', 2: 'btcusdt', 3: 'btcusdt', 4: 'btcusdt', 5: 'btcusdt', 6: 'ethusdt', 7: 'ethusdt', 8: 'ethusdt', 9: 'ethusdt', 10: 'usdcusdt', 11: 'usdcusdt', 12: 'usdcusdt', 13: 'usdcusdt', 14: 'usdcusdt', 15: 'usdcusdt'}, 'market': {0: 'Binance', 1: 'Gemini', 2: 'Huobi', 3: 'Binance', 4: 'Gemini', 5: 'Huobi', 6: 'Binance', 7: 'Gemini', 8: 'Binance', 9: 'Gemini', 10: 'Binance', 11: 'Gemini', 12: 'Huobi', 13: 'Binance', 14: 'Gemini', 15: 'Huobi'}, 'volume': {0: 300, 1: 200, 2: 100, 3: 400, 4: 350, 5: 100, 6: 1000, 7: 2000, 8: 1500, 9: 500, 10: 300, 11: 1000, 12: 2000, 13: 1500, 14: 1500, 15: 1500}}
  • Hi, just a clarification, do you want the sum per day per market as you do in the `df_new` or just per day as you write above? – My Work Jul 16 '22 at 19:41
  • Can you please post the output of `df.to_dict()` ? – Ignatius Reilly Jul 16 '22 at 19:44
  • Hi, I think is per day per market (i.e. I would want to know the percentage of volume traded for BTCUSDT on Binance against the total amount traded on Binance on that day), Please let me know if this is not clear, thank you! – babyadvocado Jul 16 '22 at 19:44
  • Sorry @IgnatiusReilly, with the original data I can't disclose, I am not sure how to print `df.to_dict()` for you... please let me know if there's anything else I can do... – babyadvocado Jul 16 '22 at 19:56
  • Hi @IgnatiusReilly, please see the edit section for the dictionary... Thanks! – babyadvocado Jul 16 '22 at 20:19
  • @babyadvocado, posted a solution. does that answer your question? – Naveed Jul 16 '22 at 21:01

1 Answers1

2

IIUC

is that what you're looking for?

df['mkt_vol'] = df.groupby(['market','time'])['volume'].transform('sum')
df['prod_vol'] = round(df['volume']/df['mkt_vol'] * 100, 2)
df

OR, just a single line

df['prod_vol'] = round(df['volume']/df.groupby(['market','time'])['volume'].transform('sum') *100,2)
df

          time  symbol      market    volume    prod_vol
0   2022-07-01  btcusdt     Binance      300    18.75
1   2022-07-01  btcusdt     Gemini       200     6.25
2   2022-07-01  btcusdt     Huobi        100     4.76
3   2022-07-02  btcusdt     Binance      400    11.76
4   2022-07-02  btcusdt     Gemini       350    14.89
5   2022-07-02  btcusdt     Huobi        100     6.25
6   2022-07-01  ethusdt     Binance     1000    62.50
7   2022-07-01  ethusdt     Gemini      2000    62.50
8   2022-07-02  ethusdt     Binance     1500    44.12
9   2022-07-02  ethusdt     Gemini       500    21.28
10  2022-07-01  usdcusdt    Binance      300    18.75
11  2022-07-01  usdcusdt    Gemini      1000    31.25
12  2022-07-01  usdcusdt    Huobi       2000    95.24
13  2022-07-02  usdcusdt    Binance     1500    44.12
14  2022-07-02  usdcusdt    Gemini      1500    63.83
15  2022-07-02  usdcusdt    Huobi       1500    93.75
Naveed
  • 11,495
  • 2
  • 14
  • 21