
I have two DataFrames with exactly the same dimensions, and I would like to multiply one specific column from each of them together.

My first DataFrame is:

In [834]: patched_benchmark_df_sim
Out[834]: 
     build_number      name  cycles
0             390     adpcm   21598
1             390       aes    5441
2             390  blowfish     NaN
3             390     dfadd     463
....
284           413      jpeg  766742
285           413      mips    4263
286           413     mpeg2    2021
287           413       sha  348417

[288 rows x 3 columns]

My second DataFrame is:

In [835]: patched_benchmark_df_syn
Out[835]: 
     build_number      name    fmax
0             390     adpcm  143.45
1             390       aes  309.60
2             390  blowfish     NaN
3             390     dfadd  241.02
....
284           413      jpeg  197.75
285           413      mips  202.39
286           413     mpeg2  291.29
287           413       sha  243.19

[288 rows x 3 columns]

I would like to take each element of the cycles column of patched_benchmark_df_sim and multiply it by the corresponding element of the fmax column of patched_benchmark_df_syn. The result should be stored in a new DataFrame with the same structure: it keeps the build_number and name columns, but the last column, which holds the numerical data, is called latency and contains the product of fmax and cycles.

So the output DataFrame has to look something like this:

    build_number      name    latency
0            390     adpcm    ## each value here has to be product of cycles and fmax and they must correspond to one another ##
......

I tried a straightforward patched_benchmark_df_sim * patched_benchmark_df_syn, but that did not work because both DataFrames have a name column of string type. Is there a built-in pandas method that can do this for me? How should I do the multiplication to get the result I need?

Thank you very much.

AKKO

1 Answer

The simplest thing to do is to add a new column to the first df, then select the columns you want and, if you like, assign that selection to a new df. Here df stands for patched_benchmark_df_sim and df1 for patched_benchmark_df_syn:

In [356]:

df['latency'] = df['cycles'] * df1['fmax']
df
Out[356]:
     build_number      name  cycles       latency
0             390     adpcm   21598  3.098233e+06
1             390       aes    5441  1.684534e+06
2             390  blowfish     NaN           NaN
3             390     dfadd     463  1.115923e+05
284           413      jpeg  766742  1.516232e+08
285           413      mips    4263  8.627886e+05
286           413     mpeg2    2021  5.886971e+05
287           413       sha  348417  8.473153e+07
In [357]:

new_df = df[['build_number', 'name', 'latency']]
new_df
Out[357]:
     build_number      name       latency
0             390     adpcm  3.098233e+06
1             390       aes  1.684534e+06
2             390  blowfish           NaN
3             390     dfadd  1.115923e+05
284           413      jpeg  1.516232e+08
285           413      mips  8.627886e+05
286           413     mpeg2  5.886971e+05
287           413       sha  8.473153e+07
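
If adding a latency column to the original frame is not wanted, the same product can be built straight into a new frame. This is only a sketch: the names sim, syn and latency_df are made up for illustration, and the two small frames are stand-ins holding the first four rows shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two frames in the question
# (values copied from the first four rows shown there).
sim = pd.DataFrame({
    "build_number": [390, 390, 390, 390],
    "name": ["adpcm", "aes", "blowfish", "dfadd"],
    "cycles": [21598, 5441, np.nan, 463],
})
syn = pd.DataFrame({
    "build_number": [390, 390, 390, 390],
    "name": ["adpcm", "aes", "blowfish", "dfadd"],
    "fmax": [143.45, 309.60, np.nan, 241.02],
})

# Take the two key columns and attach the product as a new column.
# assign() returns a new DataFrame, so sim never gains a latency column.
latency_df = sim[["build_number", "name"]].assign(
    latency=sim["cycles"] * syn["fmax"]
)
print(latency_df)
```

Like the version above, this multiplies row by row on the index, so it relies on both frames listing the same benchmarks in the same order.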

As you've found, you can't multiply two dfs together the way you tried when they contain non-numeric columns. The above assumes that the build_number and name columns are identical, and in the same row order, in both dfs, because the multiplication pairs rows by index.
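
If that assumption might not hold, one option is to merge the two frames on the key columns first, so rows are paired by build_number and name rather than by position. The following is a rough sketch, not something from the original session: the frame names sim, syn, merged and result are hypothetical, and the sample rows are deliberately shuffled to show the effect.

```python
import pandas as pd

# Hypothetical stand-in for the simulation results.
sim = pd.DataFrame({
    "build_number": [390, 390, 390],
    "name": ["adpcm", "aes", "dfadd"],
    "cycles": [21598, 5441, 463],
})
# Same benchmarks, deliberately listed in a different row order.
syn = pd.DataFrame({
    "build_number": [390, 390, 390],
    "name": ["dfadd", "adpcm", "aes"],
    "fmax": [241.02, 143.45, 309.60],
})

# Pair rows on the key columns. how="outer" keeps benchmarks that appear
# in only one frame; their latency comes out as NaN.
merged = sim.merge(syn, on=["build_number", "name"], how="outer")
merged["latency"] = merged["cycles"] * merged["fmax"]

result = merged[["build_number", "name", "latency"]]
print(result)
```

With how="inner" instead, benchmarks missing from either frame would be dropped rather than kept with a NaN latency.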

EdChum
  • Yes, this works great! As you mentioned, it only works if the build_number and name columns are the same in both dfs. That's why I implemented a lot of pre-processing steps, related to one of my earlier posts http://stackoverflow.com/questions/28735609/cryptic-warning-pops-up-when-doing-pandas-assignment-with-loc-and-iloc, to fill in missing benchmark names and build_numbers with NaN so that the two frames can be multiplied together cleanly. – AKKO Feb 27 '15 at 02:08