Is there a way to obtain weighted dummy variables using pandas? I have two dataframes, one with the categorical values and another with the corresponding continuous values:

import pandas as pd
df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])

The idea is to obtain a dummy dataframe, but with weighted dummy variables. For row 0, the values sum to 1.0 + 3.0 + 2.0 = 6.0, which counts as 100%, so instead of 0 and 1 the dummy variables should be:

a = 1.0/6.0
c = 3.0/6.0
d = 2.0/6.0

Each of these results should be an entry in the dummy dataframe.
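The row-wise shares described above can be sketched as follows. This is only an illustration of the arithmetic; the names `row_totals` and `weights` are made up here, and treating an all-zero row as having undefined (NaN) shares is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])

# Sum of each row; missing cells (NaN) are skipped by sum().
row_totals = df1.sum(axis=1)

# Assumption: a row whose total is 0 has no defined share, so its total
# is turned into NaN before dividing, which leaves that row's weights as NaN.
row_totals = row_totals.where(row_totals != 0)

# Divide every value by the total of its own row.
weights = df1.div(row_totals, axis=0)

print(weights)
```

For row 0 this gives 1/6, 3/6 and 2/6, matching the values for a, c and d listed above.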

What I currently get is only 0 or 1: 0 if the value is NaN and 1 if it exists.

dummies = pd.get_dummies(df2, columns=[0,1,2])

And this is my output

What I intend to do is obtain the same matrix but, instead of 1s and 0s, with the weighted dummy variables, because a, b and c have different importance in my model.
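One possible way to build such a matrix, sketched under a few assumptions: `df1` and `df2` line up cell by cell, a row whose values sum to 0 gets all-zero weights, and one column per label (`a`, `b`, `c`, `d`) is acceptable instead of the per-position columns that `pd.get_dummies(df2, columns=[0,1,2])` produces. The names `labels`, `indicators` and `weighted` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1., 3., 2.], [2., 1.], [0.], [0., 2., 2.], [0., 2.]])
df2 = pd.DataFrame(data=[['a', 'c', 'd'], ['a', 'b'], ['c'], ['b', 'c', 'd'], ['a', 'b']])

# Row-normalised weights. Assumption: rows that sum to 0 and missing
# cells both end up with weight 0.
row_totals = df1.sum(axis=1)
weights = df1.div(row_totals.where(row_totals != 0), axis=0).fillna(0.0)

# Every label that occurs anywhere in df2, ignoring missing cells.
labels = sorted(pd.Series(df2.to_numpy().ravel()).dropna().unique())

# Accumulate one weighted indicator matrix per position (column of df2).
weighted = pd.DataFrame(0.0, index=df2.index, columns=labels)
for col in df2.columns:
    # Plain 0/1 dummies for this position, aligned to the full label set.
    indicators = pd.get_dummies(df2[col], dtype=float).reindex(columns=labels, fill_value=0.0)
    # Scale each row's indicators by the weight at the same position in df1.
    weighted += indicators.mul(weights[col], axis=0)

print(weighted)
```

With this sketch, a label that appears more than once in a row would have its weights added together, and a label with value 0 (such as `b` in row 3) is indistinguishable from an absent label.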

Elias Urra
  • I don't understand your question. Could you show desired output dataframe? – DSteman Jun 24 '21 at 12:12
  • I edited the question... is it clearer? – Elias Urra Jun 24 '21 at 14:15
  • I still don't see your desired output dataframe with weights. Don't get me wrong, but that would really help to understand your problem. – DSteman Jun 24 '21 at 14:38
  • I need the exact same output i showed in the picture/table attached, but, instead of being 1 or 0, I need them to be a number between 0 and 1, considering decimals, in order to weight the categorical value... – Elias Urra Jun 24 '21 at 14:57

0 Answers