-2

I'm running into a ValueError: Columns must be same length as key when trying to do encoding for the column Type. Here are the codes, not sure which part is wrong.

df.head()

enter image description here

plt.figure(figsize=(7, 5))
sns.heatmap(df.isnull(), cmap='viridis')
df.isnull().any()

enter image description here

df.isnull().sum()

enter image description here

df['Rating'] = df['Rating'].fillna(df['Rating'].median())

replaces = [u'\u00AE', u'\u2013', u'\u00C3', u'\u00E3', u'\u00B3', '[', ']', "'"]
for i in replaces:
    df['Current Ver'] = df['Current Ver'].astype(str).apply(lambda x : x.replace(i, ''))

regex = [r'[-+|/:/;(_)@]', r'\s+', r'[A-Za-z]+']
for j in regex:
    df['Current Ver'] = df['Current Ver'].astype(str).apply(lambda x : re.sub(j, '0', x))

df['Current Ver'] = df['Current Ver'].astype(str).apply(lambda x : x.replace('.', ',',1).replace('.', '').replace(',', '.',1)).astype(float)
df['Current Ver'] = df['Current Ver'].fillna(df['Current Ver'].median())

enter image description here

i = df[df['Category'] == '1.9'].index
df.loc[i]

enter image description here

df = df.drop(i)
df = df[pd.notnull(df['Last Updated'])]
df = df[pd.notnull(df['Content Rating'])]
le = preprocessing.LabelEncoder()
df['App'] = le.fit_transform(df['App'])
category_list = df['Category'].unique().tolist() 
category_list = ['cat_' + word for word in category_list]
df = pd.concat([df, pd.get_dummies(df['Category'], prefix='cat')], axis=1)
le = preprocessing.LabelEncoder()
df['Genres'] = le.fit_transform(df['Genres'])
le = preprocessing.LabelEncoder()
df['Content Rating'] = le.fit_transform(df['Content Rating'])
df['Price'] = df['Price'].apply(lambda x : x.strip('$'))
df['Installs'] = df['Installs'].apply(lambda x : x.strip('+').replace(',', ''))
df['Type'] = pd.get_dummies(df['Type'])

enter image description here

Leo
  • 93
  • 2
  • 11
  • Anyone?........ – Leo Dec 03 '21 at 01:28
  • It would be better if you could provide a small reproducible example with code which can copied and pasted so people can easier help. Have a look at this https://stackoverflow.com/help/minimal-reproducible-example. – mjspier Dec 03 '21 at 15:19

1 Answers1

2

You are trying to map a DataFrame with multiple columns to one column to the original DataFrame.

pd.get_dummies returns a DataFrame with a column for each value in the column.

If you want to add those values to the original DataFrame you can use concat.

Example:

import pandas as pd

df = pd.DataFrame(data=['type1', 'type2', 'type3'], columns=['Type'])
dummies_df = pd.get_dummies(df['Type'])
pd.concat([df, dummies_df], axis=1)
mjspier
  • 6,386
  • 5
  • 33
  • 43
  • No, I will not. That's not how it works. Again, you should provide a small example which you can post in stackoverflow which reproduces the problem you have. I'm pretty sure replacing the line `df['Type'] = pd.get_dummies(df['Type'])` with `df = pd.concat([df, pd.get_dummies(df['Type'])], axis=1))` would solve the issue you have. – mjspier Dec 03 '21 at 18:26