0

I have a problem. I have a list with all amenities of a house. I want to split that list and print it out under each other with pandas melt. I want to use the function melt. I tried that but my result don't looks like what I want.

d = {'hostid': [1, 1, 2], 
     'inside': ['["Coffe", "Shampoo", "Parking"]', 
                                     '["Coffe", "Hair dryer"]', 
                                     '["Parking", "Shampoo", "TV"]'], 
     'location_id': [100, 101, 200]}
df = pd.DataFrame(data=d)
df['inside'] = df['inside'].copy().str.replace('"', '', regex=False).str.replace(
                            '[', '', regex=False).str.replace(']', '', regex=False)
print(df)

[OUT]
   hostid                   inside  location_id
0       1  Coffe, Shampoo, Parking          100
1       1        Coffe, Hair dryer          101
2       2     Parking, Shampoo, TV          200

df_paths = df.copy()
df_paths = df_paths['inside'].str.split(',', expand=True)
print(df_paths.head())

[OUT]
         0            1         2
0    Coffe      Shampoo   Parking
1    Coffe   Hair dryer      None
2  Parking      Shampoo        TV

df_paths = df_paths.melt(ignore_index=False).dropna()
df_paths.reset_index(inplace=True)
df_paths.head()

df_paths = df_paths.melt(ignore_index=False).dropna()
df_paths.reset_index(inplace=True)
df_paths.rename(columns={"index": "hostid"}, errors="raise")
df_paths.head()

[OUT]
   hostid variable       value
0      0         0        Coffe
1      1         0        Coffe
2      2         0      Parking
3      0         1      Shampoo
4      1         1   Hair dryer

What I want

   hostid inside  location_id 
1       1 Coffe       100
2       1 Shampoo     100
3       1 Parking     100          
4       1 Coffe       101
5       1 Hair dryer  101
6       2 Parking     200
7       2 Shampoo     200
8       2 TV          200  

# OR


    inside  location_id 
1   Coffe       100
2   Shampoo     100
3   Parking     100          
4   Coffe       101
5   Hair dryer  101
6   Parking     200
7   Shampoo     200
8   TV          200  
Mr. Hankey
  • 954
  • 1
  • 5
  • 12

1 Answers1

1

Simply split the string without expanding and explode:

(df.assign(inside=df['inside'].str.split(', '))
   .explode('inside')
   .reset_index(drop=True)
)

input (inside are strings):

   hostid                   inside  location_id
0       1  Coffe, Shampoo, Parking          100
1       1        Coffe, Hair dryer          101
2       2     Parking, Shampoo, TV          200

output:

   hostid      inside  location_id
0       1       Coffe          100
1       1     Shampoo          100
2       1     Parking          100
3       1       Coffe          101
4       1  Hair dryer          101
5       2     Parking          200
6       2     Shampoo          200
7       2          TV          200
version using melt (not recommended)
(df.assign(variable=range(len(df))).drop('inside', axis=1)
   .merge(df_paths.melt(value_name='inside').dropna(),
          on='variable'
         )
   .drop('variable', axis=1)
)
mozway
  • 194,879
  • 13
  • 39
  • 75