I have the dataframe shown at the end of this question. When I apply the melt function to it:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
df_rm_features_melted = df_rm_features.melt(
    id_vars=['id', 'date'],
    value_vars=df_rm_features.select_dtypes(include=numerics).columns
)

I get the error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

id    date       factoring  lglc  overdue_60  distress  max_dpd  tr_op
0288  12/1/2018  0          1     1           1         0        0
0288  1/1/2019   0          0     0           0         10       1
0288  2/1/2019   0          0     0           0         2        1
0288  3/1/2019   0          0     0           0         52       1
0288  4/1/2019   0          0     0           1         2        0
aneroid
gunel
  • What is the source dataframe? What is the expected output dataframe? – sammywemmy Apr 10 '21 at 09:12
  • @sammywemmy source dataframe sample is the one I have shared at the end of the question – gunel Apr 10 '21 at 09:32
  • kindly add the expected output – sammywemmy Apr 10 '21 at 09:43
  • Have you made sure the `dtypes` of `df_rm_features` correspond to the ones you listed in `numerics`? If not, then your `select_dtypes` will return an empty list. Moreover, I see you select columns from `df_rm_features_unprocessed` but you melt `df_rm_features`. – Raphaele Adjerad Apr 10 '21 at 09:47

1 Answer

Firstly, the error I get from your code is ValueError: arrays must all be same length. This is probably because id is in the id_vars list and is also a numeric column, so it ends up in the value_vars list as well.
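To see the overlap for yourself, here is a minimal sketch. It assumes id is stored as an integer (the real dtypes of df_rm_features are not shown in the question) and uses only a few of the sample rows and columns; the names numeric_cols and overlap are just for illustration.

```python
import pandas as pd

# A few sample rows from the question; 'id' is ASSUMED to be an integer column.
df_rm_features = pd.DataFrame({
    "id": [288, 288, 288],
    "date": ["12/1/2018", "1/1/2019", "2/1/2019"],
    "factoring": [0, 0, 0],
    "max_dpd": [0, 10, 2],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
id_vars = ["id", "date"]

# Every numeric column, which includes 'id' when it is stored as an integer.
numeric_cols = list(df_rm_features.select_dtypes(include=numerics).columns)

# Columns that would be passed to melt as both id_vars and value_vars.
overlap = [col for col in numeric_cols if col in id_vars]

print(numeric_cols)
print(overlap)
```

If overlap is not empty, those columns are being requested as identifier and value columns at the same time.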

To remove id (and any other id_vars columns) from value_vars without having to list them explicitly, apply numpy.setdiff1d() to the columns returned by select_dtypes:

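import numpy as np

id_vars = ['id', 'date']
# All numeric columns; this may include 'id'
wanted_vals = df_rm_features.select_dtypes(include=numerics).columns
# Drop the id_vars columns from the value columns (the result is sorted)
canhave_vals = np.setdiff1d(wanted_vals, id_vars)
df_rm_features.melt(id_vars=id_vars,
                    value_vars=canhave_vals)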

Output:

     id       date    variable  value
0   288  12/1/2018    distress      1
1   288   1/1/2019    distress      0
2   288   2/1/2019    distress      0
3   288   3/1/2019    distress      0
4   288   4/1/2019    distress      1
5   288  12/1/2018   factoring      0
6   288   1/1/2019   factoring      0
7   288   2/1/2019   factoring      0
8   288   3/1/2019   factoring      0
9   288   4/1/2019   factoring      0
10  288  12/1/2018        lglc      1
11  288   1/1/2019        lglc      0
12  288   2/1/2019        lglc      0
13  288   3/1/2019        lglc      0
14  288   4/1/2019        lglc      0
15  288  12/1/2018     max_dpd      0
16  288   1/1/2019     max_dpd     10
17  288   2/1/2019     max_dpd      2
18  288   3/1/2019     max_dpd     52
19  288   4/1/2019     max_dpd      2
20  288  12/1/2018  overdue_60      1
21  288   1/1/2019  overdue_60      0
22  288   2/1/2019  overdue_60      0
23  288   3/1/2019  overdue_60      0
24  288   4/1/2019  overdue_60      0
25  288  12/1/2018       tr_op      0
26  288   1/1/2019       tr_op      1
27  288   2/1/2019       tr_op      1
28  288   3/1/2019       tr_op      1
29  288   4/1/2019       tr_op      0
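For reference, a self-contained sketch that rebuilds the sample data and runs the approach above end to end. The dataframe is typed in by hand from the question, and id is assumed to be an integer column. Note that np.setdiff1d returns sorted unique values, which is why the variable column above is in alphabetical order. If you would rather keep the original column order, a plain list comprehension is one alternative (my own addition here, with the illustrative names sorted_vals and ordered_vals).

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question; 'id' is ASSUMED to be an integer column.
df_rm_features = pd.DataFrame({
    "id": [288] * 5,
    "date": ["12/1/2018", "1/1/2019", "2/1/2019", "3/1/2019", "4/1/2019"],
    "factoring": [0, 0, 0, 0, 0],
    "lglc": [1, 0, 0, 0, 0],
    "overdue_60": [1, 0, 0, 0, 0],
    "distress": [1, 0, 0, 0, 1],
    "max_dpd": [0, 10, 2, 52, 2],
    "tr_op": [0, 1, 1, 1, 0],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
id_vars = ["id", "date"]
wanted_vals = df_rm_features.select_dtypes(include=numerics).columns

# np.setdiff1d drops the id_vars and returns the remaining names sorted.
sorted_vals = list(np.setdiff1d(wanted_vals, id_vars))

# Alternative that keeps the original column order.
ordered_vals = [col for col in wanted_vals if col not in id_vars]

melted_sorted = df_rm_features.melt(id_vars=id_vars, value_vars=sorted_vals)
melted_ordered = df_rm_features.melt(id_vars=id_vars, value_vars=ordered_vals)

print(melted_sorted.head())
print(melted_ordered.head())
```

Both results have the same 30 rows; only the order of the variable blocks differs.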
aneroid