0

I want to calculate number of months for each ID number_of_months=df.ID.value_counts() and I got pandas series

        ID
    564 30
    133 30
    156 30
    153 30

with a single column ID

How can I get dataframe df1 with two columns (ID, numberofmonts) to merge with other dataframe df based on ID

df = pd.merge(df, df1, how='left', left_on=['ID'], right_on = ['ID'])

That is standard problem for feature engineering. I want to define some new feature with count or other function and then merge with dataframe

Guillaume Jacquenot
  • 11,217
  • 6
  • 43
  • 49
zdz
  • 307
  • 1
  • 2
  • 9

1 Answers1

0

The result of value_counts() is in this case a Series with:

  • index - original ID values,
  • values - how many times this ID occurs in the source Series.

A bit tricky detail is that the name of this Series (ID) refers actually to (the only) column, containg the number of occurrences of particular value it the original ID column.

So your task is to:

  • Rename the index to ID (in place).
  • Rename the Series itself to numberOfMonths or whatever your name of choice (also in place).
  • Reset the index, saving the result in a target variable. As drop parameter is left with its default value (False), the so far existing index becomes an "ordinary" column, keeping its name.

The code to do it is:

number_of_months.index.rename('ID', inplace=True)
number_of_months.rename('numberOfMonths', inplace=True)
df1 = number_of_months.reset_index()

The result, for your sample data, is:

    ID  numberOfMonths
0  564              30
1  133              30
2  156              30
3  153              30

Now ID name pertains to original ID values and you can merge it with another DataFrame just on ID column.

Alternative solution

If your intention is to add numberOfMonths column to df, containing information how many times particular ID occurs in this DataFrame, a quicker and simpler solution is:

df['numberOfMonths'] = df.groupby('ID').transform('count')
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41