6

I am doing a value_counts() over a column of integers that represent categorical values.

I have a dict that maps the numbers to strings that correspond to the category name.

I want to find the best way to label the index with the corresponding category names, as I am not happy with my four-line solution.

My current solution

import pandas as pd

df = pd.DataFrame({"weather": [1,2,1,3]})
df
>>>
   weather
0        1
1        2
2        1
3        3

weather_correspondance_dict = {1:"sunny", 2:"rainy", 3:"cloudy"}

Here is how I currently solve the problem:

df_vc = df.weather.value_counts()
index = df_vc.index.map(lambda x: weather_correspondance_dict[x] )
df_vc.index = index
df_vc
>>>
sunny     2
cloudy    1
rainy     1
dtype: int64

Question

I am not happy with this solution, which is rather tedious. Is there a best practice for this situation?

jpp
Adrien Pacifico

3 Answers

8

This is my solution:

>>> weather_correspondance_dict = {1:"sunny", 2:"rainy", 3:"cloudy"}
>>> df["weather"].value_counts().rename(index=weather_correspondance_dict)
    sunny     2
    cloudy    1
    rainy     1
    Name: weather, dtype: int64
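One detail worth knowing about the rename approach: labels that are missing from the dict are left unchanged rather than raising an error. A minimal sketch, assuming an extra code 4 that has no entry in the dict (the extra value is hypothetical and not part of the question's data):

```python
import pandas as pd

# Same sample data as in the question, plus a hypothetical code (4)
# that has no entry in the mapping dict.
df = pd.DataFrame({"weather": [1, 2, 1, 3, 4]})
weather_correspondance_dict = {1: "sunny", 2: "rainy", 3: "cloudy"}

counts = df["weather"].value_counts()

# rename() relabels the index entries found in the dict and keeps
# any other label as it is; it returns a new Series.
renamed = counts.rename(index=weather_correspondance_dict)

print(renamed)
```

By contrast, the lambda in the question, which indexes the dict directly, would raise a KeyError on the unmapped code.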
dimension
1

Here's a simpler solution:

weathers = ['sunny', 'rainy', 'cloudy']
weathers_dict = dict(enumerate(weathers, 1))

df_vc = df['weather'].value_counts()
df_vc.index = df_vc.index.map(weathers_dict.get)

Explanation

  • Use dict with enumerate to construct a dictionary mapping integers to weather types.
  • Use dict.get with pd.Index.map. Unlike pd.Series.map, pd.Index.map in older pandas versions does not accept a dictionary directly (newer releases do), but you can pass a callable such as dict.get instead.
  • Update the index directly rather than using an intermediary variable.

Alternatively, you can apply your map to weather before using pd.Series.value_counts. This way, you do not need to update the index of your result.

df['weather'] = df['weather'].map(weathers_dict)
df_vc = df['weather'].value_counts()
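If the mapped column is then converted to a categorical type, value_counts also reports categories that never occur in the data. A rough sketch, assuming a hypothetical fourth label "snowy" (not in the original dict) and a separate `labels` variable so the integer column is left untouched:

```python
import pandas as pd

df = pd.DataFrame({"weather": [1, 2, 1, 3]})

# The entry 4 -> "snowy" is a hypothetical addition for illustration.
weathers_dict = {1: "sunny", 2: "rainy", 3: "cloudy", 4: "snowy"}

# Map into a separate Series so the original integer codes are kept.
labels = df["weather"].map(weathers_dict)

# Declare every known label as a category, including unobserved ones.
all_labels = list(weathers_dict.values())
labels = labels.astype(pd.CategoricalDtype(categories=all_labels))

# Unobserved categories appear in the result with a count of 0.
counts = labels.value_counts()
print(counts)
```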
jpp
  • Thx @jpp, I like the `df['weather'].map(weathers_dict)` solution which could then be transformed into [categorical](https://pandas.pydata.org/pandas-docs/stable/categorical.html#categorical-data) type. If someone combines (or edit) both @L-- and @jpp answers that would be great ! – Adrien Pacifico Jul 26 '18 at 11:40
  • @AdrienPacifico, I added a separate solution if you wish to use categorical data. – jpp Jul 26 '18 at 13:00
0

Categorical data

You can use Categorical Data with pd.Categorical.rename_categories:

s = df['weather'].value_counts()
s.index = pd.Categorical(s.index).rename_categories(weather_correspondance_dict)

Passing a dictionary to rename_categories is supported in Pandas v0.21+.
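To check what the assignment produces, the snippet can be run end to end and the resulting index inspected. This is only a sketch using the question's sample data; the variable names `index_type` and `categories` are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"weather": [1, 2, 1, 3]})
weather_correspondance_dict = {1: "sunny", 2: "rainy", 3: "cloudy"}

s = df["weather"].value_counts()

# Build a Categorical from the integer labels, then rename its categories
# using the dict (old category -> new category).
s.index = pd.Categorical(s.index).rename_categories(weather_correspondance_dict)

# Assigning a Categorical as the index yields a CategoricalIndex.
index_type = type(s.index).__name__
categories = list(s.index.categories)

print(index_type)
print(categories)
print(s)
```

The counts can then be looked up by name, for example `s.loc["sunny"]`.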

jpp