1

I read a CSV file as follows:

import pandas as pd

# Raw string so the backslash in the Windows path is not treated as an escape
bike_sharing = pd.read_csv(r"BIKE_SHARING_ASSIGNMENT\day.csv")
bike_sharing.yr

The yr column has 2 possible values: 0 and 1. I want to update the column and map them to 2018 and 2019 respectively. I am currently doing it as follows:

bike_sharing['yr'] = bike_sharing[['yr']].apply(lambda x: x.map({0: '2018', 1: '2019'}))
bike_sharing['yr'].value_counts()

I get correct results the first time, but when I run it a second time, it changes all values to NaN. Why does this happen?

lime

2 Answers

1

The first time the map runs on yr, it faces input values of 0 and 1, and your translation dictionary {0:'2018', 1:'2019'} handles those.

The second time, it faces the strings '2018' and '2019', and there are no entries in the dictionary for those values. Series.map does not leave unmatched values alone: it converts them to NaN.

This is documented - see Series.map docs.
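A minimal sketch that reproduces the behaviour described above. The four-element Series is made-up stand-in data, not the contents of day.csv, and the names first and second are only for illustration:

```python
import pandas as pd

# Toy stand-in for the yr column (assumed data, not the real file).
yr = pd.Series([0, 1, 1, 0], name="yr")
mapping = {0: "2018", 1: "2019"}

first = yr.map(mapping)      # keys 0 and 1 are found in the dict
second = first.map(mapping)  # "2018"/"2019" are not keys, so each value becomes NaN

print(first.tolist())
print(second.isna().all())
```

Re-running the notebook cell is equivalent to the second call here: the column already holds the mapped strings, so nothing matches the dictionary keys any more.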

Instead, you should use a method that leaves values not in the mapping unchanged. That one is Series.replace - see also this question:

bike_sharing['yr'] = bike_sharing[['yr']].apply(lambda x: x.replace({0: '2018', 1: '2019'}))

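You could also do the replacement in place on the DataFrame itself, using a nested dictionary to restrict it to the yr column. (Calling replace with inplace=True on bike_sharing[['yr']] would only modify a temporary copy, not bike_sharing.)

bike_sharing.replace({'yr': {0: '2018', 1: '2019'}}, inplace=True)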
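As a rough check that replace is safe to re-run, the sketch below applies it twice to a toy DataFrame. The data and the names df, once and twice are assumptions for illustration only:

```python
import pandas as pd

# Toy stand-in for bike_sharing (assumed data, not the real file).
df = pd.DataFrame({"yr": [0, 1, 1, 0]})
mapping = {0: "2018", 1: "2019"}

# First run: 0 and 1 are replaced with the year strings.
df["yr"] = df["yr"].replace(mapping)
once = df["yr"].tolist()

# Second run: no 0 or 1 left, so replace leaves every value as it is.
df["yr"] = df["yr"].replace(mapping)
twice = df["yr"].tolist()

print(once == twice)
```

Assigning the result back to the column avoids relying on inplace=True at all, which sidesteps the question of whether the object being modified is a copy.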
Kuba hasn't forgotten Monica
0

In this case, you could simply add 2018 to the year value and no map is required.
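A minimal sketch of that idea, using made-up data and a hypothetical new column named year. Note that the result is an integer column rather than strings, and that adding in place (yr += 2018) would add another 2018 on every re-run, so this sketch writes to a separate column instead:

```python
import pandas as pd

# Toy stand-in for bike_sharing (assumed data, not the real file).
df = pd.DataFrame({"yr": [0, 1, 1, 0]})

# Derive the calendar year from the 0/1 flag; yr itself is left untouched,
# so running this line again gives the same result.
df["year"] = df["yr"] + 2018

print(df["year"].tolist())
```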

Carlos Melus