
I have a DataFrame with 40 million rows, and I want to change some columns like this:

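# for device 12, replace the -1 placeholder in 'age' with max_age (apply runs a Python lambda per row)
age = data[data['device_name'] == 12]['age'].apply(lambda x : x if x != -1 else max_age)
data.loc[data['device_name'] == 12,'age'] = age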

But this method is too slow. How can I speed it up? Thanks for all replies!

Thomas Kimber

1 Answer


You might want to change the first part to:

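age = data.loc[data['device_name'] == 12, 'age'].copy()  # explicit copy avoids SettingWithCopyWarning
age[age == -1] = max_age  # vectorized boolean replacement instead of apply
data.loc[data['device_name'] == 12, 'age'] = age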
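To check that the boolean-mask version behaves like the original `apply`, here is a minimal sketch on a tiny hypothetical frame. The data values and `max_age = 99` are assumptions for illustration only; the question does not say what `max_age` is.

```python
import pandas as pd

# Hypothetical toy data standing in for the 40-million-row frame.
data = pd.DataFrame({
    "device_name": [12, 12, 7, 12, 7],
    "age": [-1, 30, -1, -1, 25],
})
max_age = 99  # assumed value, not from the question

mask = data["device_name"] == 12
# .copy() gives an independent Series, so editing it does not warn
# about modifying a slice of `data`.
age = data.loc[mask, "age"].copy()
age[age == -1] = max_age     # vectorized replacement, no per-row Python call
data.loc[mask, "age"] = age  # written back, aligned on the index

print(data["age"].tolist())  # [99, 30, -1, 99, 25]
```

Only rows with `device_name == 12` and `age == -1` change; the `-1` on the device-7 row is left alone, which matches what the `apply` version does.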

To be more concise (this could also gain you a little speed), you could use:

cond = data['device_name'] == 12
age = data.loc[cond, 'age']
data.loc[cond, 'age'] = age.where(age != -1, max_age)  # keep ages != -1, otherwise use max_age
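Going one step further, both conditions can be combined into a single boolean mask so that only the cells that actually need changing are assigned, with no intermediate Series written back. This is a sketch, not a benchmarked claim: the helper name `fill_missing_age` and the toy data are hypothetical, and whether it is measurably faster on 40 million rows has not been tested.

```python
import pandas as pd


def fill_missing_age(data: pd.DataFrame, device: int, max_age: int) -> pd.DataFrame:
    """Replace the -1 placeholder in 'age' with max_age for one device only.

    Hypothetical helper for illustration; modifies `data` in place and returns it.
    """
    # One combined mask selects exactly the cells to overwrite,
    # so ages that are already valid are never read back or re-assigned.
    to_fix = (data["device_name"] == device) & (data["age"] == -1)
    data.loc[to_fix, "age"] = max_age
    return data


df = pd.DataFrame({
    "device_name": [12, 12, 7, 12, 7],
    "age": [-1, 30, -1, -1, 25],
})
fill_missing_age(df, device=12, max_age=99)
print(df["age"].tolist())  # [99, 30, -1, 99, 25]
```

The result is the same as the `where` version; the difference is only that the assignment touches the `-1` cells alone instead of every device-12 row.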
Ayoub ZAROU