1

I want to recode or bin multiple numeric columns as categorical values and summing the counts. The columns are ages by individual year from 18yrs to 90+ yrs with the cells containing summed totals by age. I want to create categories to reflect age cohorts and maintain the now cumulative count.
The aim is to categorise the ages in to the following cohorts:

'young_adults_18_29', 'younger_working_age_30_49', 'older_working_age_50_64', 'retirement_age_65_79', 'older_adults_80+'

enter image description here

db2020
  • 69
  • 8

1 Answers1

0
# df <- original dataframe
max_age = df.columns[-1]    # I assumed that the last column value is also the number(age).

df['young_adults_18_29'] = sum(df[i] for i in range(18, 30))
df['younger_working_age_30_49'] = sum(df[i] for i in range(30, 50))
df['older_working_age_50_64'] = sum(df[i] for i in range(50, 65))
df['retirement_age_65_79'] = sum(df[i] for i in range(65, 80))
df['older_adults_80+'] = sum(df[i] for i in range(80, max_age + 1))

df_new = df[['la_code', 'young_adults_18_29', 'younger_working_age_30_49', 'older_working_age_50_64', 'retirement_age_65_79', 'older_adults_80+']]
Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
jaemmin
  • 341
  • 1
  • 12
  • Thank you for your code. I attempted to implement the code, however, I get key errors for the first line on the column headed 18. I may need to reindex the headers. – db2020 Aug 09 '22 at 09:00
  • I have also attempted using bins and perhaps the headers are what was stopping it working. – db2020 Aug 09 '22 at 09:01
  • I guess the type of 18 is string. If so, edit the code as sum(df[str(i)] for i in range(18, 30)) – jaemmin Aug 09 '22 at 23:40