
Suppose I have a table of ages:

ages <- array(round(runif(min=10,max=200,n=100)),dim=100,dimnames=list(age=0:99))

Suppose now I want to collapse my ages table into 5-year-wide age groups.

This could be done quite easily by summing over the appropriate ranges of values:

ages.5y <- array(NA,dim=20,dimnames=list(age=paste(seq(from=0,to=95,by=5),seq(from=4,to=99,by=5),sep="-")))
ages.5y[1]<-sum(ages[1:5])
ages.5y[2]<-sum(ages[6:10])
...
ages.5y[20]<-sum(ages[96:100])

It could also be done using a loop:

for(i in 1:20) ages.5y[i]<-sum(ages[(5*i-4):(5*i)])

But while this method is easy for "regular" transformations, the loop approach breaks down if the new intervals are irregular, e.g. 0-4, 5-12, 13-24, 25-50, 60-99, because the index arithmetic relies on a constant group width.
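
One can still loop with explicit boundary vectors, but it gets clumsy; a sketch, with hypothetical boundaries for the groups 0-4, 5-12, 13-24, 25-49, 50-99:

upper <- c(5,13,25,50,100)       # index of each group's last age (age + 1)
lower <- c(1,head(upper,-1)+1)   # index of each group's first age
ages.irr <- mapply(function(lo,hi) sum(ages[lo:hi]), lower, upper)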

If, instead of a table, I had individual values, this could be done quite easily using cut:

flattened <- rep(as.numeric(dimnames(ages)$age),ages)
table(cut(flattened,breaks=seq(from=0,to=100,by=5)))

This allows the use of arbitrary break points, e.g. breaks=c(5,10,22,33,41,63,88).
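
For instance, reusing the flattened vector from above with those break points:

table(cut(flattened,breaks=c(5,10,22,33,41,63,88)))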

However, this is quite a resource-intensive way to do it, since it expands the table into one element per individual before re-tabulating.

So, my question is: Is there a better way to recode a contingency table?

mzuba

1 Answer


You could use cut on the age values, but not the counts. Like this:

ages = 0:99
ageCounts = array(round(runif(min=10,max=200,n=100)),dim=100)
groups = cut(ages,breaks=seq(from=-1,to=100,by=5))
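
Starting the breaks at -1 makes cut's half-open intervals line up so that age 0 lands in the first group:

head(levels(groups),3)   # "(-1,4]" "(4,9]" "(9,14]"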

Then group them. I use data.table for this:

library(data.table)
DT = data.table(ages=ages, ageCounts=ageCounts, groups)
DT[,list(sum=sum(ageCounts)), by=groups]
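
A base-R equivalent without data.table (a sketch using the same ages, ageCounts, and groups) is tapply:

tapply(ageCounts, groups, sum)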
nsheff