0

I have a table in R that looks like (below is just a sample):

|       | 15 | 17 | 18 | 22 | 25 | 26 | 27 | 29 | 
|-------|----|----|----|----|----|----|----|----|
| 10000 | 1  | 2  | 1  | 2  | 4  | 3  | 5  | 2  |
| 20000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 30000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 40000 | 0  | 0  | 0  | 1  | 2  | 3  | 6  | 3  |
| 50000 | 0  | 0  | 0  | 0  | 0  | 0  | 1  | 1  |
| 60000 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |

The rows are income levels, and the columns are age levels. I am essentially creating this table to see if age is related to income via a Chi-squared test. The numbers in the table are numbers of occurrences e.g. There are 2 people aged 17 in my dataset with income of 10000.

Both age and income level of type "num" in R so are continuous.

I want to essentially combine the columns for age so that I get a table with everyone who has income of 10k and is between age 15-25, age 25-35, etc. so I end up with much fewer columns.

Note also that colnames(tbl) = "15","17", "18", not "Age" - I haven't defined an overarching name for my columns and rows.

I note this answer does something similar but not sure how to apply it given I don't have a name for my columns e.g. "mpg" (in the case of the link).

Any ideas?

Community
  • 1
  • 1
Dhruv Ghulati
  • 2,976
  • 3
  • 35
  • 51
  • I think you may convert the wide format to long format (possibly by `as.data.frame` if it is a table object or `melt` if it is matrix) and use `cut` to create a grouping column based on the column in the long format, get the count based on that, and reshape it back to wide format with `dcast` from `reshape2`. BTW, do you have `table` object or `matrix` – akrun May 28 '15 at 10:44
  • I have a `table` object as output of `str(tbl)` is: "'table' int [1:7, 1:42]" etc.. – Dhruv Ghulati May 29 '15 at 12:19
  • Anyway, you have a solution posted below, which I think should work. If not convert it to `mat <- as.matrix(tbl)` and then try it. – akrun May 29 '15 at 12:21

1 Answers1

1

Made my own matrix here, but should work for df's aswell.

mat <- matrix(sample(1:10,8500,replace = TRUE),ncol=85)
colnames(mat) <- 15:99
levs <- cut(as.numeric(colnames(mat)),seq(15,105,10),right = FALSE)
res <- sapply(as.character(unique(levs)),function(x)rowSums(mat[,levs==x]))

Edit: If you want the same colnames as in mat, but counts according to the category, in addition do:

res <- res[,levs] # expands the res df to one category count col pr. original col in mat.
colnames(res) <- colnames(mat) # renames cols to reflect input matrix mat.
Nightwriter
  • 514
  • 5
  • 11
  • Thanks @Nightwriter. However, doesn't this assume that my column names are ordered from 15:(85-1)? In my case, the columns step randomly according to whatever was in the original table e.g. 15, 17, 18, 19, 22, 26 etc. - I need to retain the exact same column names (corresponding to ages) as in the original table. – Dhruv Ghulati May 29 '15 at 12:29
  • Second line just associates numbers with cols. It could have been any integers in random order suited your problem. You need to make sure that third line reflect your desired intervals. Look at `?cut()`. Here intervals start at 15, ends at 105 and are 10 long. Argument `right=FALSE` ensures that 25 belongs to interval `[25,35)` rather than `[15,25)`. Fourth line sums up cols for each interval pr. line, regardles of their position in the matrix. – Nightwriter May 30 '15 at 21:33
  • Made edit to output retained col names but with category sum values instead of input values. – Nightwriter May 30 '15 at 21:45