Group pairwise categories together

Question

I am working with a dataset in which each loan has a purpose and an associated loan grade.

The dataset is called loancase; one of its columns is purpose and another is grade.

Below is the matrix, which is to be filled in a pairwise manner with proportions. Each row should total 100 percent, meaning each entry is the percentage of loans with that purpose that received that grade. For instance, the row for [Car, ] may look like 20, 20, 0, 0, 20, 0, 40.

Note that the current data placeholder is NA, which I am trying to replace with a vector listing each desired entry.

matrix(data = NA, nrow = 14, ncol = 7, dimnames = list(levels(loancase$purpose), levels(loancase$grade)))

How do I fill in each entry with the desired value? I am thinking of using tapply() but don't know how to make it work. Here is the current code that would go in place of "NA", but it is not correct as of now.

grades.per.purpose = tapply(loancase$grade, levels(loancase$purpose), sum)

Please provide data as plain text, not images, so users can copy/paste. — neilfws, Oct 19 '17 at 22:32

score 1 · Answer 1 · answered Oct 19 '17 at 22:44

Since you didn't supply usable data, I'll make up a toy example:

df = read.table(text = "grade   purpose   amount
            A  Car   100
            B  Car   200
            C  Car   100
            A  Moving  200
            B  Moving  50
            B  Moving  50", header = TRUE)

We want to show that Car loans are 50% B-grade and 25% each A- and C-grade, and that Moving loans are 67% A-grade and 33% B-grade.

I like to use the dplyr library for this kind of grouping and summarising:

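library(dplyr)
x = df %>%
    # total amount lent for each purpose, repeated on every row of that purpose
    group_by(purpose) %>%
    mutate(purpose.total = sum(amount)) %>%
    # share of each purpose's total that falls in each grade
    group_by(purpose, grade) %>%
    summarise(percent = sum(amount / purpose.total))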

The result:

  purpose  grade   percent
1     Car      A 0.2500000
2     Car      B 0.5000000
3     Car      C 0.2500000
4  Moving      A 0.6666667
5  Moving      B 0.3333333

To spread it into a matrix like the one you asked for, with one row per purpose and one column per grade, try the tidyr library:

tidyr::spread(x, key = grade, value = percent, fill = 0)

Result:

  purpose         A         B     C
1     Car 0.2500000 0.5000000  0.25
2  Moving 0.6666667 0.3333333  0.00

answered Oct 19 '17 at 22:44

lebelinoz


This is exactly the type of result I'm trying to achieve, thank you. Unfortunately, for these purposes I cannot use any external libraries. Is there a way to do this using native R functions? – user1713336 Oct 19 '17 at 22:46
@user1713336 R is much easier to use if you're allowed external libraries. Are you sure you can't get packages? – lebelinoz Oct 19 '17 at 22:48
I would love to use libraries as well but I am restricted from using them for this project. – user1713336 Oct 19 '17 at 22:49
@user1713336 All I can think of is `aggregate` for summarising the data... I'm not sure of a clean way to spread the data out. Please edit your question so that (1) it includes actual, usable data which somebody could copy-paste, and (2) clarify that you're not allowed to use packages. – lebelinoz Oct 19 '17 at 22:55
To get the percentages (proportions, actually), you can do the following with base R only: `with(df, ave(amount, purpose, FUN = function(x) x/sum(x)))`. – Rui Barradas Oct 19 '17 at 23:30
@RuiBarradas That doesn't quite summarise by grade. It says `Moving` loans are 67% A, 17% B and 17% B again (instead of 67% A, 33% B) – lebelinoz Oct 19 '17 at 23:41
@lebelinoz OK, I will post an answer. – Rui Barradas Oct 19 '17 at 23:53
@lebelinoz - `reshape` with `direction="wide"` is the equivalent of `spread` – thelatemail Oct 19 '17 at 23:59
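
The `aggregate` and `reshape` suggestions from the comments above can be combined into a base R sketch of the same two steps (summarise, then spread). This is only an illustration on the toy data from the answer, not code posted by any of the commenters; the names `agg` and `wide` are made up here.

```r
# Toy data from the answer above
df <- read.table(text = "grade purpose amount
A Car 100
B Car 200
C Car 100
A Moving 200
B Moving 50
B Moving 50", header = TRUE)

# Step 1: total amount for each (purpose, grade) pair
agg <- aggregate(amount ~ purpose + grade, data = df, FUN = sum)

# Step 2: divide each pair total by the total for its purpose
agg$percent <- agg$amount / ave(agg$amount, agg$purpose, FUN = sum)

# Step 3: one row per purpose, one column per grade
# (columns are named percent.A, percent.B, percent.C)
wide <- reshape(agg[, c("purpose", "grade", "percent")],
                idvar = "purpose", timevar = "grade", direction = "wide")

# Pairs that never occur (here Moving / C) come out as NA; show them as 0
wide[is.na(wide)] <- 0
wide
```

Unlike `spread`, `reshape` prefixes the new column names with the value column's name, so the grade columns may need renaming afterwards.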

score 1 · Answer 2 · answered Oct 19 '17 at 23:56

Though I believe that it's nonsense to be forbidden to use packages, there is a base R solution, with the final result presented in a way that might please the OP.

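# total amount per grade (rows) and purpose (columns)
xt <- xtabs(amount ~ grade + purpose, df)
# transpose so purposes are rows, then divide each row by that purpose's total
t(xt)/colSums(xt)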
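
A variant of the same idea, sketched here as an assumption rather than as part of the original answer, lets `prop.table()` with `margin = 1` do the row-wise division instead of relying on vector recycling. The question asks for the share of loans per purpose, not the share of the amount lent, so the sketch also shows a count-based version built with `table()`. The names `by.amount` and `by.count` are hypothetical.

```r
# Toy data from the first answer
df <- read.table(text = "grade purpose amount
A Car 100
B Car 200
C Car 100
A Moving 200
B Moving 50
B Moving 50", header = TRUE)

# Amount-weighted shares: purposes in rows, grades in columns,
# each row divided by its own total
by.amount <- prop.table(xtabs(amount ~ purpose + grade, data = df), margin = 1)

# Count-based shares: every loan counts once, whatever its amount
by.count <- prop.table(table(df$purpose, df$grade), margin = 1)

# Express as percentages so that each row totals 100
round(100 * by.amount, 1)
round(100 * by.count, 1)
```

On the asker's data, which is not available here, the count-based form would presumably be `100 * prop.table(table(loancase$purpose, loancase$grade), 1)`. If both columns are factors, the result has one row per purpose level and one column per grade level, matching the dimnames of the placeholder matrix in the question.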

answered Oct 19 '17 at 23:56

Rui Barradas
