-3

I am working with a dataset in which each loan has a purpose and an associated loan grade.

The dataset is called loancase; one of its columns is purpose and another is grade.

Below is the matrix I want to fill with proportions. Each entry should be the percentage of loans with that purpose that received that grade, so each row totals 100 percent. For instance, the row for [Car, ] might be 20, 20, 0, 0, 20, 0, 40.

Note that the current data placeholder is NA and I am trying to replace that with a vector listing each desired entry.

matrix(data = NA, nrow = 14, ncol = 7, dimnames = list(levels(loancase$purpose), levels(loancase$grade)))

How do I fill in each entry with the desired value? I think tapply() is the right tool but don't know how to apply it here. Below is my current code, which is meant to replace the NA placeholder but is not yet correct.

grades.per.purpose = tapply(loancase$grade, levels(loancase$purpose), sum)
user1713336

2 Answers

1

Since you didn't supply usable data, I'll make up a toy example:

df = read.table(text = "grade   purpose   amount
            A  Car   100
            B  Car   200
            C  Car   100
            A  Moving  200
            B  Moving  50
            B  Moving  50", header = TRUE)

We want to show that Car loans are 50% B-grade and 25% each A- and C-grade, and that Moving loans are 67% A-grade and 33% B-grade (by amount).

I like to use the dplyr library for this kind of grouping and summarising:

library(dplyr)
x = df %>% 
    group_by(purpose) %>% 
    mutate(purpose.total = sum(amount)) %>% 
    group_by(purpose, grade) %>% 
    summarise(percent = sum(amount / purpose.total))

The result:

  purpose  grade   percent
1     Car      A 0.2500000
2     Car      B 0.5000000
3     Car      C 0.2500000
4  Moving      A 0.6666667
5  Moving      B 0.3333333   

To spread it into a purpose-by-grade table like you asked for, try the tidyr library:

tidyr::spread(x, key = grade, value = percent, fill = 0)

Result:

  purpose         A         B     C
1     Car 0.2500000 0.5000000  0.25
2  Moving 0.6666667 0.3333333  0.00    
lebelinoz
  • This is the exact type of result I'm trying to achieve thank you. Unfortunately for these purposes I cannot use any external libraries. Is there a way to do this using native R functions? – user1713336 Oct 19 '17 at 22:46
  • @user1713336 R is much easier to use if you're allowed external libraries. Are you sure you can't get packages? – lebelinoz Oct 19 '17 at 22:48
  • I would love to use libraries as well but I am restricted from using them for this project. – user1713336 Oct 19 '17 at 22:49
  • @user1713336 All I can think of is `aggregate` for summarising the data... I'm not sure of a clean way to spread the data out. Please edit your question so that (1) it includes actual, usable data which somebody could copy-paste, and (2) clarify that you're not allowed to use packages. – lebelinoz Oct 19 '17 at 22:55
  • To have the percentages (proportions, actually), you can do with `base R` only the following: `with(df, ave(amount, purpose, FUN = function(x) x/sum(x)))`. – Rui Barradas Oct 19 '17 at 23:30
  • @RuiBarradas That doesn't quite summarise by grade. It says `Moving` loans are 67% A, 17% B and 17% B again (instead of 67% A, 33% B) – lebelinoz Oct 19 '17 at 23:41
  • @lebelinoz OK, I will post an answer. – Rui Barradas Oct 19 '17 at 23:53
  • 1
    @lebelinoz - `reshape` with `direction="wide"` is the equivalent of `spread` – thelatemail Oct 19 '17 at 23:59
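
The base R route mentioned in the comments above (`aggregate` to summarise, then `reshape` with `direction = "wide"` to spread) might look roughly like this. It is only a sketch on the toy data from the answer; the names `agg` and `wide` are made up for illustration, and the NA-to-0 step is an assumption about how missing purpose/grade combinations should be shown.

```r
# Toy data from the answer above
df <- read.table(text = "grade purpose amount
A Car 100
B Car 200
C Car 100
A Moving 200
B Moving 50
B Moving 50", header = TRUE)

# Sum amounts per (purpose, grade) pair
agg <- aggregate(amount ~ purpose + grade, data = df, FUN = sum)

# Divide each summed amount by the total for its purpose
agg$percent <- agg$amount / ave(agg$amount, agg$purpose, FUN = sum)

# Spread grades into columns (base R counterpart of tidyr::spread)
wide <- reshape(agg[, c("purpose", "grade", "percent")],
                idvar = "purpose", timevar = "grade", direction = "wide")

# Combinations that never occur come out as NA; show them as 0
wide[is.na(wide)] <- 0
print(wide)
```

The resulting columns are named `percent.A`, `percent.B` and so on, because `reshape` prefixes the value column name to each grade.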
1

Though I believe that it's nonsense to be forbidden to use packages, there is a base R solution, with the final result presented in a way that might please the OP.

xt <- xtabs(amount ~ grade + purpose, df)
t(xt)/colSums(xt)
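
A variant of the same idea, offered as a sketch: `prop.table` with `margin = 1` computes row proportions directly. If every loan should count once, as the question's loancase data suggests, `table()` can stand in for `xtabs()` on amounts. The names `by_amount` and `by_count` and the `* 100` scaling are illustrative assumptions, shown here on the toy data.

```r
# Toy data from the first answer
df <- read.table(text = "grade purpose amount
A Car 100
B Car 200
C Car 100
A Moving 200
B Moving 50
B Moving 50", header = TRUE)

# Proportions by amount: purposes in rows, grades in columns,
# each row divided by its own total
by_amount <- prop.table(xtabs(amount ~ purpose + grade, df), margin = 1)

# Percentages by number of loans: each data row counts as one loan
by_count <- prop.table(table(df$purpose, df$grade), margin = 1) * 100

print(by_amount)
print(round(by_count, 1))
```

For the question's data, the second form would presumably become `prop.table(table(loancase$purpose, loancase$grade), margin = 1) * 100`, assuming one row per loan.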
Rui Barradas