2

I would like to assign the max value of a group to all rows within that group. How do I do that?

I have a dataframe containing the names of the group and the max number of credits that belongs to it.

course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

which gives

    Course    Credits
1   ABC1000  6.5
2   ABC1003  6.5
3   ABC1004  6.5
4   ABC1007  5.0
5   ABC1010  6.5
6   ABC1021  6.5
7   ABC1023  6.5

The main dataframe looks like this:

Appraisal.Type   Resits   Credits Course_code   Student_ID          
Final result       0       6.5    ABC1000           10                
Final result       0       6.5    ABC1003           10               
Grade supervisor   0       0      ABC1000           10               
Grade supervisor   0       0      ABC1003           10 
Final result       0       12     ABC1294           23   
Grade supervisor   0       0      ABC1294           23     

As you see, student 10 took course ABC1000, worth 6.5 credits. For each course (per student), however, two rows exist: Final result and Grade supervisor. In the end, Final result should be deleted, but the credits should be kept. Therefore, I want to assign the max value of 6.5 to the Grade supervisor row. Likewise, student 23 has followed course ABC1294, worth 12 credits.

In the end, this should be the result:

Appraisal.Type   Resits   Credits Course_code   Student_ID                      
Grade supervisor   0       6.5      ABC1000           10               
Grade supervisor   0       6.5      ABC1003           10    
Grade supervisor   0       12       ABC1294           23               

How do I go about this?

jtsbattle
  • 185
  • 1
  • 8

3 Answers3

2

An option would be to group by 'Student_ID', mutate the 'Credits' with max of 'Credits' and filter the rows with 'Appraisal.Type' as "Grade supervisor"

library(dplyr)
df1 %>%
   group_by(Student_ID) %>%
   dplyr::mutate(Credits = max(Credits)) %>%
   ungroup %>%
   filter(Appraisal.Type == "Grade supervisor")
# A tibble: 2 x 5
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10

If we also need 'Course_code' to be included in the grouping

df2 %>%
  group_by(Student_ID, Course_code) %>% 
  dplyr::mutate(Credits = max(Credits)) %>%  
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups:   Student_ID, Course_code [3]
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10
#3 Grade supervisor      0    12   ABC1294             23

NOTE: I case, plyr package is also loaded, there can be some masking of functions esp summarise/mutate which is also found in plyr. To prevent it, either do this on a fresh session without loading plyr or explicitly specify dplyr::mutate

data

df1 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L, 
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000", 
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L, 
10L)), class = "data.frame", row.names = c(NA, -4L)) 



df2 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5, 
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000", 
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L, 
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Could you explain the need for `ungroup`? Is it to remove errors in future evaluation? As in its a good habit etc... – Hector Haffenden May 26 '19 at 16:03
  • 1
    @HectorHaffenden 1) It is a good habit, 2) here, I used it due to avoid the `filter`ing with the grouping info as well, 3) some operations can have attribute issues when group_by is present – akrun May 26 '19 at 16:14
  • @akrun it does the job if I subset a certain student (e.g. student with ID 10). Is does, however, not work when I apply it to my complete dataframe. This dataframe includes many students and their corresponding academic work. These students take courses that have different amounts of credits. Some courses are 12 credits, some are 5, some are 6.5, some are 4 etc. If I apply your solution, all courses get 12 credits (because this is the max of all the credits). How would I go about this when I want the max of credits per course? – jtsbattle May 26 '19 at 18:11
  • @jtsbattle Here, I am doing a group by 'Student_ID' , max, if you also need to incldue based on the 'Course_code, then have to inlcude that column aslso in the `group_by`. But, based on this example, the 'Course_code' were different and you seem to want the max across all course_id – akrun May 26 '19 at 18:13
  • @akrun maybe my explanation wasn't that clear. I edited the original post to make the idea more clear. Adding Course_code to the `group_by()` argument doesn't work unfortunately. I still get the whole Credits column filled with 12 credits. – jtsbattle May 26 '19 at 18:22
  • @akrun unfortunately, your solution (I also tested it with your df2) still results in 12 credits across the whole column.. – jtsbattle May 26 '19 at 18:28
  • @jtsbattle That is the output you showed for expected output – akrun May 26 '19 at 18:28
  • 1
    @jtsbattle May be you need to provide a better example. As far as by looking at your expected and mine, it seems similar to me – akrun May 26 '19 at 18:29
  • 1
    @jtsbattle Can you try with `%>% dplyr::mutate(Credits = max(Credits))` in case you have loaded `plyr` and the `plyr::mutate` is masking the dplyr – akrun May 26 '19 at 18:32
  • @akrun I think that did the job! Thanks a lot! – jtsbattle May 26 '19 at 18:37
1

Here's a data.table solution,

DT[,Credits := max(Credits),by=Student_ID]
Result <- DT[Appraisal.Type == "Grade supervisor"]
abcxyz
  • 81
  • 3
0

Generate a sample dataset.

data <- as.data.frame(list(Appraisal.Type = c(rep("Final result", 2), rep("Grade supervisor", 2)), 
                      Resits = rep(0, 4), 
                      Credits = c(rep(6.5, 2), rep(0, 2)), 
                      Course_code = rep(c("ABC1000", "ABC1003"), 2), 
                      Student_ID = rep(10, 4)))

Assign the max value of a group to all rows in this group and then delete rows that contain "Final results".

##Reassign the values of "Credits" column
for (i in 1: nlevels(as.factor(data$Course_code))) {
  Course_code <- unique(data$Course_code)[i]
  data$Credits [data$Course_code == Course_code] <- max (data$Credits [data$Course_code == Course_code]) 
}
##New dataset without "Final result" rows
data <- data[data$Appraisal.Type != "Final result",]

Here is the result.

data
    Appraisal.Type Resits Credits Course_code Student_ID
3 Grade supervisor      0     6.5     ABC1000         10
4 Grade supervisor      0     6.5     ABC1003         10
HCQ
  • 96
  • 5