I have an R data frame, df, in which one column, assignment, looks like this:

course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A

I need to create a superset (for lack of a better term) of all of the assignments in a course across instances.

For example: Course 1 was offered twice. Instance 1 included assignments A and B, and instance 2 included assignments B and C. The superset of assignments for this course should include A, B, and C exactly once each. In other words, every assignment that appears at least once across the instances of a course should appear exactly once in the superset.

UPDATE: I've tried the suggestion below.

library(tidyverse); df %>% group_by(course) %>% 
summarise(all_assignments = toString(sort(unique(assignment))), 
.groups = "drop")

This returns the following:

all_assignments .groups
A drop

I've now tested this on the following sample data set:

df <- read.table(text = "course instance    assignment
1   1   A
1   1   B
1   2   B
1   2   C
2   1   A
2   1   C
2   2   B
2   2   A", header = TRUE)

Which returns a similar structure:

all_assignments .groups
A, B, C drop

Apparently this exact code has worked for others, so I'm wondering what I'm doing incorrectly?

bvecc
  • Can you please include your expected output? That will help us understand what you're after. For example, `library(tidyverse); df %>% group_by(course) %>% summarise(all_assignments = toString(sort(unique(assignment))), .groups = "drop")` returns a comma-separated string of assignments across all `instance`s for every `course`. Is that what you're after? – Maurits Evers Oct 14 '22 at 04:45
  • Yes, that's the basic idea. I tried your suggestion and updated the question with the output, which is a single row. I'm sure it's a small error I'm missing. – bvecc Oct 14 '22 at 05:08
  • Please see below for a fully reproducible example. Make sure that your actual data matches your sample data, e.g. column names are exactly the same (R is case-sensitive). – Maurits Evers Oct 14 '22 at 05:29

1 Answer

I'm not entirely clear on your expected output (see my comment above); please have a look at the following

library(dplyr)
df %>% 
    group_by(course) %>% 
    summarise(
        all_assignments = toString(sort(unique(assignment))), 
        .groups = "drop")
## A tibble: 2 × 2
#  course all_assignments
#   <int> <chr>          
#1      1 A, B, C        
#2      2 A, B, C       

This was tested and verified on R 4.2.0 with dplyr 1.0.9.


Sample data

df <- read.table(text = "course instance    assignment
1   1   A
1   1   B
1   2   B
1   2   C
2   1   A
2   1   C
2   2   B
2   2   A", header = TRUE)
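
If one row per course/assignment pair is more useful than a comma-separated string, the following is a minimal sketch of that alternative. It assumes the same sample data as above; the name `course_assignments` is only illustrative and does not appear in the question.

```r
library(dplyr)

# Same sample data as above, repeated so the block runs on its own
df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)

# Keep each (course, assignment) combination once, ignoring instance
# `course_assignments` is a hypothetical name chosen for this sketch
course_assignments <- df %>%
    distinct(course, assignment) %>%
    arrange(course, assignment)

course_assignments
```

This keeps the data in long format, which may be easier to join or count than a single string per course.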
Maurits Evers
  • Strange, I'm getting a similar output for the sample data, where the assignments are grouped, but not linked to a course (i.e. the column header is "all_assignments" and the first cell says "A, B, C", and the second column is ".groups" with a row cell that says "drop"). I'll play around with it a bit more. What does your output look like for the sample? – bvecc Oct 14 '22 at 14:17
  • @briahnah Sorry but I don't know what you mean by *"where the assignments are grouped, but not linked to a course"*. The above works reproducibly for the sample data included (as per your main post). There should be no `.groups` column. First step is to verify on your end by copy & pasting data & code from my answer. If your actual data are different, please provide representative sample data otherwise it's difficult to help & debug. – Maurits Evers Oct 14 '22 at 22:22
  • Omg, it ended up being a package conflict. I noticed that your original suggestion loads tidyverse and your second suggestion loads dplyr, which was the source of my confusion. I started a fresh workspace and it seems to work fine now. Thanks for the support! – bvecc Oct 17 '22 at 20:39
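
The thread does not say which package caused the conflict. A single row with a literal `.groups` column is what you would expect if another package's `summarise()` (for example plyr's, which is an assumption here) were masking dplyr's. As a hedged sketch, the calls can be namespace-qualified so the result does not depend on package load order:

```r
library(dplyr)

df <- read.table(text = "course instance assignment
1 1 A
1 1 B
1 2 B
1 2 C
2 1 A
2 1 C
2 2 B
2 2 A", header = TRUE)

# Diagnostic: which package's summarise() is found first on the search path?
environmentName(environment(summarise))

# Qualifying the calls with dplyr:: sidesteps masking by other packages
result <- df %>%
    dplyr::group_by(course) %>%
    dplyr::summarise(
        all_assignments = toString(sort(unique(assignment))),
        .groups = "drop")

result
```

Restarting R with a clean workspace, as the asker did, addresses the same problem. Qualifying the calls is simply a way to make a script robust when several packages export functions with the same name.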