How can I create a new variable that marks the lowest four test scores in each classroom as a 1?

Question

I have a dataset with 500 observations. Two of the variables are "test1", a continuous test score, and "classroom", a numeric variable indicating which classroom (1-20) the observation is nested in. I want to create a new variable "TX" that is 1 for the four observations with the lowest "test1" score in each classroom and 0 for every other observation. With a sample of 500 evenly distributed across 20 classrooms, the TX variable should contain 80 1's and 420 0's.

This helps me locate/subset the lowest 4 in each classroom, but I want to mutate my original data set to add a TX column on the end of it where all of these observations would have a 1 and everyone else gets a 0.

data_new1 <- data[order(data$test1, decreasing = FALSE), ]
data_new1 <- Reduce(rbind,                                
                    by(data_new1,
                       data_new1["classroom"],
                       head,
                       n = 4))

What if you have ties? For an extreme example, 5 students have the same lowest score. — Zhiqiang Wang, Jun 01 '23 at 10:06

Ben Bolker · Answer 1 · 2023-05-31T12:38:05.697

2

Use rank(): for the whole data set,

data$newvar <- as.numeric(rank(data$test1)<= 4)
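One caveat is ties: rank() defaults to ties.method = "average", so tied scores can push the number of flagged rows above or below four. The sketch below uses two hypothetical score vectors (not the asker's data) and a small helper, n_flagged, introduced only for illustration, to compare the default with ties.method = "first".

```r
# Hypothetical scores for one classroom; not taken from the asker's data.
five_way_tie <- c(50, 50, 50, 50, 50, 70, 80, 90)
split_tie    <- c(40, 40, 40, 60, 60, 60, 90)

# Illustrative helper: count how many scores would be flagged
# as "lowest four" under a given ties.method.
n_flagged <- function(x, method) sum(rank(x, ties.method = method) <= 4)

n_flagged(five_way_tie, "average")  # default: the five tied scores share rank 3
n_flagged(split_tie, "average")     # default: ranks are 2, 2, 2, 5, 5, 5, 7
n_flagged(five_way_tie, "first")    # ties broken by position in the vector
n_flagged(split_tie, "first")
```

With the default, the first vector flags five scores and the second flags three; with "first", both flag exactly four, at the cost of breaking ties by row order. Which behavior is appropriate depends on how tied students should be treated.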

To run it for each class you need some kind of split-apply-combine pipeline, e.g. base R

data |>
   split(data$classroom) |>
   # rank within each classroom, then flag the four lowest scores
   lapply(\(d) transform(d, newvar = as.numeric(rank(test1) <= 4))) |>
   do.call(what = "rbind")

or tidyverse

library(dplyr)

data |>
   group_by(classroom) |>
   mutate(newvar = as.numeric(rank(test1) <= 4))
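Another base R option, offered here as a sketch rather than as part of the original answer, is ave(), which applies a function within each group and returns the results in the original row order. The data frame below is simulated as a stand-in for the asker's data (20 classrooms of 25 students is an assumption based on the question), and ties.method = "first" is an assumption chosen so that exactly four rows per classroom are flagged.

```r
# Simulated stand-in for the asker's data: 20 classrooms of 25 students each.
set.seed(1)
data <- data.frame(
  classroom = rep(1:20, each = 25),
  test1     = rnorm(500, mean = 70, sd = 10)
)

# ave() ranks test1 within each classroom and keeps the original row order.
# ties.method = "first" (an assumption) guarantees four flags per classroom.
within_rank <- ave(data$test1, data$classroom,
                   FUN = function(x) rank(x, ties.method = "first"))
data$TX <- as.integer(within_rank <= 4)

table(data$TX)                        # counts of 0s and 1s
tapply(data$TX, data$classroom, sum)  # number of flagged rows per classroom
```

Because the column is added in place, the rows of data are neither reordered nor dropped, which matches the goal of appending TX to the original data set.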

edited May 31 '23 at 12:38

answered May 31 '23 at 00:08

Ben Bolker


Thank you for this idea; however, it considers only the overall lowest four, not the lowest four in each classroom. – A Hayward May 31 '23 at 02:39

Zhiqiang Wang · Answer 2 · 2023-05-31T02:45:33.940

1

You could also try a tidyverse solution:

library(dplyr)  # provides %>%, arrange(), group_by(), mutate(), row_number()
data_new1 <- data %>% 
    arrange(classroom, test1) %>% 
    group_by(classroom) %>% 
    mutate(TX = (row_number() <= 4) %>% as.integer())

edited May 31 '23 at 02:45

answered May 31 '23 at 00:20

Zhiqiang Wang


This works but gives values of TRUE and FALSE. Do you know how to change them to 0's and 1's? – A Hayward May 31 '23 at 02:39
Yes, revised my answer. Just added `%>% as.integer` part. – Zhiqiang Wang May 31 '23 at 02:46
