1

I have a dataset with 500 observations. Two of the variables are "test" which is a continuous variable of a test score and "classroom" which is a numerical variable indicating which classroom (1-20) the observation is nested in. I want to create a new variable "TX" where the four observations with the lowest "test" score from each classroom is a 1 and every other observation is a 0. With a sample of 500 evenly distributed across 20 classrooms, the TX variable should contain 80 1's and 420 0's.

This helps me locate/subset the lowest 4 in each classroom, but I want to mutate my original data set to add a TX column on the end of it where all of these observations would have a 1 and everyone else gets a 0.

data_new1 <- data[order(data$test1, decreasing = FALSE), ]
data_new1 <- Reduce(rbind,                                
                    by(data_new1,
                       data_new1["classroom"],
                       head,
                       n = 4))
Ken White
  • 123,280
  • 14
  • 225
  • 444
A Hayward
  • 11
  • 1

2 Answers2

2

Use rank(): for the whole data set,

data$newvar <- as.numeric(rank(data$test1)<= 4)

To run it for each class you need some kind of split-apply-combine pipeline, e.g. base R

data |>
   split(data$class) |>
   lapply( \(d) transform(d, newvar = as.numeric(rank(test1) <= 4)) |>
   do.call(what = "rbind")

or tidyverse

data |>
   group_by(class) |>
   mutate(newvar = as.numeric(rank(test1) <= 4))
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thank you for this idea, however it does not consider the lowest four in each classroom only the overall lowest four. – A Hayward May 31 '23 at 02:39
1

You could also try a tidyverse solution:

data_new1 <- data %>% arrange(classroom, test1) %>% 
    group_by(classroom) %>% 
    mutate(TX = (row_number()<=4) %>% as.integer)
Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27