
I have the following data showing 5 possible kids to invite to a party and what neighborhoods they live in.

I also have a list of solutions (binary indicators of whether each kid is invited or not; e.g., the first solution invites Kelly, Gina, and Patty).

data <- data.frame(c("Kelly", "Andrew", "Josh", "Gina", "Patty"), c(1, 1, 0, 1, 0), c(0, 1, 1, 1, 0))
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

I'm looking for a way to now filter the solutions in the following ways:

a) Only keep solutions where there are at least 3 kids from both neighborhood A and neighborhood B (one kid can count as one for both if they're part of both)

b) Only keep solutions that have at least 3 kids selected (i.e., sum >= 3)

I think I need to somehow join data to the solutions in solutions, but I'm a bit lost on how to manipulate everything since the solutions are stuck in lists. Basically looking for a way to add entries to every solution in the list indicating a) how many kids the solution has, b) how many kids from neighborhood A, and c) how many kids from neighborhood B. From there I'd have to somehow filter the lists to only keep the solutions that satisfy >= 3?

Thank you in advance!

Kathy

1 Answer

I wrote a little function to check each solution and return TRUE or FALSE based on your requirements. Passing your solutions to this using sapply() will give you a logical vector, with which you can subset solutions to retain only those that met the requirements.

check_solution <- function(solution, data) {
  data <- data[as.logical(solution),]
  sum(data[["Neighborhood A"]]) >= 3 && sum(data[["Neighborhood B"]]) >= 3
}
### No separate test for `sum(solution) >= 3` is needed: each kid counts at
### most once per neighborhood, so it is *always* true when a neighborhood sum is >= 3.

tests <- sapply(solutions, check_solution, data = data)
# FALSE FALSE FALSE FALSE FALSE

solutions[tests]
# list()

### none of the `solutions` provided actually meet criteria
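The question also asked for per-solution counts (kids invited, kids from each neighborhood). Here is a minimal base-R sketch of one way to tabulate them, assuming the same two-neighborhood `data` and `solutions` as in the question; the names `sol_mat`, `hood_mat`, `counts`, and `keep` are made up for illustration.

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  check.names = FALSE  # keep the spaces in the column names
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

# One row per solution, one column per kid
sol_mat <- do.call(rbind, solutions)

# Kids x neighborhoods indicator matrix
hood_mat <- as.matrix(data[, c("Neighborhood A", "Neighborhood B")])

# Matrix product gives, for each solution, the invited kids per neighborhood
counts <- data.frame(
  n_kids = rowSums(sol_mat),
  sol_mat %*% hood_mat,
  check.names = FALSE
)
counts

# Filter on the counts: at least 3 kids overall and in each neighborhood
keep <- counts$n_kids >= 3 &
  counts[["Neighborhood A"]] >= 3 &
  counts[["Neighborhood B"]] >= 3
solutions[keep]
```

With the example data, no solution has more than 2 kids from either neighborhood, which is why the filtered list comes back empty.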

Edit: OP asked in the comments how to test against all neighborhoods in the data, and return TRUE if a specified number of neighborhoods have enough kids. Below is a solution using dplyr.

library(dplyr)

data <- data.frame(
  c("Kelly", "Andrew", "Josh", "Gina", "Patty"), 
  c(1, 1, 0, 1, 0), 
  c(0, 1, 1, 1, 0),
  c(1, 1, 1, 0, 1),
  c(0, 1, 1, 1, 1)
)
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B", "Neighborhood C", 
                 "Neighborhood D")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), 
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution <- function(solution, 
                           data, 
                           min_kids = 3, 
                           min_neighborhoods = NULL) {
  neighborhood_tests <- data %>% 
    filter(as.logical(solution)) %>% 
    summarize(across(starts_with("Neighborhood"), ~ sum(.x) >= min_kids)) %>% 
    as.logical()
  # require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

tests1 <- sapply(solutions, check_solution, data = data)
solutions[tests1]
# list()

tests2 <- sapply(
  solutions, 
  check_solution, 
  data = data, 
  min_kids = 2, 
  min_neighborhoods = 3
)
solutions[tests2]
# [[1]]
# [1] 1 0 0 1 1
# 
# [[2]]
# [1] 0 1 0 1 1
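If you would rather avoid the dplyr dependency, the same check can probably be written in base R. This is only a sketch under the same assumptions (neighborhood columns are 0/1 and their names start with "Neighborhood"); `check_solution_base` and `tests_base` are hypothetical names, not part of the function above.

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  `Neighborhood C` = c(1, 1, 1, 0, 1),
  `Neighborhood D` = c(0, 1, 1, 1, 1),
  check.names = FALSE  # keep the spaces in the column names
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution_base <- function(solution,
                                data,
                                min_kids = 3,
                                min_neighborhoods = NULL) {
  # Pick the neighborhood columns by name prefix
  hood_cols <- startsWith(names(data), "Neighborhood")
  # Keep only the invited kids; drop = FALSE keeps a data frame in all cases
  invited <- data[as.logical(solution), hood_cols, drop = FALSE]
  # TRUE for each neighborhood with enough invited kids
  neighborhood_tests <- colSums(invited) >= min_kids
  # Require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

# vapply() guarantees one logical value per solution
tests_base <- vapply(solutions, check_solution_base, logical(1),
                     data = data, min_kids = 2, min_neighborhoods = 3)
solutions[tests_base]
```

On the example data this should keep the same two solutions as the dplyr version (the first and the third).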
zephryl
  • @Kathy no prob! I did see your comments asking about testing all neighborhoods in the dataset, and only requiring a certain number of neighborhoods, and hacked something together for that in my updated answer. I found it easier to use `dplyr` for the updated function. I also added a `min_kids` arg just for the heck of it (which defaults to 3). – zephryl Feb 25 '22 at 21:51
  • Appreciate it! Haha didn't want to make you do the extra work but that's really helpful (even the min_kids arg) - thanks so much :) – Kathy Feb 26 '22 at 00:46