I have a data frame with observations (rows) and variables (columns). I want to create a subset of the observations that meet certain criteria. The relevant variables all start with "incl_", are numbered from 1 to 5, and equal 1 if the criterion is fulfilled (e.g., "incl_1" = 1).

Is there a more efficient way of doing the following:

redcap <- df[df$incl_1 == 1 & df$incl_2 == 1 & df$incl_3 == 1 & df$incl_4 == 1 & df$incl_5 == 1, ]

I tried this which obviously does not work, but might give you an idea of what I am trying to do:

df2 <- filter(df, select(df, startswith("incl")) ==1)

    Hi @theflowingflo. Please edit your question to include a sample of your data, ideally the output from running `dput(df)`. To make the selection criteria clear, you could include a few example rows of what the filtering should keep from `df`. – Seth Aug 15 '23 at 14:24

1 Answer

library(tidyverse)

# Identify the columns of interest
(nms <- names(iris)[startsWith(names(iris), "Sepal")])

# Replace each column of interest with TRUE/FALSE according to a rule that
# all of them should satisfy (e.g. == 1, or between 3 and 5), then keep the
# ids of the rows where every test passes
(rows_to_keep <- iris |>
  mutate(
    across(all_of(nms), \(x) between(x, 3, 5)),
    rowid = row_number()
  ) |>
  rowwise() |>
  mutate(ptest = all(!!!syms(nms))) |>
  ungroup() |>
  filter(ptest) |>
  select(rowid))

# Do the filter
iris |>
  mutate(rowid = row_number()) |>
  inner_join(rows_to_keep, by = "rowid") |>
  select(-rowid)
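
For the question's actual case (all "incl_" columns equal to 1), a shorter route may be possible. The following is a minimal sketch, assuming dplyr >= 1.0.4 (for `if_all()`) and R >= 4.1 (for `|>` and `\(x)`). The toy `df` and its `id` column are hypothetical stand-ins for the asker's data, which was not posted.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `df`
df <- tibble(
  id     = 1:4,
  incl_1 = c(1, 1, 0, 1),
  incl_2 = c(1, 1, 1, 1),
  incl_3 = c(1, 0, 1, 1),
  incl_4 = c(1, 1, 1, 1),
  incl_5 = c(1, 1, 1, NA)
)

# Keep only the rows where every incl_* column equals 1.
# Rows where the test evaluates to NA are dropped by filter().
redcap <- df |>
  filter(if_all(starts_with("incl_"), \(x) x == 1))

# Base R equivalent: count how many incl_* columns equal 1 in each row
# and keep the rows where that count matches the number of incl_* columns.
incl_cols <- startsWith(names(df), "incl_")
redcap_base <- df[rowSums(df[incl_cols] == 1, na.rm = TRUE) == sum(incl_cols), ]
```

Both versions treat a missing value as "criterion not met", so a row with an `NA` in any "incl_" column is excluded. If that is not the intended handling, the predicate would need to be adjusted.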
Nir Graham