Dynamically subsetting rowns in R data.table

Question

I have a dataset, which varies in the number of columns depending on the input parameters. For example if the input parameter K=3 I will have members1, members2 and members3 column. I store the column names in a vector: memberCols <- c(paste0("members" , 1:K)).

        policy quotationYear   members1    members2             member1FLAG           member2FLAG 
1: G000809-000          2016      -0.83        0.08                       0                     0
2: G002417-000          2016      -0.62       -0.38    growth out of bounds                     0
3: G005213-000          2016      -0.66       -0.56    growth out of bounds  growth out of bounds
4: G001719-000          2017      19.00        0.00    growth out of bounds                     0
5: G002337-000          2017      -0.86       -0.21                       0                     0
6: G002337-000          2017       6.67        0.25    growth out of bounds                     0

I am interested to look at only some of the rows. Specifically I want to look at those columns that contain "out of bounds" in the memberCols. So far I know how to dynamically subset the columns.

growthCols<- c(paste0("growthMembers" , 1:K, "FLAG"))
    BToutliersGrowth <- BTplan[,.SD, .SDcols = c(growthCols, memberCols)]

How can I subset the data.table such that only the rows will be kept that contain "growth out of bounds" in any of the growthCols?

DATA:

data: structure(list(policy = c("G000809-000", "G002417-000", "G005213-000", 
"G001719-000", "G002337-000", "G002337-000"), quotationYear = c(2016, 
2016, 2016, 2017, 2017, 2017), members1 = c(-0.83, 
-0.62, -0.66, 19, -0.86, 6.67), members2 = c(0.08, -0.38, 
-0.56, 0, -0.21, 0.25), growthMembers1FLAG = c("growth out of bounds", 
"growth out of bounds", "growth out of bounds", "growth out of bounds", 
"growth out of bounds", "growth out of bounds"), growthMembers2FLAG = c("0", 
"0", "growth out of bounds", "0", "0", "0")), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L), .internal.selfref = <pointer: 0x7fcec80164e0>)

score 1 · Accepted Answer · answered Jun 26 '20 at 05:42

1

You can try :

library(data.table)
data[data[,rowSums(.SD == 'growth out of bounds') > 0,  .SDcols = growthCols]]

answered Jun 26 '20 at 05:42

Ronak Shah

377,200
20
156
213

i am affraid this is working correctly, i get 0 rows despite teh fact that there are out of bounds observations, is there an alternative? – Nneka Jun 26 '20 at 05:47
for the data you have shared I get all the rows selected since all the rows have "growth out of bounds". you get 0 rows for that data? – Ronak Shah Jun 26 '20 at 05:51
i was subsetting on the wrong columns, sorry! – Nneka Jun 26 '20 at 06:04

Dynamically subsetting rowns in R data.table

1 Answers1