Return an entire row if the value in any specific set of columns meets a certain criteria

Question

I have a dataframe, and I want to retain the rows (fruits) for which any of the price columns is greater than a certain value.

Here is a reproducible example that you can copy&paste directly into R:

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df = as.data.frame(cbind(fruit,ID,price1,price2,price3)) #combine into a dataframe

price_threshold = 10 #define a price

I want to get only the fruits for which any of the prices is greater than 10, which are Banana and Berry in this case

The output I'm expecting is the following two rows:

banana 4563 10  9  11
berry  3235 20  2   8

I tried something like this:

output = df[which(df[,3:5] > price_threshold),]

but it didn't work.

this is close to this post, but here I want to look at any of the values in the last three columns, not just one column.

Any suggestion?

All your columns in data frame `df` are factors, thats why it doesnt work. No need to use `cbind` — Sowmya S. Manian, Jul 26 '18 at 14:01

Sam · Answer 1 · 2018-07-26T13:55:48.957

2

One line and readable solution.

df[pmax(df$price1, df$price2, df$price3) > 10, ]

edited Jul 26 '18 at 13:55

answered Jul 26 '18 at 13:52

Sam

644
4
21

This is good....Thanks! But if I have more than three price columns it wouldn't be practical. – Bayram Sarilmaz Jul 26 '18 at 13:56
@BayramSarilmaz If you want a more versatile `pmax` solution, you could do `df[do.call(pmax, df[3:5]) > 10, ]` or `df[do.call(pmax, df[grep('price', names(df))]) > 10, ]` – IceCreamToucan Jul 26 '18 at 14:04
Yup. I agree with Ryan. – Sam Jul 26 '18 at 14:12

score 0 · Accepted Answer · answered Jul 26 '18 at 13:50

0

First, it is better to initialize your data.frame as

df = data.frame(fruit,ID,price1,price2,price3)

So the variables are not parsed to factors. You can then get your expected outcome with:

df[rowSums(df[,3:5] > price_threshold)>0,]

Result:

   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8

Hope this helps!

answered Jul 26 '18 at 13:50

Florian

24,425
4
49
80

Thanks, this works. Can I ask what's the role of ">0" that you added in df[rowSums(df[,3:5] > price_threshold)>0,] ? – Bayram Sarilmaz Jul 26 '18 at 13:55
@BayramSarilmaz `rowSums(df[,3:5] > price_threshold)` returns the sum per row of how many of the entries in that row are larger than the price threshold. We only want to keep the rows where at least one of the values satisfies this condition, so therefore we do `>0`, or equivalently `>=1`. – Florian Jul 26 '18 at 13:56
This solves my problem, and is a versatile and practical answer. – Bayram Sarilmaz Jul 26 '18 at 13:59
Any idea how to modify this command such that it outputs rows for which at least two price columns are greater than the price threshold? in this case the expected output would be the "banana" row only. – Bayram Sarilmaz Aug 13 '18 at 06:23

score 0 · Answer 3 · answered Jul 26 '18 at 14:05

All your columns in data frame df are factors, thats why it doesnt work. No need to use cbind()

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df <- data.frame(fruit, ID, price1, price2, price3)
df
#    fruit   ID price1 price2 price3
# 1  apple  123      3      5      4
# 2 orange 3453      5      7      1
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8

df[which(df[,3] > price_thres | df[,4] > price_thres | df[,5] > price_thres),]
#    fruit   ID price1 price2 price3
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8

score 0 · Answer 4 · edited Jul 26 '18 at 15:25

0

Programmatic solution inspired by Sam:

price_cols <- grep("^price", names(df), value = TRUE)
price_cols
[1] "price1" "price2" "price3"

df[do.call(pmax, df[price_cols]) > price_threshold, ]
   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8

edited Jul 26 '18 at 15:25

IceCreamToucan

28,083
2
22
38

answered Jul 26 '18 at 14:13

s_baldur

29,441
4
36
69

Return an entire row if the value in any specific set of columns meets a certain criteria

4 Answers4

Linked