2

I have a dataframe, and I want to retain the rows (fruits) for which any of the price columns is greater than a certain value.

Here is a reproducible example that you can copy&paste directly into R:

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df = as.data.frame(cbind(fruit,ID,price1,price2,price3)) #combine into a dataframe

price_threshold = 10 #define a price

I want to get only the fruits for which any of the prices is greater than 10, which are Banana and Berry in this case

The output I'm expecting is the following two rows:

banana 4563 10  9  11
berry  3235 20  2   8

I tried something like this:

output = df[which(df[,3:5] > price_threshold),]

but it didn't work.

this is close to this post, but here I want to look at any of the values in the last three columns, not just one column.

Any suggestion?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Bayram Sarilmaz
  • 103
  • 1
  • 3
  • 13

4 Answers4

2

One line and readable solution.

df[pmax(df$price1, df$price2, df$price3) > 10, ]
Sam
  • 644
  • 4
  • 21
0

First, it is better to initialize your data.frame as

df = data.frame(fruit,ID,price1,price2,price3)

So the variables are not parsed to factors. You can then get your expected outcome with:

df[rowSums(df[,3:5] > price_threshold)>0,]

Result:

   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8

Hope this helps!

Florian
  • 24,425
  • 4
  • 49
  • 80
  • Thanks, this works. Can I ask what's the role of ">0" that you added in df[rowSums(df[,3:5] > price_threshold)>0,] ? – Bayram Sarilmaz Jul 26 '18 at 13:55
  • @BayramSarilmaz `rowSums(df[,3:5] > price_threshold)` returns the sum per row of how many of the entries in that row are larger than the price threshold. We only want to keep the rows where at least one of the values satisfies this condition, so therefore we do `>0`, or equivalently `>=1`. – Florian Jul 26 '18 at 13:56
  • This solves my problem, and is a versatile and practical answer. – Bayram Sarilmaz Jul 26 '18 at 13:59
  • Any idea how to modify this command such that it outputs rows for which at least two price columns are greater than the price threshold? in this case the expected output would be the "banana" row only. – Bayram Sarilmaz Aug 13 '18 at 06:23
0

All your columns in data frame df are factors, thats why it doesnt work. No need to use cbind()

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df <- data.frame(fruit, ID, price1, price2, price3)
df
#    fruit   ID price1 price2 price3
# 1  apple  123      3      5      4
# 2 orange 3453      5      7      1
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8

df[which(df[,3] > price_thres | df[,4] > price_thres | df[,5] > price_thres),]
#    fruit   ID price1 price2 price3
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8
Sowmya S. Manian
  • 3,723
  • 3
  • 18
  • 30
0

Programmatic solution inspired by Sam:

price_cols <- grep("^price", names(df), value = TRUE)
price_cols
[1] "price1" "price2" "price3"

df[do.call(pmax, df[price_cols]) > price_threshold, ]
   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
s_baldur
  • 29,441
  • 4
  • 36
  • 69