
I have this dataframe:

x <- c(0,55,105,165,270,65,130,155,155,225,250,295,
     30,100,110,135,160,190,230,300,30,70,105,170,
     210,245,300,0,85,175,300,15,60,90,90,140,210,
     260,270,295,5,55,55,90,100,140,190,255,285,270)

y <- c(305,310,305,310,310,260,255,265,285,280,250,
     260,210,240,225,225,225,230,210,215,160,190,
     190,175,160,160,170,120,135,115,110,85,90,90,
     55,55,90,85,50,50,25,30,5,35,15,0,40,20,5,150)

z <- c(870,793,755,690,800,800,730,728,710,780,804,
     855,813,762,765,740,765,760,790,820,855,812,
     773,812,827,805,840,890,820,873,875,873,865,
     841,862,908,855,850,882,910,940,915,890,880,
     870,880,960,890,860,830)

dati5 <- data.frame(x, y, z)

I want to delete the rows of the data frame that contain the maximum or minimum values of the variables x and y. I also want to keep these rows so I can use them later. How can I do that?

PS: in this case I want to delete all the rows that contain x == 0, x == 300, y == 0 or y == 310.

Lince202
  • @nrussell I tried to modify the vectors, but they are not connected to each other, so I thought it would be better to work with data frames, but I don't know how. – Lince202 Mar 30 '16 at 14:12
  • @bouncyball where do you implement this? – Lince202 Mar 30 '16 at 14:15
  • what do you want to do if you have multiple maximums? – Jav Mar 30 '16 at 14:16
  • @JavK delete all of them. For example, in the x vector the minimum is 0 and it occurs more than once; I want to delete both rows that contain it. – Lince202 Mar 30 '16 at 14:18
  • if you know you want to filter by a certain value, in this case 0, you can also try this: `dati5 <- dati5[dati5$x > 0,]` – s_scolary Mar 30 '16 at 14:28
  • Are you creating four separate cases? min x, max x, min y, and max y? Or are you looking to solve all four in one go? – Pierre L Mar 30 '16 at 14:29

3 Answers

dati5[!(dati5$x %in% max(dati5$x)),]

This returns the data frame with all rows deleted where x equals its maximum.

The same expression without the negation `!` shows the rows that were deleted:

dati5[(dati5$x %in% max(dati5$x)),]
    x   y   z
20 300 215 820
27 300 170 840
31 300 110 875

Do the same for the minimum and for y.

Edit: As Laterow noted, `%in%` is not needed here:

dati5[dati5$x != max(dati5$x),]
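One thing to be aware of when swapping `%in%` for `!=`: the two behave differently if the column contains `NA`. Below is a minimal sketch on a hypothetical toy data frame `d` (not the question's data): `!=` returns `NA` for the missing value, and indexing rows with `NA` produces a row of `NA`s, whereas `%in%` always returns `TRUE` or `FALSE`. Wrapping the condition in `which()` is one way to get plain row positions. With the question's data, which has no missing values, both forms give the same result.

```r
# Hypothetical toy data with one missing x value
d <- data.frame(x = c(0, 5, NA, 10), y = c(1, 2, 3, 4))

# na.rm = TRUE so max() ignores the NA; mx is 10 here
mx <- max(d$x, na.rm = TRUE)

# != gives NA for the missing x, and d[NA, ] produces an all-NA row
keep_ne <- d$x != mx
print(keep_ne)
print(d[keep_ne, ])

# %in% never returns NA, so the row with a missing x is simply kept
keep_in <- !(d$x %in% mx)
print(d[keep_in, ])

# which() drops NA positions, so this removes both the max row and the NA row
print(d[which(d$x != mx), ])
```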

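Also, because x is still stored as a separate vector in the same order as the rows of dati5, comparing against that vector directly works too (this selects the rows holding the maximum; use `!=` instead of `==` to drop them):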

dati5[x == max(x),]
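The shortcut above relies on the free-standing vector `x` still matching `dati5$x` value for value and in the same order. A sketch using base R's `with()` keeps the short syntax but reads the column from the data frame itself; the toy data frame `d` and the deliberately stale vector `x` are hypothetical, chosen to show what goes wrong when the two drift apart.

```r
# Hypothetical toy data frame
d <- data.frame(x = c(3, 9, 1, 9), y = c(5, 6, 7, 8))

# A stale vector that shares the column's name but not its values
x <- c(1, 2, 3, 4)

# with() evaluates x as d$x, so this finds both rows holding the max (rows 2 and 4)
with_max <- d[with(d, x == max(x)), ]

# Rows without that maximum
without_max <- d[with(d, x != max(x)), ]

# The stale global vector selects only row 4 and misses row 2
stale <- d[x == max(x), ]
```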

EDIT2:

Regarding the comments about four separate calls, all four filters can also be combined into a single command:

dati5[!(dati5$x %in% c(max(dati5$x), min(dati5$x))) & !(dati5$y %in% c(max(dati5$y), min(dati5$y))),]

What is being deleted:

dati5[(dati5$x %in% c(max(dati5$x), min(dati5$x))) | (dati5$y %in% c(max(dati5$y), min(dati5$y))),]
     x   y   z
1    0 305 870
2   55 310 793
4  165 310 690
5  270 310 800
20 300 215 820
27 300 170 840
28   0 120 890
31 300 110 875
46 140   0 880

These are the rows holding the maximum or minimum of x or of y.
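Since the question also asks to keep the deleted rows for later, one option is to compute the logical index once and use it twice, as is for the removed rows and negated for the kept ones. This is a sketch rather than the answer's exact code: `range()` returns `c(min, max)`, so `x %in% range(x)` is shorthand for `x %in% c(max(x), min(x))`, and the names `is_extreme`, `kept` and `removed` are illustrative.

```r
x <- c(0,55,105,165,270,65,130,155,155,225,250,295,
       30,100,110,135,160,190,230,300,30,70,105,170,
       210,245,300,0,85,175,300,15,60,90,90,140,210,
       260,270,295,5,55,55,90,100,140,190,255,285,270)
y <- c(305,310,305,310,310,260,255,265,285,280,250,
       260,210,240,225,225,225,230,210,215,160,190,
       190,175,160,160,170,120,135,115,110,85,90,90,
       55,55,90,85,50,50,25,30,5,35,15,0,40,20,5,150)
z <- c(870,793,755,690,800,800,730,728,710,780,804,
       855,813,762,765,740,765,760,790,820,855,812,
       773,812,827,805,840,890,820,873,875,873,865,
       841,862,908,855,850,882,910,940,915,890,880,
       870,880,960,890,860,830)
dati5 <- data.frame(x, y, z)

# TRUE for every row whose x or y equals that column's min or max
is_extreme <- with(dati5, x %in% range(x) | y %in% range(y))

# Rows set aside for later use
removed <- dati5[is_extreme, ]

# !(A | B) is the same as !A & !B: neither x nor y is extreme
kept <- dati5[!is_extreme, ]
```

Negating the whole index keeps only the rows where neither x nor y is at its minimum or maximum: 41 of the 50 rows here, with the 9 rows listed above ending up in `removed`.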

Jav
  • Replace `!( %in% )` with `!=`, as in `dati5$x != max(dati5$x)`. It's shorter and clearer. – slamballais Mar 30 '16 at 14:24
  • Yes, just realised it – Jav Mar 30 '16 at 14:26
  • With this method you will end up creating four different calls: max x, min x, max y, min y. – Pierre L Mar 30 '16 at 14:28
  • How can I delete the rows these four calls return from my initial data? – Lince202 Mar 30 '16 at 14:31
  • That's the problem with the solution. You will then have to create four more for deletes, 8 functions in all. @Lince202 you should specify whether you need to search for just maximums sometimes and just minimums other times, or just column x sometimes and column y others. If you need all max and min for x and y together, you need to say that. – Pierre L Mar 30 '16 at 14:33
  • @PierreLafortune I need to delete all max and all min of x and y at the same time. – Lince202 Mar 30 '16 at 14:39
  • See the update; it should address @PierreLafortune's concerns. – Jav Mar 30 '16 at 14:50
  • @JavK it's perfect! Now how can I store the result of `dati5[!(dati5$x %in% c(max(dati5$x), min(dati5$x))) & !(dati5$y %in% c(max(dati5$y), min(dati5$y))),]` as data5? – Lince202 Mar 30 '16 at 17:26
  • `data5 <- dati5[!(dati5$x %in% c(max(dati5$x), min(dati5$x))) & !(dati5$y %in% c(max(dati5$y), min(dati5$y))),]` – Jav Mar 31 '16 at 07:03

A single-line solution that works on any number of columns:

dati5[!rowSums(sapply(dati5[-3], function(x) x == max(x) | x == min(x))),]

Explanation:

                                 function(x) x == max(x) | x == min(x)       # Return TRUE if element in vector is max or min
               sapply(dati5[-3],                                      )      # Apply this to dati5 (columns x and y)
       rowSums(                                                        )     # Sum this per row (FALSE = 0, TRUE = 1)
      !                                                                      # Logically negate this (0 -> TRUE = keep row, above 0 -> FALSE = drop row)
dati5[                                                                  ,]   # Subset dati5
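To see what each layer of the diagram produces, here is the same expression taken apart on a hypothetical five-row data frame `d` (as in the answer, the third column is excluded from filtering). Saving the negated row sums as an index, called `keep` here for illustration, lets you select the kept rows and, by negating it again, the removed ones.

```r
# Hypothetical toy data; z (column 3) is not filtered
d <- data.frame(x = c(0, 5, 9, 5, 3),
                y = c(4, 1, 2, 3, 2),
                z = c(10, 20, 30, 40, 50))

# sapply() simplifies to a logical matrix: one column per filtered variable,
# TRUE where the value equals that column's min or max
hits <- sapply(d[-3], function(v) v == max(v) | v == min(v))

# Number of extreme values per row (TRUE counts as 1)
n_hits <- rowSums(hits)

# Reusable index: TRUE for rows with no extreme value in x or y
keep <- !n_hits

kept    <- d[keep, ]
removed <- d[!keep, ]
```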
slamballais
  • @PierreLafortune I originally deleted it because I feel that JavK's (now single line) answer is more appropriate for OP. I forgot that Stack functions as a museum and that someone may need a general form. – slamballais Mar 30 '16 at 15:11
  • This is good too. And making an index for direct use and negation is useful. – Pierre L Mar 30 '16 at 15:18

Could this help?

which_minmax <- function(x) which(x == max(x, na.rm=TRUE) | x == min(x, na.rm=TRUE))
# lapply() always returns a list, so unlist() yields a plain vector of row indices
remove_ids <- unique(unlist(lapply(dati5[, 1:2], which_minmax)))
# dati5 without the min/max rows
dati5[-remove_ids, ]
# the removed rows
dati5[remove_ids, ]

And this can be wrapped in a function:

remove_minmax <- function(df, cols_to_filter){
  which_minmax <- function(x) which(x == max(x, na.rm=TRUE) | x == min(x, na.rm=TRUE))
  # df[cols_to_filter] stays a data frame even for one column; lapply() always returns a list
  remove_ids <- unique(unlist(lapply(df[cols_to_filter], which_minmax)))
  list(filtered=df[-remove_ids, ], removed=df[remove_ids, ])
}
# eg
remove_minmax(dati5, 1:2)
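A usage sketch of the function on a hypothetical toy data frame `d`. The helper is repeated so the block runs on its own; it indexes with `df[cols_to_filter]` and uses `lapply()`, so the selection stays a data frame even for a single column and the indices always come back as a list. The two parts of the result are then available as `$filtered` and `$removed`.

```r
remove_minmax <- function(df, cols_to_filter) {
  which_minmax <- function(x) which(x == max(x, na.rm = TRUE) | x == min(x, na.rm = TRUE))
  # df[cols_to_filter] is always a data frame; lapply() always returns a list
  remove_ids <- unique(unlist(lapply(df[cols_to_filter], which_minmax)))
  list(filtered = df[-remove_ids, ], removed = df[remove_ids, ])
}

# Hypothetical toy data
d <- data.frame(x = c(0, 5, 9, 5, 3),
                y = c(4, 1, 2, 3, 2),
                z = c(10, 20, 30, 40, 50))

# Filter on x and y, then pull out both parts of the result
res <- remove_minmax(d, c("x", "y"))
filtered <- res$filtered
removed  <- res$removed

# Filtering on a single column also works
res_x <- remove_minmax(d, "x")
```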
Vincent Bonhomme