1

I would like to get the n highest and n lowest rows based on a variable in a dataframe. For example, for the 5 highest and 5 lowest rows of mpg in the mtcars dataframe, the output would be

#>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1

I created this output with the following code, however this relies in selecting the right cutoff values for mpg, but I want it to select automatically the highest & lowest x rows.

 library(tidyverse)
    mtcars %>% arrange(mpg) %>% filter(!between(mpg, 15,26))

This is somewhat similar to this question: Extract rows with highest and lowest values from a data frame However the answers there are kind of hard to understand, and I was wondering whether there's a better way that does not involve using slice_min() and slice_max() to create two separate dataframes and then merging them.

Edited to add: if there was a way of using mutate to create a variable which indicates the top 5 lowest and highest numbers like this, that would be great:

#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb rank
#> 1  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  low
#> 2  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4  low
#> 3  13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4  low
#> 4  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4  low
#> 5  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4  low
#> 6  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1 high
#> 7  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 high
#> 8  30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 high
#> 9  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 high
#> 10 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 high
Esther
  • 441
  • 2
  • 15

1 Answers1

1

One option could be:

mtcars %>%
 arrange(mpg) %>%
 slice(c(1:5, (n()-4):n()))

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
tmfmnk
  • 38,881
  • 4
  • 47
  • 67