
I am using dplyr for most of my data wrangling in R, yet I am having a hard time achieving this particular effect, and I can't seem to find the answer by googling either.

Assume I have data like this, and what I want is to sort the person-grouped data by each person's cash value from the year 2021. Below I show the outcome I want to achieve. I guess I am just lacking imagination on this one. If I only had the 2021 values I could simply use `... %>% arrange(desc(cash))`, but I am not sure how to proceed from here.

    year   person        cash
0   2020   personone     29
1   2021   personone     40
2   2020   persontwo     17
3   2021   persontwo     13
4   2020   personthree   62
5   2021   personthree   55      

What I want is to sort this data in descending order of the 2021 values, keeping each person's rows together, so that the data looks like:

    year   person        cash
0   2020   personthree   62
1   2021   personthree   55
2   2020   personone     29
3   2021   personone     40
4   2020   persontwo     17
5   2021   persontwo     13   
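To make the answers below runnable as-is, here is a sketch that rebuilds the example data as a plain data frame. The name `dat` matches the second answer (the first answer calls it `df`); the row labels 0 to 5 are dropped, and `stringsAsFactors = FALSE` is an assumption added so `person` stays character on older R versions.

```r
# Rebuild the question's example data (row labels 0-5 omitted)
dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)
str(dat)
```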
bajun65537
  • Is 2021 always the 2nd year per person? In other words, is the 2nd year the relevant moment or is 2021? – Jon Spring Jan 20 '22 at 19:17
  • It would rock to be able to do it by 2021, but sorting by every second row would be fine too, sure. I already tried `arrange(desc(cash[seq(1, length(cash), 2)]))` but got the error `` `..1` must be size 6 or 1, not 3 ``. – bajun65537 Jan 20 '22 at 19:23
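The error in the comment arises because `arrange()` needs every sort key to have one value per row (6 here), while `cash[seq(1, length(cash), 2)]` has only 3. A sketch of the "every second row" idea that repeats each selected value twice so the key is full length again; it assumes each person occupies exactly two consecutive rows with 2021 in the second one, and it starts the `seq()` at 2 so the 2021 values (not the 2020 ones) are used.

```r
library(dplyr)

dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)

# cash[seq(2, length(cash), 2)] picks each person's 2nd-row (2021) value: length 3.
# rep(..., each = 2) stretches it back to one value per row: length 6.
# arrange() is stable, so 2020 stays before 2021 within each person.
res <- dat %>%
  arrange(desc(rep(cash[seq(2, length(cash), 2)], each = 2)))
res
```

This breaks silently if a person has more or fewer than two rows, which is why sorting on the 2021 value itself (as in the answers) is more robust.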

2 Answers


One approach using a join:

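library(dplyr)

df %>%
  filter(year == 2021) %>%                 # keep each person's 2021 row
  # group_by(person) %>% slice(2) %>% ungroup() %>%  # or: each person's yr2
  arrange(-cash) %>%                       # rank persons by 2021 cash, highest first
  select(-cash, -year) %>%                 # keep only the ranked person column
  left_join(df, by = "person")             # bring back all years for each person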

Output:

       person year cash
1 personthree 2020   62
2 personthree 2021   55
3   personone 2020   29
4   personone 2021   40
5   persontwo 2020   17
6   persontwo 2021   13
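The same idea as the join (rank the persons by their 2021 value, then reorder all rows by that ranking) can also be sketched in base R with `match()` and `order()`. This is an alternative technique, not part of the answer above; `person_rank` is a hypothetical helper name, and the code assumes each person has exactly one 2021 row.

```r
dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)

# Persons ordered by their 2021 cash, highest first
is2021 <- dat$year == 2021
person_rank <- dat$person[is2021][order(-dat$cash[is2021])]

# Sort rows by each person's position in that ranking, then by year
res <- dat[order(match(dat$person, person_rank), dat$year), ]
res
```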
Jon Spring

Another option:

library(dplyr)
dat %>%
  group_by(person) %>%
  mutate(maxcash = max(cash)) %>%
  arrange(desc(maxcash)) %>%
  ungroup()
# # A tibble: 6 x 4
#    year person       cash maxcash
#   <int> <chr>       <int>   <int>
# 1  2020 personthree    62      62
# 2  2021 personthree    55      62
# 3  2020 personone      29      40
# 4  2021 personone      40      40
# 5  2020 persontwo      17      17
# 6  2021 persontwo      13      17
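A variant of the pipeline above, offered as a sketch rather than the answerer's code: it drops the helper column after sorting and adds `person` and `year` as tie-breakers. Without them, two people who share the same maximum could have their rows interleaved if the input were not already grouped by person, because `arrange()` keeps ties in their original order.

```r
library(dplyr)

dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)

res <- dat %>%
  group_by(person) %>%
  mutate(maxcash = max(cash)) %>%          # each person's maximum on every row
  ungroup() %>%
  arrange(desc(maxcash), person, year) %>% # tie-breakers keep a person's rows together
  select(-maxcash)                         # drop the helper column
res
```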

Or a one-liner, using base R as a helper:

dat %>%
  arrange(-ave(cash, person, FUN = max))
#   year      person cash
# 4 2020 personthree   62
# 5 2021 personthree   55
# 0 2020   personone   29
# 1 2021   personone   40
# 2 2020   persontwo   17
# 3 2021   persontwo   13
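The same `ave()` one-liner can be pointed at the 2021 value instead of the maximum. A sketch, assuming every person has a 2021 row (otherwise `max()` of all-`NA` returns `-Inf` with a warning): non-2021 rows are masked to `NA`, and `ave()` recycles each person's single result to all of that person's rows.

```r
library(dplyr)

dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)

# Only 2021 values survive the ifelse(); max(na.rm = TRUE) then returns
# each person's 2021 cash, repeated on every row of that person
res <- dat %>%
  arrange(desc(ave(ifelse(year == 2021, cash, NA), person,
                   FUN = function(x) max(x, na.rm = TRUE))))
res
```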

Edit:

If instead of the maximum you mean "always use 2021's value", then:

dat %>%
  group_by(person) %>%
  mutate(cash2021 = cash[year == 2021]) %>%
  arrange(desc(cash2021)) %>%
  ungroup()
# # A tibble: 6 x 4
#    year person       cash cash2021
#   <int> <chr>       <int>    <int>
# 1  2020 personthree    62       55
# 2  2021 personthree    55       55
# 3  2020 personone      29       40
# 4  2021 personone      40       40
# 5  2020 persontwo      17       13
# 6  2021 persontwo      13       13
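`cash[year == 2021]` returns a length-0 vector for any person with no 2021 row, which `mutate()` is likely to reject because each group's result must have length 1 or the group size. A hedged sketch of a guard: taking `[1]` turns the empty case into `NA`, and `arrange()` sorts `NA` last. The missing row is simulated here by dropping persontwo's 2021 row from the question's data; that edge case is an assumption, not something in the question.

```r
library(dplyr)

dat <- data.frame(
  year   = c(2020L, 2021L, 2020L, 2021L, 2020L, 2021L),
  person = c("personone", "personone", "persontwo", "persontwo",
             "personthree", "personthree"),
  cash   = c(29L, 40L, 17L, 13L, 62L, 55L),
  stringsAsFactors = FALSE
)

# Simulated edge case: persontwo has no 2021 row
dat_missing <- dat %>% filter(!(person == "persontwo" & year == 2021))

res <- dat_missing %>%
  group_by(person) %>%
  mutate(cash2021 = cash[year == 2021][1]) %>%  # NA instead of length 0
  ungroup() %>%
  arrange(desc(cash2021))                       # NA rows end up last
res
```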
r2evans