
I have a data frame that I split into groups, and I run a function on each group with `do()`. The problem is that two values inside the function need to change depending on which group is being processed. How can I account for this?

The data is grouped by Region and League:

```
Account  Region  League  Owner  Value
Acc1     East    Major   Sally  1536
Acc2     East    Minor   Jeff   2200
Acc3     East    Minor   Larry  3320
Acc4     West    Major   Harry  4000
Acc5     West    Major   Harry  900
Acc6     West    Minor   Jess   700
```

This gives groups such as East Major, East Minor, West Major, West Minor, and so on.

This is the part of the function that I apply to each group:

```r
library(dplyr)

reAssign <- function(dta) {
  other_acct <- dta %>%
    group_by(Owner) %>%
    # Blank out the owner once their running total or account count is too high
    mutate(NewOwner = replace(Owner, cumsum(AccValue) > 600000 | row_number() > 14, NA)) %>%
    ungroup() %>%
    mutate(Owner = NewOwner) %>%
    select(-r, -NewOwner)
  # ... rest of the function not shown
}
```

After grouping by Region and League, each group is passed to this function, which then groups it by Owner. The problem is the condition `cumsum(AccValue) > 600000 | row_number() > 14`: both the 600000 value cap and the 14-account cap need to change depending on which Region/League group is being processed. I have another data frame that lists these limits for every group:

```
RegionLeague  MaxValue   MaxCount
East Major    600000     14
East Minor    450000     10
West Major    800000     20
West Minor    220000     12
```
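The limits table keys each row with a combined label such as "East Major", while the account data has separate Region and League columns. A minimal base R sketch, assuming every label is the region and the league separated by a single space, that splits the label so both tables share the same key columns (`limits` and `parts` are hypothetical names):

```r
# The per-group limits from the question, keyed by a combined label
limits <- data.frame(
  RegionLeague = c("East Major", "East Minor", "West Major", "West Minor"),
  MaxValue     = c(600000, 450000, 800000, 220000),
  MaxCount     = c(14, 10, 20, 12),
  stringsAsFactors = FALSE
)

# Assumption: every label is "<Region> <League>" separated by one space,
# so each split gives exactly two pieces and rbind() builds a 2-column matrix
parts <- do.call(rbind, strsplit(limits$RegionLeague, " ", fixed = TRUE))
limits$Region <- parts[, 1]
limits$League <- parts[, 2]

limits[, c("Region", "League", "MaxValue", "MaxCount")]
```

With matching Region and League columns, the limits can be looked up per group or joined onto the account data.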

How can I change

```r
mutate(NewOwner = replace(Owner, cumsum(AccValue) > 600000 | row_number() > 14, NA)) %>%
```

to

```r
mutate(NewOwner = replace(Owner, cumsum(AccValue) > MaxValue | row_number() > MaxCount, NA)) %>%
```

and pass the correct values for each group into `MaxValue` and `MaxCount`?
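One way to keep the `do()` structure is to turn the two hard-coded limits into function arguments and look up each group's row of the limits table inside `do()`. This is a minimal sketch, not a definitive implementation: it assumes the limits table has already been split into Region and League columns, that the value column is named `AccValue` as in the function, and it uses hypothetical names (`accounts`, `limits`, `max_value`, `max_count`). The `-r` column from the original `select()` is left out because it does not exist in the sample data.

```r
library(dplyr)

# Sample data from the question; the value column is named AccValue
# (assumption) to match the name used inside reAssign()
accounts <- data.frame(
  Account  = c("Acc1", "Acc2", "Acc3", "Acc4", "Acc5", "Acc6"),
  Region   = c("East", "East", "East", "West", "West", "West"),
  League   = c("Major", "Minor", "Minor", "Major", "Major", "Minor"),
  Owner    = c("Sally", "Jeff", "Larry", "Harry", "Harry", "Jess"),
  AccValue = c(1536, 2200, 3320, 4000, 900, 700),
  stringsAsFactors = FALSE
)

# Limits table with RegionLeague already split into Region and League (assumed)
limits <- data.frame(
  Region   = c("East", "East", "West", "West"),
  League   = c("Major", "Minor", "Major", "Minor"),
  MaxValue = c(600000, 450000, 800000, 220000),
  MaxCount = c(14, 10, 20, 12),
  stringsAsFactors = FALSE
)

# reAssign() with the two thresholds turned into arguments
reAssign <- function(dta, max_value, max_count) {
  dta %>%
    group_by(Owner) %>%
    mutate(NewOwner = replace(Owner,
                              cumsum(AccValue) > max_value | row_number() > max_count,
                              NA)) %>%
    ungroup() %>%
    mutate(Owner = NewOwner) %>%
    select(-NewOwner)
}

result <- accounts %>%
  group_by(Region, League) %>%
  do({
    # `.` is the current group's rows; its keys select the matching limits row
    lim <- limits[limits$Region == .$Region[1] & limits$League == .$League[1], ]
    reAssign(., lim$MaxValue, lim$MaxCount)
  })
```

Calling the function directly with a tighter hypothetical cap shows the threshold taking effect: `reAssign(accounts[4:5, ], 4500, 14)` blanks Harry's second West Major account, because his running total reaches 4900.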

Matt W.
  • If you want to keep your code pipeable, you could look at replacing `dplyr::do` with `purrr::map`, particularly `purrr::pmap()`, which would let you pass multiple lists into your function as parameters. If not, you could use `mapply` methods. – Jake Kaupp Feb 08 '17 at 15:27
  • I would join your second df onto your first, and then you can simply use `MaxValue` directly. – Axeman Feb 08 '17 at 15:46
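Axeman's join suggestion can be sketched as follows, again assuming the limits table has been split into Region and League columns and the value column is named `AccValue`. After `left_join()`, every row carries its own group's `MaxValue` and `MaxCount`, so the condition can use those columns directly and `do()` is no longer needed. `reassign_by_group()` is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

# Sample data from the question; the value column is named AccValue here
# (assumption) to match the name used in reAssign()
accounts <- data.frame(
  Account  = c("Acc1", "Acc2", "Acc3", "Acc4", "Acc5", "Acc6"),
  Region   = c("East", "East", "East", "West", "West", "West"),
  League   = c("Major", "Minor", "Minor", "Major", "Major", "Minor"),
  Owner    = c("Sally", "Jeff", "Larry", "Harry", "Harry", "Jess"),
  AccValue = c(1536, 2200, 3320, 4000, 900, 700),
  stringsAsFactors = FALSE
)

# Limits table with RegionLeague already split into Region and League (assumed)
limits <- data.frame(
  Region   = c("East", "East", "West", "West"),
  League   = c("Major", "Minor", "Major", "Minor"),
  MaxValue = c(600000, 450000, 800000, 220000),
  MaxCount = c(14, 10, 20, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach each group's limits, then apply them row by row
reassign_by_group <- function(accounts, limits) {
  accounts %>%
    left_join(limits, by = c("Region", "League")) %>%
    group_by(Region, League, Owner) %>%
    mutate(NewOwner = replace(Owner,
                              cumsum(AccValue) > MaxValue | row_number() > MaxCount,
                              NA)) %>%
    ungroup() %>%
    mutate(Owner = NewOwner) %>%
    select(-MaxValue, -MaxCount, -NewOwner)
}

result <- reassign_by_group(accounts, limits)

# With a hypothetical limit of one account per owner, Harry's second
# West Major account (Acc5) loses its owner
tight <- reassign_by_group(accounts, mutate(limits, MaxCount = 1))
tight$Owner
# "Sally" "Jeff" "Larry" "Harry" NA "Jess"
```

Grouping by Region, League and Owner together makes `cumsum()` and `row_number()` restart for each owner within each Region/League group, which matches grouping by Region and League outside the function and by Owner inside it.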
