0

I have sales data for different vendors by year, product, and region. I want to compute the year-over-year in vendor market share. Here's a sample data frame named test:

Year Product Region Vendor Sales
2018 Planes Americas A 7
2018 Trains EMEA A 29
2018 Automobiles APAC B 64
2018 Apples Americas B 23
2018 Oranges EMEA C 72
2018 Bananas APAC C 90
2018 Hardware Americas A 61
2018 Software EMEA A 3
2018 Services APAC B 16
2018 Planes Americas B 64
2018 Trains EMEA C 1
2018 Automobiles APAC C 47
2018 Apples Americas A 22
2018 Oranges EMEA A 63
2018 Bananas APAC B 11
2018 Hardware Americas B 84
2018 Software EMEA C 95
2018 Services APAC C 41
2019 Planes Americas A 77
2019 Trains EMEA A 80
2019 Automobiles APAC B 48
2019 Apples Americas B 67
2019 Oranges EMEA C 43
2019 Bananas APAC C 96
2019 Hardware Americas A 51
2019 Software EMEA A 69
2019 Services APAC B 11
2019 Planes Americas B 78
2019 Trains EMEA C 50
2019 Automobiles APAC C 84
2019 Apples Americas A 41
2019 Oranges EMEA A 3
2019 Bananas APAC B 83
2019 Hardware Americas B 61
2019 Software EMEA C 99
2019 Services APAC C 45

For each row, I compute the total sales as:

test2 <- test %>%
   group_by(Year, Product) %>%
   summarize(Total_Sales = sum(Sales))

Then I join the two data frames and compute vendor market share for each row:

test3 <- test %>%
   left_join(test2) %>%
   mutate(Market_Share = Sales / Total_Sales)

This all works as intended.

I have made sure my Year column is class Date:

test3 <- test3 %>%
   mutate(Year = as.Date(ISOdate(Year, 12, 31)))

What I now want to do is calculate year-over-year change in Market Share. To do this, I want to look up the value of Market Share for the same combination of Product, Region, and Vendor for the prior value of Year in the table and add it as a new column PY_Market_Share.

I assume I can do this with lubridate, but I can't figure out how.

EPBaron
  • 11
  • 3

1 Answers1

1

Here is an option to do this using the lag function

library(dplyr)
test3 %>% group_by(Product, Region , Vendor) %>%
  summarise(Market_Share = Market_Share,
            PY_Market_Share = lag(Market_Share, order_by = Year),
            Delta_PY = Market_Share - lag(Market_Share, order_by = Year),
            Year = Year)
             
# A tibble: 36 x 7
# Groups:   Product, Region, Vendor [18]
   Product     Region   Vendor Market_Share PY_Market_Share Delta_PY  Year
   <chr>       <chr>    <chr>         <dbl>           <dbl>    <dbl> <int>
 1 Apples      Americas A             0.489          NA       NA      2018
 2 Apples      Americas A             0.380           0.489   -0.109  2019
 3 Apples      Americas B             0.511          NA       NA      2018
 4 Apples      Americas B             0.620           0.511    0.109  2019
 5 Automobiles APAC     B             0.577          NA       NA      2018
 6 Automobiles APAC     B             0.364           0.577   -0.213  2019
 7 Automobiles APAC     C             0.423          NA       NA      2018
 8 Automobiles APAC     C             0.636           0.423    0.213  2019
 9 Bananas     APAC     B             0.109          NA       NA      2018
10 Bananas     APAC     B             0.464           0.109    0.355  2019
# ... with 26 more rows

PY_Market_Share gives you the last year Market_Share by Product/Region/Vendor. But you can also directly calculate the change in Market_Share from the previous year see column Delta_PY

Bushidov
  • 713
  • 4
  • 16
  • Thank you @Bushidov! I wasn't familiar with the lag() function, but it seems to do what I'm looking for. It's better than my latest version, where I pivot Sales by the Year column using pivot_wider() and then create more calculated columns with mutate(). Thanks again! – EPBaron Sep 06 '22 at 03:32