I want to spread
this data below (first 12 rows shown here only) by the column 'Year', returning the sum of 'Orders' grouped by 'CountryName'. Then calculate the % change in 'Orders' for each 'CountryName' from 2014 to 2015.
CountryName Days pCountry Revenue Orders Year
United Kingdom 0-1 days India 2604.799 13 2014
Norway 8-14 days Australia 5631.123 9 2015
US 31-45 days UAE 970.8324 2 2014
United Kingdom 4-7 days Austria 94.3814 1 2015
Norway 8-14 days Slovenia 939.8392 3 2014
South Korea 46-60 days Germany 1959.4199 15 2014
UK 8-14 days Poland 1394.9096 6. 2015
UK 61-90 days Lithuania -170.8035 -1 2015
US 8-14 days Belize 1687.68 5 2014
Australia 46-60 days Chile 888.72 2. 0 2014
US 15-30 days Turkey 2320.7355 8 2014
Australia 0-1 days Hong Kong 672.1099 2 2015
I can make this work with a smaller test dataframe, but can only seem to return endless errors like 'sum not meaningful for factors' or 'duplicate identifiers for rows' with the full data. After hours of reading the dplyr docs and trying things I've given up. Can anyone help with this code...
data %>%
spread(Year, Orders) %>%
group_by(CountryName) %>%
summarise_all(.funs=c(Sum='sum'), na.rm=TRUE) %>%
mutate(percent_inc=100*((`2014_Sum`-`2015_Sum`)/`2014_Sum`))
The expected output would be a table similar to below. (Note: these numbers are for illustrative purposes, they are not hand calculated.)
CountryName percent_inc
UK 34.2
US 28.2
Norway 36.1
... ...
Edit
I had to make a few edits to the variable names, please note.