I am trying to figure out why I am getting an error message when using ddply.

Example data:

library(plyr)

data <- data.frame(area = rep(c("VA", "OC", "ES"), each = 4),
    sex = rep(c("Male", "Female"), each = 2, times = 3),
    year = rep(c(2009, 2010), times = 6),
    bin = c(110, 120, 125, 125, 110, 130, 125, 80, 90, 90, 80, 140),
    shell_length = c(.4, 4, 1, 2, .2, 5, .4, 4, .8, 4, .3, 4))

# Count the rows in each area/year/sex/bin group
bin7 <- ddply(data, .(area, year, sex, bin), summarize, n_bin = length(shell_length))

Error message: Error in .fun(piece, ...) : argument "by" is missing, with no default

I got this error message yesterday. I restarted R and reran the code and everything was fine. This morning I got the error message again and restarting R did not solve the problem.

I also tried to run some example code and got the same error message.

# Summarize a dataset by two variables
require(plyr)
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
 mean = round(mean(age), 2),
 sd = round(sd(age), 2))

R information

R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] Hmisc_3.17-0        ggplot2_1.0.1       Formula_1.2-1      
 [4] survival_2.38-1     car_2.0-26          MASS_7.3-40        
 [7] xlsx_0.5.7          xlsxjars_0.6.1      rJava_0.9-7        
[10] plyr_1.8.3          latticeExtra_0.6-26 RColorBrewer_1.1-2 
[13] lattice_0.20-31  

If someone could please explain why this is happening I would appreciate it.

Thanks

zx8754
user41509
  • Not able to reproduce the error using the same version of plyr. Did you load `plyr` and `dplyr` at the same time? – akrun Nov 19 '15 at 15:14
  • I don't get the error, though I named the data frame `d`. The formula interface also works: `ddply(d, ~ area+year+sex+bin, summarize,n_bin=length(shell_length))` – jogo Nov 19 '15 at 15:16
  • Both `dplyr` and `plyr` have functions named `summarize`. You probably have `dplyr` loaded as well. `detach` both packages and then load `plyr` before loading `dplyr`. – narendra-choudhary Nov 19 '15 at 15:23
  • Thank you for all of your help. I detached both packages and then reloaded them with plyr first. I did not know that the two packages needed to be loaded in a certain order. – user41509 Nov 19 '15 at 15:25
  • Looks like you loaded `Hmisc` last, which also has a `summarize` function (that has a `by` argument as in your error). Watch those messages when loading packages - you get important info about masking. – aosmith Nov 19 '15 at 15:37
  • Thank you for that information as well. After I thought I had it figured out, I reran the code and got the same error message. I ended up loading the plyr package later in my code, and now everything works. So if I want to use plyr to summarize data, does it need to be the last package I load? I prefer to load all of my packages at the beginning of my code. This is the first time I have had to deal with masking between packages. – user41509 Nov 19 '15 at 15:44
  • You can also specify `plyr::summarize(...)` to be on the safe side. – C8H10N4O2 Nov 19 '15 at 16:04
  • Why isn't there a hint in the error message that several loaded packages have a function with that name? – Clem Snide Jan 11 '23 at 14:26

3 Answers

As stated in Narendra's comment on the question, this error can occur when another loaded package has its own function called summarize (or summarise) that masks the one in plyr and works differently. For instance:

library(plyr)
library(Hmisc)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#> Error in .fun(piece, ...) : argument "by" is missing, with no default

One solution is to call the correct function with :: and the correct namespace:

ddply(iris, "Species", plyr::summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588

Alternatively, one can detach the package that has the wrong function:

detach(package:Hmisc)
ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588

Finally, if one needs both packages and does not want to bother with ::, one can load them in the other order:

library(Hmisc)
library(plyr)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588
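
To check which package a bare `summarize` currently resolves to, base R has a few diagnostics. This is a minimal sketch, assuming plyr is installed and the code runs in a fresh session; the variable names `hits`, `owner`, and `masked` are only illustrative.

```r
library(plyr)

# Attached environments that define `summarize`, in search-path order.
# The first entry is the one a bare summarize() call will use.
hits <- find("summarize")
print(hits)

# Namespace that owns the function the bare name resolves to right now.
owner <- environmentName(environment(summarize))
print(owner)

# Every name that is defined in more than one place on the search path.
masked <- conflicts(detail = TRUE)
print(masked)
```

If `owner` is anything other than "plyr" (for example "Hmisc"), then `ddply(..., summarize, ...)` is calling that other package's function.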
CoderGuy123

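I had a similar problem (a different data set, but the same error message) and found that switching to the UK spelling "summarise" fixed it. plyr accepts both spellings, so the likely explanation is that the package masking summarize did not also define summarise. Once I made the spelling change, the code worked.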

Here's the code I used. When I used the "z" spelling, I got the error message Error in .fun(piece, ...) : argument "by" is missing, with no default; but changing to "s" solved it.

library(plyr)
ddply(InsectSprays,.(spray),summarise,sum=sum(count))
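
One way to check why the spelling matters is to look at what each namespace exports. This is a rough sketch, assuming plyr is installed; the Hmisc check runs only if that package is available, and `plyr_has_both` and `same_function` are illustrative names.

```r
library(plyr)

# Does plyr export both spellings?
plyr_has_both <- c("summarise", "summarize") %in% getNamespaceExports("plyr")
print(plyr_has_both)

# Are the two spellings the same function object in plyr?
same_function <- identical(plyr::summarise, plyr::summarize)
print(same_function)

# If Hmisc is installed, see which of the two spellings it exports.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(c("summarise", "summarize") %in% getNamespaceExports("Hmisc"))
}
```

A spelling that the masking package does not export still resolves to plyr, which would explain why changing "z" to "s" avoids the error.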
CoderGuy123

@CoderGuy123's answer is great, but I want to add one more solution which I prefer to those suggested.

If you want to load two packages that have name conflicts, you can control which function is used with a simple assignment: summarize <- plyr::summarize.

Example:

library(plyr)
library(Hmisc)

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#> Error in .fun(piece, ...) : argument "by" is missing, with no default

summarize <- plyr::summarize

ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
#>      Species mean_sepal_length
#> 1     setosa             5.006
#> 2 versicolor             5.936
#> 3  virginica             6.588
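
Another option, for anyone who prefers to load every package at the top of a script, is to keep the conflicting function off the search path in the first place. This is a sketch that assumes R 3.6.0 or later (where `library()` gained the `exclude` argument), that both packages are installed, and that it runs in a fresh session where Hmisc is not already attached. The name `res` is illustrative.

```r
# Attach Hmisc without its summarize(); everything else is attached as usual.
# Requires R >= 3.6.0 for the `exclude` argument.
library(Hmisc, exclude = "summarize")
library(plyr)

# The bare name now resolves to plyr's function regardless of load order.
res <- ddply(iris, "Species", summarize, mean_sepal_length = mean(Sepal.Length))
print(res)
```

Hmisc's version stays available as `Hmisc::summarize(...)` when it is needed.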
Joakim