Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.
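The three steps above can be sketched in base R alone. This is a minimal illustration of the strategy, not plyr's implementation. The data frame `df` and its columns `group` and `value` are made up for the example.

```r
# Minimal base-R sketch of split-apply-combine (no plyr required).
# `df`, `group` and `value` are hypothetical example data.
df <- data.frame(group = c("a", "b", "a", "c", "c"),
                 value = c(1, 5, 3, 6, 2))

# Split: one data frame per distinct value of `group`
pieces <- split(df, df$group)

# Apply: summarise each piece (here, the sum of `value`)
results <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], total = sum(piece$value))
})

# Combine: bind the per-group results back into a single data frame
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)  # totals: a = 4, b = 5, c = 8
```

With plyr installed, `ddply(df, .(group), summarise, total = sum(value))` performs all three steps in one call.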

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.


2465 questions
39 votes, 8 answers

meaning of ddply error: 'names' attribute [9] must be the same length as the vector [1]

I'm going through Machine Learning for Hackers, and I am stuck at this line:
    from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
which generates the following error:
    Error in attributes(out) <- attributes(col) : …
mota (reputation 5,275; 5 gold, 34 silver, 44 bronze badges)
38 votes, 4 answers

Fastest way to add rows for missing time steps?

I have a column in my datasets where time periods (Time) are integers ranging from a to b. Sometimes there might be missing time periods for any given group. I'd like to fill in those rows with NA. Below is example data for 1 (of several 1000)…
Maiasaura (reputation 32,226; 27 gold, 104 silver, 108 bronze badges)
35 votes, 1 answer

doing a plyr operation on every row of a data frame in R

I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3-day bender. So for the sake of my dog and my liver, what's concise syntax for doing a ddply operation on every row of a data…
JD Long (reputation 59,675; 58 gold, 202 silver, 294 bronze badges)
35 votes, 4 answers

Sum of rows based on column value

I want to sum rows that have the same value in one column:
    > df <- data.frame("1"=c("a","b","a","c","c"), "2"=c(1,5,3,6,2), "3"=c(3,3,4,5,2))
    > df
      X1 X2 X3
    1  a  1  3
    2  b  5  3
    3  a  3  4
    4  c  6  5
    5  c  2  2
For one column (X2), the data can…
R-obert (reputation 999; 3 gold, 10 silver, 17 bronze badges)
32 votes, 6 answers

Mean of elements in a list of data.frames

Suppose I had a list of data.frames (of equal rows and columns):
    dat1 <- as.data.frame(matrix(rnorm(25), ncol=5))
    dat2 <- as.data.frame(matrix(rnorm(25), ncol=5))
    dat3 <- as.data.frame(matrix(rnorm(25), ncol=5))
    all.dat <- list(dat1=dat1, dat2=dat2,…
ChrisC (reputation 466; 1 gold, 5 silver, 10 bronze badges)
31 votes, 1 answer

Standard error bars using stat_summary

The following code produces bar plots with standard error bars using Hmisc, ddply and ggplot:
    means_se <- ddply(mtcars,.(cyl), function(df) smean.sdl(df$qsec,mult=sqrt(length(df$qsec))^-1))
    colnames(means_se) <-…
aleph4 (reputation 708; 1 gold, 8 silver, 15 bronze badges)
30 votes, 3 answers

Efficient alternatives to merge for larger data.frames R

I am looking for an efficient (both computer-resource-wise and learning/implementation-wise) method to merge two larger (size>1 million / 300 KB RData file) data frames. "merge" in base R and "join" in plyr appear to use up all my memory…
Etienne Low-Décarie (reputation 13,063; 17 gold, 65 silver, 87 bronze badges)
29 votes, 1 answer

doMC vs doSNOW vs doSMP vs doMPI: why aren't the various parallel backends for 'foreach' functionally equivalent?

I've got a few test pieces of code that I've been running on various machines, always with the same results. I thought the philosophy behind the various do... packages was that they could be used interchangeably as a backend for foreach's %dopar%. …
Zach (reputation 29,791; 35 gold, 142 silver, 201 bronze badges)
27 votes, 3 answers

How to merge two data frames on common columns in R with sum of others?

R Version 2.11.1 32-bit on Windows 7. I have two data sets, data_A and data_B:
    data_A
    USER_A USER_B ACTION
         1     11    0.3
         1     13   0.25
         1     16   0.63
         1     17   0.26
         2     11   0.14
         2     14   0.28
    data_B
    USER_A USER_B ACTION
         1 …
PepsiCo (reputation 1,399; 4 gold, 13 silver, 18 bronze badges)
25 votes, 4 answers

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

I know there are many questions here on SO about ways to convert a list of data.frames to a single data.frame using do.call or ldply, but this question is about understanding the inner workings of both methods and trying to figure out why I can't…
wahalulu (reputation 1,447; 2 gold, 17 silver, 23 bronze badges)
25 votes, 7 answers

Group by multiple columns and sum other multiple columns

I have a data frame with about 200 columns. I want to group the table by the first 10 or so, which are factors, and sum the rest of the columns. I have a list of all the column names which I want to group by and the list of all the cols which I…
user1042267 (reputation 303; 1 gold, 3 silver, 8 bronze badges)
25 votes, 4 answers

Is there an R function that applies a function to each pair of columns?

I often need to apply a function to each pair of columns in a dataframe/matrix and return the results in a matrix. Now I always write a loop to do this. For instance, to make a matrix containing the p-values of correlations I write: df <-…
Sacha Epskamp (reputation 46,463; 20 gold, 113 silver, 131 bronze badges)
25 votes, 5 answers

Is there an alternative to "revalue" function from plyr when using dplyr?

I'm a fan of the revalue function in plyr for substituting strings. It's simple and easy to remember. However, I've migrated new code to dplyr, which doesn't appear to have a revalue function. What is the accepted idiom in dplyr for doing things…
fmark (reputation 57,259; 27 gold, 100 silver, 107 bronze badges)
24 votes, 4 answers

Remove group from data.frame if at least one group member meets condition

I have a data.frame where I'd like to remove entire groups if any of their members meets a condition. In this first example, if the values are numbers and the condition is NA, the code below works.
    df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1,…
nofunsally (reputation 2,051; 6 gold, 35 silver, 53 bronze badges)
24 votes, 2 answers

ddply for sum by group in R

I have a sample dataframe "data" as follows:
          X      Y Month Year income
    2281205 228120     3 2011   1000
    2281212 228121     9 2010   1100
    2281213 228121    12 2010    900
    2281214 228121     3 2011   9000
    2281222 228122     6 2010 …
Metrics (reputation 15,172; 7 gold, 54 silver, 83 bronze badges)