Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.
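The three steps above can be sketched in base R. This is a minimal illustration that assumes only the built-in `mtcars` data set. The variable names `pieces`, `summaries` and `result` are illustrative only. The commented-out `ddply` call at the end shows roughly how plyr expresses the same computation in a single step.

```r
# Split-apply-combine written out by hand in base R, using the built-in
# mtcars data set (no packages required).

# Split: one data frame per number of cylinders (4, 6 and 8).
pieces <- split(mtcars, mtcars$cyl)

# Apply: summarise each piece independently of the others.
summaries <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1],
             n = nrow(piece),
             mean_mpg = mean(piece$mpg))
})

# Combine: stack the per-group summaries into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# With plyr installed, the three steps collapse into a single call:
# library(plyr)
# ddply(mtcars, .(cyl), summarise, n = length(mpg), mean_mpg = mean(mpg))
```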

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.
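The sketch below shows how plyr relates to the apply family. It assumes only base R and the built-in `mtcars` data, and it computes per-group means twice: once with `tapply`, and once with explicit `split` and `sapply` calls. The closing comment summarises plyr's naming convention, where the first letter of a function name gives the input type and the second gives the output type.

```r
# Base R's tapply already performs split-apply-combine for a single vector:
# it splits mtcars$mpg by cylinder count, applies mean() to each group,
# and combines the results into a 1-d array named by group.
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# The same computation with the steps made explicit:
# split() builds a list of groups and sapply() applies and simplifies.
by_steps <- sapply(split(mtcars$mpg, mtcars$cyl), mean)

# Both give one mean per group, with the same group names.
stopifnot(identical(names(by_tapply), names(by_steps)))
stopifnot(isTRUE(all.equal(as.vector(by_tapply), unname(by_steps))))
print(by_steps)

# plyr generalises this pattern and names its functions by input and output
# type: llply (list -> list, like lapply), ldply (list -> data frame),
# ddply (data frame -> data frame), and so on.
```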


2465 questions
15 votes, 3 answers

ddply with lm() function

How can I use ddply function for linear model? x1 <- c(1:10, 1:10) x2 <- c(1:5, 1:5, 1:5, 1:5) x3 <- c(rep(1,5), rep(2,5), rep(1,5), rep(2,5)) set.seed(123) y <- rnorm(20, 10, 3) mydf <- data.frame(x1, x2, x3, y) require(plyr) ddply(mydf, mydf$x3,…
jon
15 votes, 1 answer

Can `ddply` (or similar) do a sliding window?

Something like sliding = function(df, n, f) ldply(1:(nrow(df) - n + 1), function(k) f(df[k:(k + n - 1), ]) ) That would be used like > df n a 1 1 0.8021891 2 2 0.9446330 ... > sliding(df, 2, function(df) with(df, + …
Owen
15 votes, 2 answers

What is the dplyr equivalent of plyr::ldply(tapply) in R?

Ultimately, I am trying to achieve something similar to the following, but leveraging dplyr instead of plyr: library(dplyr) probs = seq(0, 1, 0.1) plyr::ldply(tapply(mtcars$mpg, mtcars$cyl, function(x) {…
JasonAizkalns
15 votes, 2 answers

Using ddply to apply a function to a group of rows

I use ddply quite a bit but I do not consider myself an expert. I have a data frame (df) with grouping variable "Group" which has values of "A", "B" and "C" and the variable to summarize, "Var" has numeric values. If I use ddply(df, .(Group),…
Joseph Kreke
14 votes, 5 answers

Aggregating sub totals and grand totals with data.table

I've got a data.table in R: library(data.table) set.seed(1) DT = data.table( group=sample(letters[1:2],100,replace=TRUE), year=sample(2010:2012,100,replace=TRUE), v=runif(100)) Aggregating this data into a summary table by group and year is…
Zach
14 votes, 1 answer

R ggplot and facet grid: how to control x-axis breaks

I am trying to plot the change in a time series for each calendar year using ggplot and I am having problems with the fine control of the x-axis. If I do not use scale="free_x" then I end up with an x-axis that shows several years as well as the…
SlowLearner
14 votes, 1 answer

Is there an implementation of Hadley's ddply for python?

I find Hadley's plyr package for R extremely helpful; it's a great DSL for transforming data. The problem that it solves is so common that I face it in other use cases, when I'm not manipulating data in R but working in other programming languages. Does anyone…
rafalotufo
14 votes, 3 answers

Error: withCallingHandlers crashing R

I've been using the plyr-based function summarySE and ddply for several months without any problem. Today, when I ran my extremely basic routine in R, an error message appeared and R crashed. Here is example code and the error I get before R…
dudu
14 votes, 1 answer

Is the plyr package for R not available for R version 3.0.2?

I tried installing the plyr package and got a warning message saying it isn't available for R version 3.0.2. Is this true or not? If not, why would I be getting this message? I tried using two different CRAN mirrors and both gave the same…
eTothEipiPlus1
14 votes, 3 answers

How to use error bars on stacked bar with ggplot2

I'm struggling to put error bars in the correct place on a stacked bar. As I read in an earlier post, I used ddply in order to stack the error bars. That changed the order of the stacking, so I ordered the factor. Now it appears the error…
user2055130
13 votes, 4 answers

How can I overlay two dense scatter plots so that I can see the outlines of each in R or Matlab?

See this example. This was created in Matlab by making two scatter plots independently, creating images of each, then using imagesc to draw them into the same figure, and finally setting the alpha of the top image to 0.5. I would like to do…
Ben J
13 votes, 2 answers

Fastest Tall-Wide pivoting in R

I am dealing with a simple table of the form date variable value 1970-01-01 V1 0.434 1970-01-01 V2 12.12 1970-01-01 V3 921.1 1970-01-02 V1 -1.10 1970-01-03 V3 0.000 1970-01-03 V5 …
gappy
13 votes, 1 answer

Equivalent of transform in R/ddply in Python/pandas?

In R's ddply function, you can compute any new columns group-wise, and append the result to the original dataframe, such as: ddply(mtcars, .(cyl), transform, n=length(cyl)) # n is appended to the df In Python/pandas, I have computed it first, and…
Blaszard
13 votes, 5 answers

Reshape multiple categorical variables to binary response variables

I am trying to convert the following format: mydata <- data.frame(movie = c("Titanic", "Departed"), actor1 = c("Leo", "Jack"), actor2 = c("Kate", "Leo")) movie actor1 actor2 1 Titanic Leo …
ignorant
13 votes, 3 answers

cumsum using ddply

I need to use group by in levels with ddply or aggregate if that's easier. I am not really sure how to do this as I need to use cumsum as my aggregate function. This is what my data looks like: level1 level2 hour product A tea …
Roshini