Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy.

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.
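
The three steps listed above can be sketched in base R, with a roughly equivalent plyr call at the end. This is a minimal illustration, not taken from the tag wiki: the data frame and its column names (`state`, `x`, `y`) are made up, and the plyr call only runs if the package is installed.

```r
# Toy data; the columns `state`, `x`, and `y` are hypothetical.
df <- data.frame(
  state = rep(c("CA", "NY", "TX"), each = 4),
  x     = rep(1:4, times = 3),
  y     = c(2, 4, 6, 8, 1, 3, 5, 7, 3, 3, 3, 3)
)

# Split: one data frame per state.
pieces <- split(df, df$state)

# Apply: summarise each piece with its mean of y.
summaries <- lapply(pieces, function(piece) {
  data.frame(state = piece$state[1], mean_y = mean(piece$y))
})

# Combine: stack the per-piece results into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# With plyr installed, the same three steps collapse into one call.
if (requireNamespace("plyr", quietly = TRUE)) {
  result_plyr <- plyr::ddply(df, "state", plyr::summarise, mean_y = mean(y))
  print(result_plyr)
}
```

The base R version spells out each step separately, which is what `ddply` does behind a single function call: the first letter of the name says the input is a data frame, the second that the output is one too.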

2465 questions
18 votes, 6 answers

using predict with a list of lm() objects

I have data which I regularly run regressions on. Each "chunk" of data gets fit a different regression. Each state, for example, might have a different function that explains the dependent value. This seems like a typical "split-apply-combine" type…
JD Long (reputation 59,675; badges 58, 202, 294)

18 votes, 4 answers

Faster ways to calculate frequencies and cast from long to wide

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values. Example of what I've tried so far (tried a bunch of other…
user592419 (reputation 5,103; badges 9, 42, 67)

18 votes, 3 answers

zipping lists in R

As a guideline I prefer apply functions on elements of a list using lapply or *ply (from plyr) rather than explicitly iterating through them. However, this works well when I have to process one list at a time. When the function takes multiple…
gappy (reputation 10,095; badges 14, 54, 73)

18 votes, 3 answers

Understanding ddply error message - argument "by" is missing, with no default

I am trying to figure out why I am getting an error message when using ddply. Example data: data<-data.frame(area=rep(c("VA","OC","ES"),each=4), sex=rep(c("Male","Female"),each=2,times=3), year=rep(c(2009,2010),times=6), …
user41509 (reputation 978; badges 1, 10, 31)

18 votes, 2 answers

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged. Suppose I have the following data: dfx <- data.frame( …
Ignacio (reputation 7,646; badges 16, 60, 113)

17 votes, 2 answers

Create an "index" for each element of a group with data.table

My data is grouped by the IDs in V6 and ordered by position (V1:V3): dt V1 V2 V3 V4 V5 V6 1: chr1 3054233 3054733 . + ENSMUSG00000090025 2: chr1 3102016 3102125 . + ENSMUSG00000064842 3: chr1 3205901 3207317 .…
fridaymeetssunday (reputation 1,118; badges 1, 21, 31)

17 votes, 2 answers

What is purpose of dot before variables (i.e. "variables") in the R Plyr package?

What is purpose of dot before variables (i.e. "variables") in the R Plyr package? for instance, from the R help file: ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .drop = TRUE, .parallel = FALSE) Any assistance would be…
MikeTP (reputation 7,716; badges 16, 44, 57)

17 votes, 6 answers

How to fill NA with median?

Example data: set.seed(1) df <- data.frame(years=sort(rep(2005:2010, 12)), months=1:12, value=c(rnorm(60),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)) head(df) years months value 1 2005 1 -0.6264538 2 2005…
Sheridan (reputation 615; badges 1, 8, 21)

16 votes, 2 answers

Checking if an r package is currently attached

I am having trouble with my workflow because I am sourcing multiple scripts in rmarkdown, some of which require the package dplyr and some of which use plyr. The problem is that the rename function exists in both packages and if dplyr is currently…
llewmills (reputation 2,959; badges 3, 31, 58)

16 votes, 5 answers

ggplot2 fails to install on R 3.0.2

I am unable to install ggplot2 in R 3.0.2 on Ubuntu. When I run install.packages('ggplot2',dependencies = TRUE) I get the following error. > install.packages('ggplot2',dependencies = TRUE) Installing package into…
gnjago (reputation 3,391; badges 5, 19, 17)

16 votes, 3 answers

scale/normalize columns by group

I have a data frame that looks like this: Store Temperature Unemployment Sum_Sales 1 1 42.31 8.106 1643691 2 1 38.51 8.106 1641957 3 1 39.93 8.106 1611968 4 1 46.63 8.106 …
itjcms18 (reputation 3,993; badges 7, 26, 45)

16 votes, 4 answers

ddply multiple quantiles by group

how can I do this calculation: library(ddply) quantile(baseball$ab) 0% 25% 50% 75% 100% 0 25 131 435 705 by groups, say by "team"? I want a data.frame with rownames "team" and column names "0% 25% 50% 75% 100%", i.e. one quantile…
Florian Oswald (reputation 5,054; badges 5, 30, 38)

16 votes, 4 answers

Compute rolling sum by id variables, with missing timepoints

I'm trying to learn R and there are a few things I've done for 10+ years in SAS that I cannot quite figure out the best way to do in R. Take this data: id class t count desired -- ----- ---------- ----- ------- 1 A …
ADJ (reputation 4,892; badges 10, 50, 83)

16 votes, 2 answers

Merge Rows within Data Frame

I have a relational dataset, where I'm looking for dyadic information. I have 4 columns. Sender, Receiver, Attribute, Edge I'm looking to take the repeated Sender -- Receiver counts and convert them as additional edges. df <- data.frame(sender =…
crock1255 (reputation 1,025; badges 2, 12, 23)

15 votes, 2 answers

How does one aggregate and summarize data quickly?

I have a dataset whose headers look like so: PID Time Site Rep Count I want sum the Count by Rep for each PID x Time x Site combo on the resulting data.frame, I want to get the mean value of Count for PID x Time x Site combo. Current function is as…
Maiasaura (reputation 32,226; badges 27, 104, 108)