Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

Split a data structure (data frame, list, array) into smaller pieces;
Apply a function to each piece; then
Combine the results into a data structure.

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and has itself been partly superseded by dplyr.
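The three steps above can be sketched in a few lines. This is a minimal illustration, not code taken from the plyr documentation: the toy data frame and its column names (`team`, `score`) are assumptions made up for the example. The base R version always runs; the plyr version is guarded so it only runs if the package is installed.

```r
# Toy data; the column names are hypothetical, chosen for illustration.
df <- data.frame(
  team  = c("A", "A", "B", "B", "B"),
  score = c(10, 20, 5, 15, 40)
)

# Split: one data frame per team.
pieces <- split(df, df$team)

# Apply: summarise each piece with its mean score.
summaries <- lapply(pieces, function(piece) {
  data.frame(team = piece$team[1], mean_score = mean(piece$score))
})

# Combine: stack the per-team summaries into one data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL

# The same split-apply-combine in a single plyr call, if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  result_plyr <- plyr::ddply(df, "team", function(piece) {
    data.frame(mean_score = mean(piece$score))
  })
}
```

In `ddply`, the first "d" means the input is a data frame and the second "d" means the output is one too; the grouping column is added back to the result automatically, which is why the function passed to `ddply` only returns `mean_score`.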

Other resources

The Split-Apply-Combine Strategy for Data Analysis by Hadley Wickham in the Journal of Statistical Software
Data visualisation in R with ggplot2 and plyr course
Tutorial from useR2009 conference
manipulatr Google Group
Posts on R-bloggers

Related tags

R's dplyr and data.table packages

2465 questions

6 answers

using predict with a list of lm() objects

I have data which I regularly run regressions on. Each "chunk" of data gets fit a different regression. Each state, for example, might have a different function that explains the dependent value. This seems like a typical "split-apply-combine" type…

r plyr lm predict

asked Dec 13 '11 at 22:31

JD Long

4 answers

Faster ways to calculate frequencies and cast from long to wide

I am trying to obtain counts of each combination of levels of two variables, "week" and "id". I'd like the result to have "id" as rows, and "week" as columns, and the counts as the values. Example of what I've tried so far (tried a bunch of other…

r aggregate plyr reshape2

asked Nov 18 '11 at 17:07

user592419

3 answers

zipping lists in R

As a guideline I prefer to apply functions to the elements of a list using lapply or *ply (from plyr) rather than explicitly iterating through them. However, this works well only when I have to process one list at a time. When the function takes multiple…

r plyr lapply

asked May 26 '11 at 22:18

gappy

3 answers

Understanding ddply error message - argument "by" is missing, with no default

I am trying to figure out why I am getting an error message when using ddply. Example data: data<-data.frame(area=rep(c("VA","OC","ES"),each=4), sex=rep(c("Male","Female"),each=2,times=3), year=rep(c(2009,2010),times=6), …

r plyr

asked Nov 19 '15 at 15:12

user41509

2 answers

Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?

Note: The title of this question has been edited to make it the canonical question for issues when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged. Suppose I have the following data: dfx <- data.frame( …

r dplyr plyr r-faq

asked Sep 29 '14 at 18:09

Ignacio

2 answers

Create an "index" for each element of a group with data.table

My data is grouped by the IDs in V6 and ordered by position (V1:V3): dt V1 V2 V3 V4 V5 V6 1: chr1 3054233 3054733 . + ENSMUSG00000090025 2: chr1 3102016 3102125 . + ENSMUSG00000064842 3: chr1 3205901 3207317 .…

r indexing data.table bioinformatics plyr

asked Feb 09 '14 at 11:22

fridaymeetssunday

2 answers

What is purpose of dot before variables (i.e. "variables") in the R Plyr package?

What is purpose of dot before variables (i.e. "variables") in the R Plyr package? for instance, from the R help file: ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .drop = TRUE, .parallel = FALSE) Any assistance would be…

r plyr

asked Jan 30 '13 at 16:26

MikeTP

6 answers

How to fill NA with median?

Example data: set.seed(1) df <- data.frame(years=sort(rep(2005:2010, 12)), months=1:12, value=c(rnorm(60),NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)) head(df) years months value 1 2005 1 -0.6264538 2 2005…

r plyr data.table statistics

asked Aug 15 '12 at 15:05

Sheridan

2 answers

Checking if an r package is currently attached

I am having trouble with my workflow because I am sourcing multiple scripts in rmarkdown, some of which require the package dplyr and some of which use plyr. The problem is that the rename function exists in both packages and if dplyr is currently…

r dplyr plyr

asked Jun 06 '16 at 23:57

llewmills

5 answers

ggplot2 fails to install on R 3.0.2

I am unable to install ggplot2 in R 3.0.2 on Ubuntu. When I run install.packages('ggplot2',dependencies = TRUE) I get the following error. > install.packages('ggplot2',dependencies = TRUE) Installing package into…

r ggplot2 plyr

asked Jun 04 '15 at 06:39

gnjago

3 answers

scale/normalize columns by group

I have a data frame that looks like this: Store Temperature Unemployment Sum_Sales 1 1 42.31 8.106 1643691 2 1 38.51 8.106 1641957 3 1 39.93 8.106 1611968 4 1 46.63 8.106 …

r dplyr scale plyr

asked Nov 15 '14 at 19:55

itjcms18

4 answers

ddply multiple quantiles by group

how can I do this calculation: library(ddply) quantile(baseball$ab) 0% 25% 50% 75% 100% 0 25 131 435 705 by groups, say by "team"? I want a data.frame with rownames "team" and column names "0% 25% 50% 75% 100%", i.e. one quantile…

r plyr

asked Mar 14 '14 at 11:14

Florian Oswald

4 answers

Compute rolling sum by id variables, with missing timepoints

I'm trying to learn R and there are a few things I've done for 10+ years in SAS that I cannot quite figure out the best way to do in R. Take this data: id class t count desired -- ----- ---------- ----- ------- 1 A …

r sas plyr zoo

asked May 30 '13 at 15:26

ADJ

2 answers

Merge Rows within Data Frame

I have a relational dataset, where I'm looking for dyadic information. I have 4 columns. Sender, Receiver, Attribute, Edge I'm looking to take the repeated Sender -- Receiver counts and convert them as additional edges. df <- data.frame(sender =…

r plyr data.table

asked May 24 '12 at 02:31

crock1255

2 answers

How does one aggregate and summarize data quickly?

I have a dataset whose headers look like so: PID Time Site Rep Count. I want to sum the Count by Rep for each PID x Time x Site combo; on the resulting data.frame, I want to get the mean value of Count for each PID x Time x Site combo. Current function is as…

r plyr data.table

asked Oct 11 '11 at 07:09

Maiasaura
