Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy.

plyr is an R package written by Hadley Wickham. It provides tools for solving a variety of problems with the split-apply-combine strategy:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and has itself been partially succeeded by dplyr.
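
As a minimal sketch of the three steps above, the following uses R's built-in iris dataset (chosen here only for illustration) to compute a per-species mean, once with plyr's ddply and once with base R functions. The variable names (means, pieces, applied, combined) are illustrative assumptions, not part of the tag description.

```r
library(plyr)

# plyr: split iris by Species, apply a summary to each piece,
# and combine the results into a single data frame.
means <- ddply(iris, .(Species), summarise,
               mean_sepal_length = mean(Sepal.Length))

# The same three steps written out explicitly in base R.
pieces  <- split(iris, iris$Species)                        # split
applied <- vapply(pieces,
                  function(df) mean(df$Sepal.Length),
                  numeric(1))                               # apply
combined <- data.frame(Species = names(applied),
                       mean_sepal_length = unname(applied)) # combine
```

Both means and combined should hold one row per species. The ddply call expresses the grouping variable and the output shape in a single step, which is the convenience plyr offers over the base R version.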

2465 questions
11
votes
4 answers

Combine a list of data frames into one preserving row names

I do know about the basics of combining a list of data frames into one as has been answered before. However, I am interested in smart ways to maintain row names. Suppose I have a list of data frames that are fairly equal and I keep them in a named…
Midnighter
  • 3,771
  • 2
  • 29
  • 43
11
votes
1 answer

Add an index (or counter) to a dataframe by group in R

I have a df like ProjectID Dist 1 x 1 y 2 z 2 x 2 h 3 k .... .... I want to add a third column such that we have an incrementing counter for each ProjectID: ProjectID Dist counter 1 …
sjgknight
  • 393
  • 1
  • 5
  • 19
11
votes
4 answers

Unique rows, considering two columns, in R, without order

Unlike the questions I've found, I want the unique rows across two columns, ignoring order. I have a df: df<-cbind(c("a","b","c","b"),c("b","d","e","a")) > df [,1] [,2] [1,] "a" "b" [2,] "b" "d" [3,] "c" "e" [4,] "b" "a" In this case,…
eflores89
  • 339
  • 2
  • 10
  • 27
11
votes
2 answers

Accessing grouped data in dplyr

How can I access the grouped data after applying the group_by function from dplyr and using the %.% operator? For example, if I want the first row of each group, I can do this with the plyr package as ddply(iris,.(Species),function(df){ …
Chitrasen
  • 1,706
  • 18
  • 15
11
votes
4 answers

R use ddply or aggregate

I have a data frame with 3 columns: custId, saleDate, DelivDateTime. > head(events22) custId saleDate DelivDate 1 280356593 2012-11-14 14:04:59 11/14/12 17:29 2 280367076 2012-11-14 17:04:44 11/14/12 20:48 3 280380097 2012-11-14…
screechOwl
  • 27,310
  • 61
  • 158
  • 267
11
votes
2 answers

renaming the output column with the plyr package in R

Hadley turned me on to the plyr package and I find myself using it all the time to do 'group by' sort of stuff. But I find myself having to always rename the resulting columns since they default to V1, V2, etc. Here's an…
JD Long
  • 59,675
  • 58
  • 202
  • 294
11
votes
5 answers

Block bootstrap from subject list

I'm trying to efficiently implement a block bootstrap technique to get the distribution of regression coefficients. The main outline is as follows. I have a panel data set, and say firm and year are the indices. For each iteration of the…
baha-kev
  • 3,029
  • 9
  • 33
  • 31
11
votes
3 answers

Summing rows based on specific factor combinations

This is probably a silly question, but I have read through Crawley's chapter on dataframes and scoured the internet and haven't yet been able to make anything work. Here is a sample dataset similar to mine: >…
user1371443
  • 113
  • 1
  • 1
  • 4
10
votes
3 answers

Problem loading the plyr package

I use R 2.13.1 and have unsuccessfully tried to load the package "plyr 1.6" in R. I have manually installed it into a directory "~/R/library". My code is: .libPaths("~/R/library") library(plyr) I get the message: Error in library(plyr) : 'plyr'…
10
votes
3 answers

Loops to create new variables in ddply

I am using ddply to aggregate and summarize data frame variables, and I am interested in looping through my data frame's list to create the new variables. new.data <- ddply(old.data, c("factor", "factor2"), …
Iris Tsui
  • 213
  • 2
  • 8
10
votes
9 answers

With min() in R return NA instead of Inf

Please consider the following: I recently 'discovered' the awesome plyr and dplyr packages and use those for analysing patient data that is available to me in a data frame. Such a data frame could look like this: df <- data.frame(id = c(1, 1, 1, 2,…
Frederick
  • 810
  • 8
  • 28
10
votes
1 answer

How to use string variables to create variables list for ddply?

Using R's builtin ToothGrowth example dataset, this works: ddply(ToothGrowth, .(supp,dose), function(df) mean(df$len)) But I would like to have the subsetting factors be variables, something like factor1 = 'supp' factor2 =…
Alex Holcombe
  • 2,453
  • 4
  • 24
  • 34
10
votes
1 answer

subset parameter in layers is no longer working with ggplot2 >= 2.0.0

I updated to the newest version of ggplot2 and ran into problems when plotting subsets in a layer. library(ggplot2) library(plyr) df <- data.frame(x=runif(100), y=runif(100)) ggplot(df, aes(x,y)) + geom_point(subset=.(x >= .5)) These lines of code…
drmariod
  • 11,106
  • 16
  • 64
  • 110
10
votes
5 answers

Fill NA values with the trailing row value times a growth rate?

What would be a good way to populate NA values with the previous value times (1 + growth)? df <- data.frame( year = 0:6, price1 = c(1.1, 2.1, 3.2, 4.8, NA, NA, NA), price2 = c(1.1, 2.1, 3.2, NA, NA, NA, NA) ) growth <- .02 In this case, I…
Adam Smith
  • 2,584
  • 2
  • 20
  • 34
10
votes
1 answer

List all variables (and their proportions) in a subset of a dataframe

For an example dataframe containing a collection of longitudinal and latitudinal coordinate pairs and the times an object was at them: bout <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,…
KT_1
  • 8,194
  • 15
  • 56
  • 68