Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.
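
The three steps above can be sketched in base R on a small made-up data frame. The names `df`, `group`, and `value` are hypothetical and chosen only for illustration. The last lines show what should be the equivalent single `ddply` call, which runs only if plyr happens to be installed.

```r
# Toy data (hypothetical): two groups with three values each.
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 10, 20, 30)
)

# 1. Split the data frame into one piece per group.
pieces <- split(df, df$group)

# 2. Apply a function to each piece (here: the mean of `value`).
summaries <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# 3. Combine the per-group results into a single data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)

# The same computation as one plyr call, run only if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(df, "group", plyr::summarise, mean_value = mean(value)))
}
```

The base R version spells out each step separately; `ddply` folds the split, the per-piece function, and the recombination into one call.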

2465 questions
13
votes
6 answers

for each group summarise means for all variables in dataframe (ddply? split?)

A week ago I would have done this manually: subset the dataframe by group into new dataframes, compute the means of each variable for each dataframe, then rbind. Very clunky ... Now I have learned about split and plyr, and I guess there must be an easier…
Andreas
  • 6,612
  • 14
  • 59
  • 69
13
votes
1 answer

How do I time out a lapply when a list item fails or takes too long?

For several efforts I'm involved in at the moment, I am running large datasets with numerous parameter combinations through a series of functions. The functions have a wrapper (so I can mclapply) for ease of operation on a cluster. However, I run…
Maiasaura
  • 32,226
  • 27
  • 104
  • 108
12
votes
3 answers

l_ply: how to pass the list's name attribute into the function?

Say I have an R list like this:

    > summary(data.list)
                                     Length Class      Mode
    aug9104AP                        18     data.frame list
    Aug17-10_acon_7pt_dil_series_01  18     data.frame…
dnagirl
  • 20,196
  • 13
  • 80
  • 123
12
votes
4 answers

Replace missing values (NA) in one data set with values from another where columns match

I have a data frame (datadf) with 3 columns, 'x', 'y', and 'z'. Several 'x' values are missing (NA). 'y' and 'z' are non-measured variables.

    x   y z
    153 a 1
    163 b 1
    NA  d 1
    123 a 2
    145 e 2
    NA  c 2
    NA  b 1
    199 a 2

I have another data frame…
JustOneGeek
  • 374
  • 2
  • 3
  • 12
12
votes
1 answer

How to strsplit different number of strings in certain column by do function

I have a problem splitting a column value when the elements of the column contain different numbers of strings. I can do it in plyr, e.g.:

    library(plyr)
    column <- c("jake", "jane jane", "john john john")
    df <- data.frame(1:3, name = column)
    df$name <-…
Nicolabo
  • 1,337
  • 12
  • 30
12
votes
4 answers

Returning first row of group

I have a dataframe consisting of an ID that is the same for each element in a group, two datetimes, and the time interval between the two. One of the datetime objects is my relevant time marker. Now I would like to get a subset of the dataframe that…
fr3d-5
  • 792
  • 1
  • 6
  • 27
12
votes
6 answers

Count occurrences of factor in R, with zero counts reported

I want to count the number of occurrences of a factor in a data frame. For example, to count the number of events of a given type in the code below: library(plyr) events <- data.frame(type = c('A', 'A', 'B'), quantity = c(1,…
I Like to Code
  • 7,101
  • 13
  • 38
  • 48
12
votes
4 answers

R Dynamically build "list" in data.table (or ddply)

My aggregation needs vary among columns / data.frames. I would like to pass the "list" argument to the data.table dynamically. As a minimal example: require(data.table) type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3)) a <- (rep(1:3, 3)) b <-…
jjap
  • 189
  • 2
  • 9
12
votes
2 answers

Generate graphs in R for certain correlations in a matrix

I want to generate graphs between variables (columns) that have a correlation above and below a certain point as well as having a pvalue < 0.01. The graphs would be ggplot2 (line or bar) graphs plotting the two columns (variables) that…
themartinmcfly
  • 2,004
  • 2
  • 13
  • 12
12
votes
5 answers

Efficient multiplication of columns in a data frame

I have a large data frame in which I am multiplying two columns together to get another column. At first I was running a for-loop, like so:

    for (i in 1:nrow(df)) {
      df$new_column[i] <- df$column1[i] * df$column2[i]
    }

but this takes like 9…
Doug
  • 597
  • 2
  • 7
  • 22
12
votes
4 answers

How to get top n companies from a data frame in decreasing order

I am trying to get the top 'n' companies from a data frame. Here is my code below.

    data("Forbes2000", package = "HSAUR")
    sort(Forbes2000$profits, decreasing = TRUE)

Now I would like to get the top 50 observations from this sorted vector.
Teja
  • 13,214
  • 36
  • 93
  • 155
12
votes
5 answers

R: Generic flattening of JSON to data.frame

This question is about a generic mechanism for converting any collection of non-cyclical homogeneous or heterogeneous data structures into a dataframe. This can be particularly useful when dealing with the ingestion of many JSON documents or with a…
Sim
  • 13,147
  • 9
  • 66
  • 95
11
votes
1 answer

How to use ddply to add a column to a data frame?

I have a data frame that looks like this:

    site date var dil
    1    A    7.4 2
    2    A    6.5 2
    1    A    7.3 3
    2    A    7.3 3
    1    B    7.1 1
    2    B    7.7 2
    1    B    7.7 3
    2    B    7.4 3

I need add a…
matteo
  • 645
  • 3
  • 10
  • 18
11
votes
1 answer

Using Dates with the data.table package

I recently discovered the data.table package and was now wondering whether or not I should replace some of my plyr-code. To summarize, I really like plyr and I basically achieved everything I wanted. However, my code runs a while and the outlook of…
Christoph_J
  • 6,804
  • 8
  • 44
  • 58
11
votes
3 answers

Apply a list of n functions to each row of a dataframe?

I have a list of functions funs <- list(fn1 = function(x) x^2, fn2 = function(x) x^3, fn3 = function(x) sin(x), fn4 = function(x) x+1) #in reality these are all f = splinefun() And I have a…
Abe
  • 12,956
  • 12
  • 51
  • 72