Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy.

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.
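
The three steps above can be sketched in base R and then, where plyr is installed, collapsed into a single ddply call. This is only an illustrative sketch: the data frame `scores` and its columns `group` and `value` are hypothetical names made up for the example, not taken from any question on this page.

```r
# Toy data: hypothetical column names chosen for illustration only.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Split: one data frame per level of `group`.
pieces <- split(scores, scores$group)

# Apply: summarise each piece independently.
summaries <- lapply(pieces, function(piece) {
  data.frame(
    group = piece$group[1],
    mean_value = mean(piece$value),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the per-group results back into one data frame.
result <- do.call(rbind, summaries)
rownames(result) <- NULL

# The same three steps as one plyr call, run only if plyr is installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  result_plyr <- plyr::ddply(scores, "group", plyr::summarise,
                             mean_value = mean(value))
}
```

The base R version makes each step explicit, while ddply takes the splitting variable and the summary function as arguments and handles the splitting and recombining itself.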

2465 questions
1 vote, 2 answers

How can I speed up this sapply for cross checking samples?

I'm trying to speed up a QC function for checking similarity between samples. Is there a faster way to do the comparison than the approach I use below? I know there have been answers to this kind of question that are pretty definitive (on SO…
cylondude • 1,816 • 1 • 22 • 55

1 vote, 1 answer

Merging files (and file names) in R

I'm trying to merge a directory full of comma delimited text files using R, while also incorporating the file name of each file as a new variable in the data set. I've been using the following: library(plyr) file_list <- list.files() dataset <-…
Vadaar • 43 • 2 • 4

1 vote, 1 answer

daply: Correct results, but confusing structure

I have a data.frame mydf, that contains data from 27 subjects. There are two predictors, congruent (2 levels) and offset (5 levels), so overall there are 10 conditions. Each of the 27 subjects was tested 20 times under each condition, resulting in a…
vincentqu • 357 • 1 • 2 • 6

1 vote, 2 answers

ddply multiple function arguments + naming

Browsing other questions, I have almost solved my problem but am failing at the last hurdle. Using R, I have a data frame (d) which I pass through a function (fd) with ddply from the plyr package; this returns a data frame as expected. In my actual…
Salmo salar • 517 • 1 • 5 • 17

1 vote, 1 answer

Colwise eats column names within ddply

I'm trying to chunk through a data frame, find instances where the sub-data frames are unbalanced, and add 0 values for certain levels of a factor that are missing. To do this, within ddply, I did a quick comparison to a set vector of what levels…
jebyrnes • 9,082 • 5 • 30 • 33

1 vote, 1 answer

Sequentially numbering repetitive interactions in R

I have a data frame in R that has been previously sorted with data that looks like the following: id creatorid responderid 1 1 2 2 1 2 3 1 3 4 1 3 5 1 3 …
Pridkett • 4,883 • 4 • 30 • 47

1 vote, 1 answer

finding the last reading from a data.frame for ggplot2 using R

I'm trying to plot the price of vehicles over time. I'd like to include the reg. no of the vehicle as a marker for a sparkline. My data looks like this: > head (x[c(1,2,3,4)]) samp.date idx price reg.date 1 2012-11-15 xxxxxxb 27490 …
user676952 • 33 • 2

1 vote, 2 answers

Read multiple files and save data into one dataframe in R

I am trying to read multiple files and then combine them into one data frame. The code that I am using is as follows: library(plyr) mydata = ldply(list.files(path="Data load for stations/data/Predicted",pattern = "txt"), function(filename) { dum =…
Jd Baba • 5,948 • 18 • 62 • 96

1 vote, 1 answer

Use plyr to summarize a data.frame and get counts of each unique item

I have a data.frame with task assignments from a ticket tracking system. Assignments <- data.frame('Task'=c(1, 1, 2, 3, 2, 2, 1), 'Assignee'=c('Alice', 'Bob', 'Alice', 'Alice', 'Bob', 'Chuck', 'Alice')) I need to summarize the data for some monthly…
Keith Twombley • 1,666 • 1 • 17 • 21

1 vote, 3 answers

Finding proportions based on data.frame subsets

I have a set of counts from data with three dimensions: df <- data.frame(type = c("A", "B", "B", "A", "A", "C", "B", "C"), group = c("Tp", "Tp", "Tp", "Tp", "Fc", "Fc", "Fc", "Fc"), size = c(10,20,30,40,10,20,30,40), count = c(1, 4, 2, 3, 2, 10, 2,…
MattLBeck • 5,701 • 7 • 40 • 56

1 vote, 1 answer

Converting ddply syntax into data.table

I have a 1.3 million row data frame which I need to aggregate into regional and temporal summaries. Plyr's syntax is straightforward, but it's just much too slow to be practical (I've left ddply to run for an hour, and it's completed less than 25%).…
tomw • 3,114 • 4 • 29 • 51

1 vote, 1 answer

Must ddply use all possible combinations of the splitting variable(s), or only observed?

I have a data frame called thetas containing about 2.7 million observations. > str(thetas) 'data.frame': 2700000 obs. of 8 variables: $ rho_cnd : num 0 0 0 0 0 0 0 0 0 0 ... $ pct_cnd : num 0 0 0 0 0 0 0 0 0 0 ... $ sx : num 1 2…
Jon • 753 • 8 • 18

1 vote, 2 answers

R how to transform part of list into a data.frame?

Suppose I have a dataset as list object. Here is a way to quickly generate some random data: a <- list(x1=rnorm(10),x2=rnorm(10)) b <- list(y1=rnorm(10),y2=rnorm(10),y3=rnorm(10)) c <- list(x1=rnorm(10),x2=rnorm(10)) d <-…
Boxuan • 4,937 • 6 • 37 • 73

1 vote, 1 answer

Different results when using ddply and summarize. Due to different R and plyr versions?

I'm looking to summarize data similar to the ToothGrowth data in the datasets package. The output I want looks like this: supp len half one two 1 OJ 619.9 132.3 227.0 260.6 2 VC 508.9 79.8 167.7 261.4 That is the sum of lengths split…
BuckyOH • 327 • 2 • 8 • 17

1 vote, 1 answer

Change data.frame in *_ply function

Let's say I have a <- data.frame( z = rep( c("A", "B", "C"), 2 ), p = 1:6, stringsAsFactors=FALSE ) b <- data.frame( z = c( rep( "A", 5), rep( "B", 5 ) ), q = 1:10, stringsAsFactors=FALSE ) and want to manipulate a while iterating over b using…
Beasterfield • 7,023 • 2 • 38 • 47