Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.
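A minimal sketch of the three steps on a small hypothetical data frame (df, with a grouping column g and a numeric column x). The steps are first done by hand with base R's split(), lapply() and do.call(rbind, ...), and then with a single ddply() call. It assumes plyr is installed.

```r
library(plyr)

# Hypothetical data: a grouping column g and a numeric column x
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 3, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Split: one data frame per value of g
pieces <- split(df, df$g)

# Apply: reduce each piece to a one-row summary
results <- lapply(pieces, function(d) {
  data.frame(g = d$g[1], mean_x = mean(d$x), stringsAsFactors = FALSE)
})

# Combine: stack the summaries back into a single data frame
manual <- do.call(rbind, results)
rownames(manual) <- NULL

# plyr does all three steps in one call: data frame in, data frame out
with_plyr <- ddply(df, .(g), summarise, mean_x = mean(x))
```

Both versions give one row per value of g, with a mean_x of 2 for "a" and 4 for "b". ddply() also returns the rows sorted by the grouping variable.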

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.
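plyr function names encode the input and output types: the first letter is the input, the second the output (l = list, d = data frame, a = array, _ = discard the output). A sketch, on a hypothetical named list called samples, of how ldply() and laply() stand in for two common base R idioms:

```r
library(plyr)

# Hypothetical input: a named list of numeric vectors
samples <- list(a = c(1, 2, 3), b = c(4, 5, 6, 7))

# A function returning a one-row data frame for each list element
describe <- function(v) data.frame(n = length(v), mean = mean(v))

# Base R: lapply() gives a list of data frames, combined by hand
base_result <- do.call(rbind, lapply(samples, describe))

# plyr: ldply() goes list -> data frame in one step and records the
# list names in an .id column
plyr_result <- ldply(samples, describe)

# List -> vector: sapply() in base R, laply() in plyr
base_means <- sapply(samples, mean)
plyr_means <- laply(samples, mean)
```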

2465 questions
1 vote, 2 answers

R Create One Hot Vector From List Elements

I am trying to process some character strings for an input file. First I convert the strings from a vector to a list, then I reduce to only unique values. Next I would like to convert the words in each list element into a string with a separator of…
screechOwl
1 vote, 1 answer

Optimising by Group of own function in r

I would like to apply an optimization by group to my own function. Here is a reproducible data set: data <- data.frame(ID=c(1,1,1,2,2,3,3),C=c(1,1,1,2,2,3,4), Lambda=c(0.5),s=c(1:7), …
New2R
1 vote, 2 answers

Passing a character vector as arguments to a function in plyr

I suspect I'm Doing It Wrong, but I'd like to pass a character vector as an argument to a function in ddply. There's a lot of Q&A on removing quotes, etc., but none of it seems to work for me (e.g. Remove quotes from a character vector in R and…
Ben
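The excerpt does not show exactly where the character vector is used, so this sketch covers two common cases on hypothetical data: grouping columns supplied as strings (ddply()'s .variables argument accepts a character vector directly) and a column name supplied as a string, looked up with [[ ]] inside the applied function. Extra named arguments given to ddply() are passed on to that function. The helper mean_of_col() is hypothetical, not part of plyr.

```r
library(plyr)

# Hypothetical data
df <- data.frame(
  site  = c("x", "x", "y", "y"),
  year  = c(2001, 2002, 2001, 2002),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Case 1: grouping columns stored as strings; .variables takes them as-is
group_cols <- c("site")
by_site <- ddply(df, group_cols, function(d) data.frame(total = sum(d$value)))

# Case 2: the column to summarise stored as a string; look it up with [[ ]]
# (mean_of_col is a hypothetical helper, not a plyr function)
mean_of_col <- function(d, col) data.frame(mean = mean(d[[col]]))
by_site_year <- ddply(df, c("site", "year"), mean_of_col, col = "value")
```

Using d[[col]] sidesteps the quoting problem entirely, because no non-standard evaluation is involved.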
1 vote, 2 answers

Ddply and summary of categorical variables

I have a data frame x like this:

    Id  Group Var1
    001 A     yes
    002 A     no
    003 A     yes
    004 B     no
    005 B     yes
    006 C     no

I want to create a data frame like this:

    Group yes no
    A     2   1
    B     1   1
    C     …
corrado
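One way to build the requested table with ddply() and summarise(), using the six rows shown in the question. The yes and no columns are counts obtained by applying sum() to logical comparisons.

```r
library(plyr)

# Data from the question (Id kept as character to preserve leading zeros)
x <- data.frame(
  Id    = c("001", "002", "003", "004", "005", "006"),
  Group = c("A", "A", "A", "B", "B", "C"),
  Var1  = c("yes", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# One row per Group; sum() of a logical vector counts the TRUE values
counts <- ddply(x, .(Group), summarise,
                yes = sum(Var1 == "yes"),
                no  = sum(Var1 == "no"))
```

Groups with no "yes" answers, such as C, get a count of 0. Outside plyr, table(x$Group, x$Var1) gives the same counts as a contingency table.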
1 vote, 2 answers

How to calculate percentage change from different rows over different spans

I am trying to calculate the percentage change in price for quarterly company data, where each company is identified by a gvkey (1001, 1384, etc.) and has a corresponding quarterly stock price, PRCCQ:

      gvkey  PRCCQ
    1  1004 23.750
    2  1004 13.875
    3  1004…
user2076502
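A sketch with hypothetical prices, since the question's data is truncated. The helper pct_change() is hypothetical, not part of plyr: it compares each price with the one k rows earlier. ddply() applies it separately within each gvkey, so one company's prices are never compared with another's. It assumes the rows are already in time order within each gvkey.

```r
library(plyr)

# Hypothetical quarterly prices for two gvkeys (not the asker's data)
prices <- data.frame(
  gvkey = c(1004, 1004, 1004, 1013, 1013, 1013),
  PRCCQ = c(20, 25, 20, 10, 12, 15)
)

# Hypothetical helper: percentage change over a span of k rows.
# The first k values have no earlier price, so they become NA.
pct_change <- function(p, k = 1) {
  n <- length(p)
  lagged <- c(rep(NA, min(k, n)), head(p, max(n - k, 0)))
  100 * (p - lagged) / lagged
}

prices <- ddply(prices, .(gvkey), function(d) {
  d$chg_1q <- pct_change(d$PRCCQ, k = 1)  # versus the previous quarter
  d$chg_2q <- pct_change(d$PRCCQ, k = 2)  # versus two quarters earlier
  d
})
```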
1 vote, 2 answers

Sum duplicates then remove all but first occurrence

I have a data frame (~5000 rows, 6 columns) that contains some duplicate values for an id variable. I have another continuous variable x, whose values I would like to sum for each duplicate id. The observations are time dependent, there are year and…
Chris
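A sketch on hypothetical data: ddply() replaces x with its per-id total, and duplicated() then keeps only the first row of each id. "First" here means first in the existing row order within each id, so sort by year (or whichever time variable applies) beforehand if that matters.

```r
library(plyr)

# Hypothetical data: ids 1 and 3 appear twice
df <- data.frame(
  id   = c(1, 1, 2, 3, 3),
  year = c(2001, 2002, 2001, 2003, 2004),
  x    = c(5, 7, 1, 2, 4)
)

# Every row of an id group gets the group's total of x
totals <- ddply(df, .(id), function(d) {
  d$x <- sum(d$x)
  d
})

# Keep only the first row for each id
result <- totals[!duplicated(totals$id), ]
```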
1 vote, 2 answers

Summary data tables from wide data.frames

I am trying to find lazy/easy ways of creating summary tables/data.frames from wide data.frames. Assume the following data.frame, but with many more columns, so that specifying the column names takes a long time: set.seed(2) x <- data.frame(Rep =…
Mikko
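plyr's numcolwise() turns a function of one vector into a function that is applied to every numeric column of a data frame (colwise() is the general version), which avoids typing column names. A sketch on a hypothetical wide data frame with a Rep grouping column like the question's; the question's own data is truncated, so these columns are made up.

```r
library(plyr)

# Hypothetical wide data: a Rep grouping column plus numeric columns X1..X4
# (data.frame() names the unnamed matrix columns X1, X2, ...)
set.seed(2)
x <- data.frame(
  Rep = rep(c("r1", "r2"), each = 5),
  matrix(rnorm(40), ncol = 4)
)

# numcolwise() applies the function to every numeric column and skips the rest
means <- ddply(x, .(Rep), numcolwise(mean))
sds   <- ddply(x, .(Rep), numcolwise(sd))
```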
1 vote, 1 answer

Scaling / mean center / demean variable in sqldf / SQLite?

I am trying to mean center (aka demean, scale) a variable by 3 dimensions: year, month, and region using the sqldf package in R. Here is exactly what I want to do using the plyr package: ## create example data set.seed(145) v =…
baha-kev
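For reference, a sketch of group-wise mean centering in plyr on hypothetical data (the question's own plyr code and data are truncated). In SQL, including SQLite through sqldf, the usual equivalent is to join the table to a GROUP BY subquery that computes AVG(v) per year, month and region, and subtract that average.

```r
library(plyr)

# Hypothetical panel: three observations of v per year x month x region cell
set.seed(1)
d <- expand.grid(year = 2010:2011, month = 1:2, region = c("N", "S"), obs = 1:3)
d$v <- rnorm(nrow(d))

# Within each cell, subtract the cell mean of v
d <- ddply(d, .(year, month, region), function(cell) {
  cell$v_centered <- cell$v - mean(cell$v)
  cell
})
```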
1 vote, 2 answers

Use ddply() to aggregate relative histogram counts

Related to a previous question I asked (ggplot2 how to get 2 histograms with the y value = to count of one / sum of the count of both), I tried to write a function which would take a data.frame as input with the response times (RT) and accuracy…
shora
1 vote, 2 answers

Split input of apply function using a continuous classifier

I have the example data frame test.df<-data.frame(classifier=runif(n=1000), x1=rnorm(1000), x2=rnorm(1000), x3=rnorm(1000)) with columns x1, x2, ..., x10000. I would like to use the apply function to perform a large number of tests (let's say t.test) and…
ECII
1 vote, 1 answer

Summarize dataframe by day from timestamp

I have a dataset data that contains a timestamp and a suite of other variables with values at each timestamp. I am trying to use ddply within plyr to create a new dataframe that is the summary (e.g. mean) of a variable by the group day. How can I…
nofunsally
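A sketch, assuming the timestamps are POSIXct (the example uses hypothetical readings in UTC): derive a Date column with as.Date() and use it as the grouping variable in ddply(). Passing tz to as.Date() makes explicit which time zone defines a "day".

```r
library(plyr)

# Hypothetical readings: UTC timestamps and one measured variable
data <- data.frame(
  timestamp = as.POSIXct(c("2013-05-01 08:00", "2013-05-01 17:30",
                           "2013-05-02 09:15", "2013-05-02 21:45"),
                         tz = "UTC"),
  temp = c(10, 14, 12, 18)
)

# Derive the calendar day, then summarise within each day
data$day <- as.Date(data$timestamp, tz = "UTC")
daily <- ddply(data, .(day), summarise,
               mean_temp = mean(temp),
               n = length(temp))
```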
1 vote, 1 answer

d_ply and dist() together

I'm having trouble with some R code that I wrote. Specifically, it looks like this: n<- nrow(aa) for (i in 1:n) { A<- aa[i,] d_ply(A, 1, function(row){ cu<- dist(A) write.table(cu, file = paste(row$header, "txt", sep = "."), sep = "\t") },…
Gabelins
1 vote, 1 answer

summarise() - calculating percentages and counts of factor

I'm trying to use summarise() from the plyr package to calculate percentages of occurrences of each level in a factor. EDIT: The Puromycin data is in the base R installation. My data look like this: library(plyr) data.p <-…
Rene Bern
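Two sketches on the Puromycin data mentioned in the question. The asker's data.p is truncated, so grouping on the state factor is an assumption. plyr's count() returns one row per level with a freq column, and the percentages follow from dividing by the total. The ddply() version does the same counting with summarise() and adds the percentages afterwards, so that the total is taken over all groups rather than within each group.

```r
library(plyr)

# Puromycin is in base R's datasets package; state is a two-level factor
pur <- datasets::Puromycin

# count() gives one row per level of the named variable, with a freq column
level_counts <- count(pur, "state")
level_counts$pct <- 100 * level_counts$freq / sum(level_counts$freq)

# Same counts via ddply() + summarise(); percentages added afterwards
by_state <- ddply(pur, .(state), summarise, n = length(state))
by_state$pct <- 100 * by_state$n / sum(by_state$n)
```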
1 vote, 2 answers

Use plyr to compute margins

I have a data frame with something like the following structure: Trial Index Condition1 Condition2 Measures 1 A Y ... 2 A Y ... 3 B Y…
Nathan
1 vote, 1 answer

count shared occurrences and remove duplicates

I have this data.frame : df <- read.table(text= " section to from time a 1 5 9 a 2 5 9 a 1 5 10 …
user1317221_G