Questions tagged [plyr]

plyr is an R package with tools to solve a variety of problems using the split-apply-combine strategy

plyr is an R package written by Hadley Wickham which contains tools to solve a variety of problems using the strategy of split, apply and combine:

  • Split a data structure (data frame, list, array) into smaller pieces;
  • Apply a function to each piece; then
  • Combine the results into a data structure.
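
As a rough illustration of the three steps above, here is a minimal sketch that does split-apply-combine by hand in base R on the built-in mtcars data, then repeats it with a single plyr call. The names pieces, results, by_hand and mean_mpg are chosen for this example only, and the plyr part runs only if the package is installed.

```r
# Split-apply-combine by hand in base R, using the built-in mtcars data.
pieces <- split(mtcars, mtcars$cyl)        # split: one data frame per value of cyl

results <- lapply(pieces, function(d) {    # apply: summarise each piece
  data.frame(cyl = d$cyl[1], mean_mpg = mean(d$mpg))
})

by_hand <- do.call(rbind, results)         # combine: stack the pieces into one data frame
rownames(by_hand) <- NULL
print(by_hand)

# The same summary as one plyr call (skipped if plyr is not installed).
if (requireNamespace("plyr", quietly = TRUE)) {
  with_plyr <- plyr::ddply(mtcars, "cyl", plyr::summarise, mean_mpg = mean(mpg))
  print(with_plyr)
}
```

In ddply, the first "d" means the input is a data frame and the second "d" means the output is a data frame; the grouping variable takes the place of the explicit split() call.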

It partially replaces the apply family of functions (lapply, tapply, Map, etc.) in base R, and is partially succeeded by dplyr.

2465 questions
23
votes
8 answers

quick/elegant way to construct mean/variance summary table

I can achieve this task, but I feel like there must be a "best" (slickest, most compact, clearest-code, fastest?) way of doing it and have not figured it out so far ... For a specified set of categorical factors I want to construct a table of means…
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
23
votes
1 answer

round_any equivalent for dplyr?

I am trying to make a switch to the "new" tidyverse ecosystem and try to avoid loading the old packages from Wickham et al. that I previously relied on in my code. I found the round_any function from plyr useful in many cases where I needed custom rounding…
Mikko
  • 7,530
  • 8
  • 55
  • 92
23
votes
4 answers

dplyr: apply function table() to each column of a data.frame

Apply function table() to each column of a data.frame using dplyr I often apply the table-function on each column of a data frame using plyr, like this: library(plyr) ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) ) Is…
Rasmus Larsen
  • 5,721
  • 8
  • 47
  • 79
23
votes
3 answers

Learning to understand plyr, ddply

I've been attempting to understand what plyr does and how it works by trying different variables and functions and seeing what results. So I'm looking more for an explanation of how plyr works than for specific fix-it answers. I've read the documentation…
rsgmon
  • 1,892
  • 4
  • 23
  • 35
23
votes
2 answers

ddply + summarize for repeating same statistical function across large number of columns

Ok, second R question in quick succession. My data: Timestamp St_01 St_02 ... 1 2008-02-08 00:00:00 26.020 25.840 ... 2 2008-02-08 00:10:00 25.985 25.790 ... 3 2008-02-08 00:20:00 25.930 25.765 ... 4 2008-02-08 00:30:00 25.925…
Reuben L.
  • 2,806
  • 2
  • 29
  • 45
22
votes
4 answers

Simple working example of ddply() in parallel on Windows

I've been searching around for a simple working example of using ddply() in parallel. I've installed the "foreach" package, but when I call ddply(.parallel = TRUE) I get a warning that "No parallel backend registered". Can someone provide a simple…
Suraj
  • 35,905
  • 47
  • 139
  • 250
21
votes
3 answers

Using plyr::mapvalues with dplyr

plyr::mapvalues can be used like this: mapvalues(mtcars$cyl, c(4, 6, 8), c("a", "b", "c")) But this doesn't work: mtcars %>% dplyr::select(cyl) %>% mapvalues(c(4, 6, 8), c("a", "b", "c")) %>% as.data.frame() How can I use plyr::mapvalues with…
luciano
  • 13,158
  • 36
  • 90
  • 130
21
votes
7 answers

plyr or dplyr in Python

This is more of a conceptual question; I do not have a specific problem. I am learning Python for data analysis, but I am very familiar with R. One of the great things about R is plyr (and of course ggplot2) and, even better, dplyr. Pandas of course…
user1617979
  • 2,370
  • 3
  • 25
  • 30
21
votes
2 answers

Convert R list to dataframe with missing/NULL elements

Given a list: alist = list( list(name="Foo",age=22), list(name="Bar"), list(name="Baz",age=NULL) ) what's the best way to convert this into a dataframe with name and age columns, with missing values (I'll accept NA or "" in that order of…
Spacedman
  • 92,590
  • 12
  • 140
  • 224
19
votes
2 answers

R: converting each row of a data frame into a list item

I have a number of operations on data frames which I would like to speed up using mclapply() or other lapply() like functions. One of the easiest ways for me to wrestle with this is to make each row of the data frame a small data frame in a list. I…
JD Long
  • 59,675
  • 58
  • 202
  • 294
19
votes
5 answers

Joining aggregated values back to the original data frame

One of the design patterns I use over and over is performing a "group by" or "split, apply, combine (SAC)" on a data frame and then joining the aggregated data back to the original data. This is useful, for example, when calculating each county's…
JD Long
  • 59,675
  • 58
  • 202
  • 294
19
votes
2 answers

dplyr rename - Error: `new_name` = old_name must be a symbol or a string, not formula

I am trying to rename a column with dplyr::rename() and R is returning this error that I am unable to find anywhere online. Error: `new_name` = old_name must be a symbol or a string, not formula Reproducible example with a 2 column data…
19
votes
3 answers

Idiomatic R code for partitioning a vector by an index and performing an operation on that partition

I'm trying to find the idiomatic way in R to partition a numerical vector by some index vector, find the sum of all numbers in that partition and then divide each individual entry by that partition sum. In other words, if I start with this: df <-…
John Horton
  • 4,122
  • 6
  • 31
  • 45
19
votes
3 answers

Sending in Column Name to ddply from Function

I'd like to be able to send in a column name to a call that I am making to ddply. An example ddply call: ddply(myData, .(MyGrouping), summarise, count=sum(myColumnName)) If I have ddply wrapped within another function is it possible to wrap this so…
Dave
  • 2,386
  • 1
  • 20
  • 38
18
votes
2 answers

Correlation between two dataframes by row

I have 2 data frames w/ 5 columns and 100 rows each. id price1 price2 price3 price4 price5 1 11.22 25.33 66.47 53.76 77.42 2 33.56 33.77 44.77 34.55 57.42 ... I…
screechOwl
  • 27,310
  • 61
  • 158
  • 267