Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The R package dplyr is the next iteration of the plyr package. It has three main goals:

Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.

Provide fast performance for in-memory data by writing key pieces in C++.

Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
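As a rough illustration of the verbs named in the tag description (filter, select, group_by, summarize), here is a minimal sketch using the built-in mtcars data set. The columns and the `hp > 100` threshold are arbitrary choices for the example, not something the tag wiki specifies.

```r
library(dplyr)

# mtcars ships with base R; the columns and threshold are illustrative only.
result <- mtcars %>%
  filter(hp > 100) %>%        # keep cars with more than 100 horsepower
  select(cyl, mpg, hp) %>%    # keep only the columns used below
  group_by(cyl) %>%           # one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),     # average fuel economy within each group
    n = n()                   # number of cars within each group
  )

print(result)
```

The same chain of verbs is intended to work on other backends as well, for example a database table accessed through dbplyr, although that is not shown here.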

Vignettes

Some vignettes have been moved to other related packages.

Tibbles (from tibble package)
Databases (from dbplyr package)
Introduction to dplyr
Adding a new SQL backend (from dbplyr package)
Programming with dplyr
Two-table verbs
Window functions and grouped mutate/filter

Related tags

R's plyr, magrittr, tidyr, tidyverse and data.table packages
Python's pandas library

36044 questions

2 answers

Reverse stacked bar order

I'm creating a stacked bar chart using ggplot like this: plot_df <- df[!is.na(df$levels), ] ggplot(plot_df, aes(group)) + geom_bar(aes(fill = levels), position = "fill") Which gives me something like this: How do I reverse the order the stacked…

r ggplot2 dplyr bar-chart stacked-chart

asked Mar 10 '17 at 04:08

Simon

2 answers

Override column types when importing data using readr::read_csv() when there are many columns

I am trying to read a CSV file using readr::read_csv in R. The file I am importing has about 150 columns; I am including only the first few for the example. I am looking to override the second column from the default type (which is…

r csv file-io dataframe dplyr

asked Jul 22 '15 at 16:06

rajvijay

5 answers

Correct syntax for mutate_if

I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below: set.seed(1) mtcars[sample(1:dim(mtcars)[1], 5), sample(1:dim(mtcars)[2], 5)] <- NA require(dplyr) mtcars %>% mutate_if(is.na,0) mtcars %>% …

r dplyr na

asked Feb 05 '17 at 12:31

Konrad

7 answers

Replace missing values (NA) with most recent non-NA by group

I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an example: houseID year price 1 1995 NA 1 …

r dataframe dplyr na

asked Apr 28 '14 at 11:42

Peter Stephensen

10 answers

Remove rows where all variables are NA using dplyr

I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R (Remove rows in R matrix where all data is NA and Removing empty rows of a data file in R), but I'm…

r dplyr

asked Jan 12 '17 at 09:51

hejseb

8 answers

Filter data frame by character column name (in dplyr)

I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a…

r dplyr

asked Nov 29 '14 at 00:32

William Denton

4 answers

dplyr: How to use group_by inside a function?

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to this function. Can someone provide a working example? library(dplyr) data(iris) iris %.% group_by(Species) %.% summarise(n = n())…

r dplyr tidyeval nse

asked Feb 16 '14 at 17:52

Emilio Torres Manzanera

9 answers

Adding column if it does not exist

I have a bunch of data frames with different variables. I want to read them into R and add columns to those that are short of a few variables so that they all have a common set of standard variables, even if some are unobserved. In other words... Is…

r dataframe dplyr purrr

asked Aug 24 '17 at 09:24

guyabel

3 answers

R, dplyr - combination of group_by() and arrange() does not produce expected result?

When using the dplyr function group_by() and immediately afterwards arrange(), I would expect to get an output where the data frame is ordered within the groups that I stated in group_by(). My reading of the documentation is that this combination should produce…

r dplyr

asked Jul 09 '14 at 09:16

Hrvoje

9 answers

dplyr mutate rowwise max of range of columns

I can use the following to return the maximum of 2 columns newiris<-iris %>% rowwise() %>% mutate(mak=max(Sepal.Width,Petal.Length)) What I want to do is find that maximum across a range of columns so I don't have to name each one like…

r dplyr

asked Oct 06 '15 at 19:49

user2502836

5 answers

Replace NA with Zero in dplyr without using list()

In dplyr I can replace NA with 0 using the following code. The issue is this inserts a list into my data frame which screws up further analysis down the line. I don't even understand lists or atomic vectors or any of that at this point. I just want…

r dplyr na

asked Apr 20 '18 at 18:17

stackinator

3 answers

How do I select columns that may or may not exist?

I have a data frame that may or may not have some particular columns present. I want to select columns using dplyr if they do exist and, if not, just ignore that I tried to select them. Here's an example: # Load libraries library(dplyr) # Create…

r select dplyr

asked May 04 '17 at 15:18

Dan

4 answers

How to dplyr rename a column, by column index?

The following code renames first column in the data set: require(dplyr) mtcars %>% setNames(c("RenamedColumn", names(.)[2:length(names(.))])) Desired results: RenamedColumn cyl disp hp drat wt qsec vs am gear…

r dataframe dplyr rename nse

asked Mar 13 '17 at 17:20

Konrad

10 answers

Add margin row totals in dplyr chain

I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and here, but no clear solution. One possible approach is to perform count twice…

r dplyr

asked Sep 15 '16 at 09:02

Jonny

4 answers

Chain arithmetic operators in dplyr with %>% pipe

I would like to understand why, in the dplyr or magrittr package, the chaining function %>% has some trouble with the basic operators +, -, *, and /. Chaining takes the output of the previous statement and feeds it as first…

r dplyr piping magrittr

asked Dec 08 '14 at 18:23

agenis
