Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
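A minimal sketch of the verbs named above chained with %>%, using the built-in mtcars data. The output column names mean_mpg and n are illustrative choices, not anything dplyr requires.

```r
library(dplyr)

# filter() keeps rows, group_by() + summarise() aggregate per group,
# and select() picks and orders columns; %>% feeds each result onward.
cyl_summary <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%
  select(cyl, n, mean_mpg)

print(cyl_summary)
```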

Repositories

Vignettes

Some vignettes have been moved to other related packages.

Other resources

Related tags

36044 questions
56 votes, 2 answers

Reverse stacked bar order

I'm creating a stacked bar chart using ggplot like this: plot_df <- df[!is.na(df$levels), ] ggplot(plot_df, aes(group)) + geom_bar(aes(fill = levels), position = "fill") Which gives me something like this: How do I reverse the order the stacked…
Simon
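One way to reverse the stacking, sketched under the assumption that the installed ggplot2 has the reverse argument on position_fill(). The small plot_df below is a hypothetical stand-in for the question's data; reordering the factor levels of the fill variable is the other common route.

```r
library(ggplot2)

# Hypothetical stand-in for the question's plot_df: a grouping column and
# a factor whose levels fill the bars.
plot_df <- data.frame(
  group  = rep(c("A", "B"), each = 4),
  levels = factor(c("low", "mid", "high", "high", "low", "low", "mid", "high"),
                  levels = c("low", "mid", "high"))
)

# reverse = TRUE flips the order in which the fill levels are stacked;
# guide_legend(reverse = TRUE) flips the legend so it keeps matching the bars.
p <- ggplot(plot_df, aes(group)) +
  geom_bar(aes(fill = levels), position = position_fill(reverse = TRUE)) +
  guides(fill = guide_legend(reverse = TRUE))

print(p)
```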
56 votes, 2 answers

Override column types when importing data using readr::read_csv() when there are many columns

I am trying to read a csv file using readr::read_csv in R. The csv file that I am importing has about 150 columns; I am just including the first few columns for the example. I am looking to override the second column from the default type (which is…
rajvijay
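A sketch assuming the columns have header names: with cols(), only the columns listed are overridden and every other column keeps readr's guessed type (the .default is col_guess()), so a 150-column file needs just one entry. The file contents and the column name zip are hypothetical.

```r
library(readr)

# Hypothetical CSV whose second column holds ZIP-like codes that should
# stay text (their leading zeros would be lost if read as numbers).
path <- tempfile(fileext = ".csv")
writeLines(c("id,zip,score", "1,02134,3.5", "2,10001,4.0"), path)

# Only zip is overridden; id and score keep their guessed types.
df <- read_csv(path, col_types = cols(zip = col_character()))
print(df)
```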
54 votes, 5 answers

Correct syntax for mutate_if

I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below: set.seed(1) mtcars[sample(1:dim(mtcars)[1], 5), sample(1:dim(mtcars)[2], 5)] <- NA require(dplyr) mtcars %>% mutate_if(is.na,0) mtcars %>% …
Konrad
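The predicate given to mutate_if() is evaluated once per column, so is.na (which returns one value per cell) is not a suitable predicate, and 0 is not a function. A sketch of the usual fix, followed by the across() form that supersedes the scoped verbs in dplyr 1.0.0 and later:

```r
library(dplyr)

set.seed(1)
cars <- mtcars
cars[sample(seq_len(nrow(cars)), 5), sample(seq_len(ncol(cars)), 5)] <- NA

# mutate_if(): the predicate (is.numeric) selects columns; the function
# then replaces the NA cells inside each selected column.
filled_if <- cars %>% mutate_if(is.numeric, ~ replace(., is.na(.), 0))

# The same with across() + where(), the dplyr 1.0.0+ replacement.
filled_across <- cars %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))
```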
54 votes, 7 answers

Replace missing values (NA) with most recent non-NA by group

I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an example: houseID year price 1 1995 NA 1 …
Peter Stephensen
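A dplyr-only sketch built on cummax(), one of the cumulative functions dplyr counts among its window functions. The houses tibble and the locf() helper are hypothetical. tidyr::fill(), which fills downward by default and respects group_by(), is the packaged alternative.

```r
library(dplyr)

# Hypothetical data shaped like the question's houseID/year/price table.
houses <- tibble(
  houseID = rep(1:2, each = 4),
  year    = rep(1995:1998, times = 2),
  price   = c(NA, 100, NA, 120, 200, NA, NA, 210)
)

# Hypothetical helper: carry the last non-NA value forward. cummax() over
# the positions of non-NA values gives, for each element, the index of the
# most recent observed value (0 when there is none yet, mapped to NA).
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  x[ifelse(idx == 0, NA, idx)]
}

filled <- houses %>%
  group_by(houseID) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(price = locf(price)) %>%
  ungroup()
```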
52 votes, 10 answers

Remove rows where all variables are NA using dplyr

I'm having some issues with a seemingly simple task: to remove all rows where all variables are NA using dplyr. I know it can be done using base R (Remove rows in R matrix where all data is NA and Removing empty rows of a data file in R), but I'm…
hejseb
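A sketch assuming dplyr 1.0.4 or later, where if_any() is available: a row is kept if any column is non-missing, which drops exactly the all-NA rows. The data frame is hypothetical.

```r
library(dplyr)

df <- tibble(
  a = c(1, NA, NA),
  b = c("x", NA, NA),
  c = c(NA, NA, 3)
)

# Keep a row when at least one column is non-missing; row 2 is all NA.
kept <- df %>% filter(if_any(everything(), ~ !is.na(.x)))
```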
52 votes, 8 answers

Filter data frame by character column name (in dplyr)

I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a…
William Denton
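A minimal sketch using the .data pronoun, which looks up a column by a character string inside dplyr verbs. The data and the helper filter_by_name() are hypothetical.

```r
library(dplyr)

df <- tibble(this = c(1, 2, 3), that = c(3, 2, 1))

# Hypothetical helper: `col` is a string such as "this" or "that";
# .data[[col]] resolves it to the matching column inside filter().
filter_by_name <- function(data, col, value) {
  data %>% filter(.data[[col]] == value)
}

by_this <- filter_by_name(df, "this", 1)
by_that <- filter_by_name(df, "that", 1)
```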
52 votes, 4 answers

dplyr: How to use group_by inside a function?

I want to use the dplyr::group_by function inside another function, but I do not know how to pass the arguments to this function. Can someone provide a working example? library(dplyr) data(iris) iris %.% group_by(Species) %.% summarise(n = n())…
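A sketch of one current approach, assuming dplyr 1.0 or later with rlang's embracing operator {{ }}, which forwards a bare column name into group_by(). The %.% in the question is the old chaining operator that %>% replaced; count_by() is a hypothetical name.

```r
library(dplyr)

# Hypothetical wrapper: {{ group_var }} passes the bare column name the
# caller supplied straight through to group_by().
count_by <- function(data, group_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(n = n())
}

species_counts <- count_by(iris, Species)
print(species_counts)
```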
51 votes, 9 answers

Adding column if it does not exist

I have a bunch of data frames with different variables. I want to read them into R and add columns to those that are short of a few variables so that they all have a common set of standard variables, even if some are unobserved. In other words... Is…
guyabel
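A sketch with a hypothetical helper, add_missing_cols(), that adds each absent column filled with NA and then selects the standard set in a fixed order. The column names are placeholders.

```r
library(dplyr)

# Hypothetical helper: adds every name in `cols` that `df` lacks,
# filled with NA (logical), and leaves existing columns untouched.
add_missing_cols <- function(df, cols) {
  missing <- setdiff(cols, names(df))
  if (length(missing) > 0) {
    df[missing] <- NA
  }
  df
}

standard_vars <- c("id", "age", "income")  # placeholder names
short_df <- data.frame(id = 1:2, age = c(30, 41))

completed <- short_df %>%
  add_missing_cols(standard_vars) %>%
  select(all_of(standard_vars))
```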
51 votes, 3 answers

R, dplyr - combination of group_by() and arrange() does not produce expected result?

When using the dplyr function group_by() and immediately afterwards arrange(), I would expect to get an output where the data frame is ordered within the groups that I stated in group_by(). My reading of the documentation is that this combination should produce…
Hrvoje
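arrange() ignores grouping unless .by_group = TRUE is passed, which accounts for the behaviour described in the question. A small comparison on hypothetical data:

```r
library(dplyr)

df <- tibble(g = c("b", "a", "b", "a"), x = c(2, 4, 1, 3))

# Grouping is ignored: rows are sorted by x alone.
by_x <- df %>% group_by(g) %>% arrange(x)

# .by_group = TRUE sorts by the grouping variable first, then by x.
by_group_then_x <- df %>% group_by(g) %>% arrange(x, .by_group = TRUE)
```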
50 votes, 9 answers

dplyr mutate rowwise max of range of columns

I can use the following to return the maximum of 2 columns newiris<-iris %>% rowwise() %>% mutate(mak=max(Sepal.Width,Petal.Length)) What I want to do is find that maximum across a range of columns so I don't have to name each one like…
user2502836
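A sketch assuming dplyr 1.0.0 or later: c_across() accepts a column range inside rowwise(), so the columns need not be named one by one. For large data, the vectorised base function pmax() avoids the per-row overhead.

```r
library(dplyr)

# c_across() gathers the current row's values from a column range,
# and max() reduces them to a single value per row.
newiris <- iris %>%
  rowwise() %>%
  mutate(mak = max(c_across(Sepal.Length:Petal.Width))) %>%
  ungroup()
```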
49 votes, 5 answers

Replace NA with Zero in dplyr without using list()

In dplyr I can replace NA with 0 using the following code. The issue is this inserts a list into my data frame which screws up further analysis down the line. I don't even understand lists or atomic vectors or any of that at this point. I just want…
stackinator
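A sketch with dplyr::coalesce(), which replaces NA with a given value and keeps the column an atomic vector rather than a list. The columns a and b are hypothetical, and since the question's own code is truncated, this is an alternative approach rather than a repair of it.

```r
library(dplyr)

df <- tibble(a = c(1, NA, 3), b = c(NA, 5, NA))

# coalesce(.x, 0) returns .x with each NA replaced by 0; the result is
# still a plain numeric column, not a list.
filled <- df %>% mutate(across(c(a, b), ~ coalesce(.x, 0)))
```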
49 votes, 3 answers

How do I select columns that may or may not exist?

I have a data frame that may or may not have some particular columns present. I want to select columns using dplyr if they do exist and, if not, just ignore that I tried to select them. Here's an example: # Load libraries library(dplyr) # Create…
Dan
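A sketch using the tidyselect helper any_of(), available through dplyr 1.0.0 and later, which silently skips names that are not present (all_of() raises an error instead). The column names are hypothetical.

```r
library(dplyr)

df <- tibble(a = 1:2, b = c("x", "y"))
wanted <- c("a", "c")  # "c" is not a column of df

# any_of() selects the wanted columns that exist and ignores the rest.
selected <- df %>% select(any_of(wanted))
```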
49 votes, 4 answers

How to dplyr rename a column, by column index?

The following code renames the first column in the data set: require(dplyr) mtcars %>% setNames(c("RenamedColumn", names(.)[2:length(names(.))])) Desired results: RenamedColumn cyl disp hp drat wt qsec vs am gear…
Konrad
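rename() accepts a column position on the right-hand side, so the first column can be renamed without rebuilding the whole names vector. A minimal sketch:

```r
library(dplyr)

# new_name = position: rename the first column, keep all others as-is.
renamed <- mtcars %>% rename(RenamedColumn = 1)
```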
49 votes, 10 answers

Add margin row totals in dplyr chain

I would like to add overall summary rows while also calculating summaries by group using dplyr. I have found various questions asking how to do this, e.g. here, here, and here, but no clear solution. One possible approach is to perform count twice…
Jonny
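One approach, sketched here rather than taken from the linked questions: summarise by group, summarise the whole data once more with a label in the grouping column, and stack the two with bind_rows(). The grouping column is converted to character so the "Total" label fits; the summary columns are illustrative.

```r
library(dplyr)

# Per-group summaries, with cyl as character so a text label can join it.
by_cyl <- mtcars %>%
  group_by(cyl = as.character(cyl)) %>%
  summarise(n = n(), mean_mpg = mean(mpg))

# One overall summary row labelled "Total".
overall <- mtcars %>%
  summarise(cyl = "Total", n = n(), mean_mpg = mean(mpg))

with_total <- bind_rows(by_cyl, overall)
print(with_total)
```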
48 votes, 4 answers

Chain arithmetic operators in dplyr with %>% pipe

I would like to understand why, in the dplyr or magrittr package, and more specifically with the chaining function %>%, there is some trouble with the basic operators +, -, *, and /. Chaining takes the output of the previous statement and feeds it as the first…
agenis
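Part of the explanation is operator precedence: in R, %any% operators such as %>% bind more tightly than +, -, * and /, so an arithmetic operator ends the chain instead of joining it. A sketch of the pitfall and two workarounds, backticked operators and magrittr's named aliases such as add():

```r
library(dplyr)  # re-exports %>% from magrittr

x <- c(1, 4, 9)

# Parsed as (x %>% sqrt()) + (1 %>% log()), i.e. sqrt(x) + 0,
# not log(sqrt(x) + 1).
surprise <- x %>% sqrt() + 1 %>% log()

# Calling the operator as a function keeps it inside the chain.
intended <- x %>% sqrt() %>% `+`(1) %>% log()

# magrittr also provides named aliases for the arithmetic operators.
aliased <- x %>% sqrt() %>% magrittr::add(1) %>% log()
```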