Questions tagged [dplyr]

Use this tag for questions relating to functions from the dplyr package, such as group_by, summarize, filter, and select.

The dplyr package is the next iteration of the plyr package. It has three main goals:

  1. Identify the most important data manipulation tools needed for data analysis and make them easy to use from R.
  2. Provide fast performance for in-memory data by writing key pieces in C++.
  3. Use the same interface to work with data no matter where it's stored, whether in a data.frame, a data.table or a database.
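
As a rough illustration of the verbs named in the tag description, here is a minimal sketch using the built-in `mtcars` data set. The choice of data set, the `hp > 100` threshold, and the output column names `mean_mpg` and `n` are assumptions made for the example; they do not come from the tag wiki.

```r
library(dplyr)

# mtcars ships with base R; cyl, mpg, and hp are columns in it.
result <- mtcars %>%
  select(cyl, mpg, hp) %>%   # keep only three columns
  filter(hp > 100) %>%       # keep rows with more than 100 horsepower
  group_by(cyl) %>%          # form one group per cylinder count
  summarize(
    mean_mpg = mean(mpg),    # average fuel economy within each group
    n = n()                  # number of rows within each group
  )

print(result)
```

The result has one row per value of `cyl`, which is the usual shape after `group_by()` followed by `summarize()`.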

Vignettes

Some vignettes have been moved to other related packages.

36044 questions
6 votes, 3 answers

do.call() and tidy evaluation

Trying to make do.call() work in the context of tidy evaluation: library(rlang) library(dplyr) data <- tibble(item_name = c("apple", "bmw", "bmw")) mutate(data, category = case_when(item_name == "apple" ~ "fruit", …
Aurèle

6 votes, 3 answers

Use perl=TRUE regex in dplyr select

How can I select columns using a perl = TRUE style regex? data.frame(baa=0,boo=0,boa=0,lol=0,bAa=0) %>% dplyr::select(matches("(?i)b(?!a)")) Error in grep(needle, haystack, ...) : invalid regular expression '(?i)b(?!a)', reason 'Invalid…
Andre Elrico

6 votes, 4 answers

r collapsing data from multiple columns into one

I know there are many questions on this topic so I apologize if this is a duplicate question. I'm trying to collapse multiple columns in a data set into one column: Assuming this is the structure of the dataset I am working with, df <- data.frame( …
Science11

6 votes, 2 answers

dplyr invalid subscript type list

I have run into an error in a script I am writing that only occurs when I have dplyr running. I first encountered it when I found a function from dplyr that I wanted to use, after which I installed and ran the package. Here is an example of my…
Walker in the City

6 votes, 2 answers

Wildcards for filter function in dplyr

I am using dplyr and I would like to filter my dataframe (biotypes) according to sample IDs which are the first column of the data frame, e.g. they look like this: ID chrX.tRNA494-SerAGA chrX.tRNA636-AlaCGC mmu_piR_000007 ... I want to filter IDs…
Anna

6 votes, 3 answers

R remove numbers in data frame entries containing only numbers

I am reading in a data frame from an online csv file, but the person who created the file has accidentally entered some numbers into a column which should just be city names. Sample for cities.data table. City Population Foo Bar Seattle …
sushi

6 votes, 1 answer

dplyr number of rows across groups after filtering

I want the count and proportion (of all elements) of each group in a data frame (after filtering). This code produces the desired output: library(dplyr) df <- data_frame(id = sample(letters[1:3], 100, replace = TRUE), value =…
Fridolin Linder

6 votes, 3 answers

left_join (dplyr) the next available date

I have 2 datasets in "R". The first DB contains specific dates: Value Date # 20 2017-10-19 # 19 2017-10-23 # 19 2017-11-03 # 20 2017-11-10 And the second contains the level of a stock…
MelBourbon

6 votes, 2 answers

transform a dataframe of frequencies to a wider format

I have a dataframe that looks like this. input dataframe position,mean_freq,reference,alternative,sample_id 1,0.002,A,C,name1 2,0.04,G,T,name1 3,0.03,A,C,name2 These data are nucleotide differences at a given position in a hypothetical genome,…
eastafri

6 votes, 2 answers

Creating and using new variables in function in R: NSE programming error in the tidyverse

After reading and re-reading the many "programming with dplyr" guides, I still cannot find a way to solve my particular case. I understand that the use of group_by_, mutate_ and such "string-friendly" versions of tidyverse functions is heading…
Dominique Makowski

6 votes, 1 answer

Recode a string column into integer using dplyr

How to create a new integer column recode which recodes an existing column y in the dataframe df using dplyr approaches? # Generates Random data df <- data.frame(x = sample(1:100, 50), y = sample(LETTERS, 50, replace = TRUE),…
Prradep

6 votes, 2 answers

How do I "flush" data to my RSQLite disk database?

I'm creating a database using R package dbplyr, using RSQLite, but my database is zero-bytes in size on disk despite my writing (and reading back) a table. Here is my script: library("RSQLite") library("dbplyr") library("dplyr") data(mtcars) con…
Thomas Browne

6 votes, 1 answer

Can dplyr::case_when return mix of NAs and non-NAs?

Can case_when() in dplyr return a mix of NA and non-NA values? When I ask it to return NA in response to one statement, but a non-NA value in response to another statement, it throws an evaluation error: E.g., I want 1 for all values of cyl >= 6,…
Scransom

6 votes, 2 answers

In Python pandas, how to do the equivalent of R dplyr mutate_each

In Python Pandas, I want to add columns by executing multiple aggregate functions on multiple columns like R dplyr mutate_each. For example, Can Python Pandas realize the same processing as the following R script? R dplyr : iris %>% …

6 votes, 3 answers

dplyr filter by the first column

Is it possible to filter in dplyr by the position of a column? I know how to do it without dplyr iris[iris[,1]>6,] But how can I do it in dplyr? Thanks!
lokheart
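
One possible way to filter by column position, sketched here as an assumption and not taken from any answer on this page, is to look up the column name by position first and then refer to it through the `.data` pronoun that dplyr exports. The variable names `first_col` and `by_position` are hypothetical.

```r
library(dplyr)

# Look up the name of the first column by position.
first_col <- names(iris)[1]

# .data[[first_col]] refers to that column inside filter().
by_position <- iris %>%
  filter(.data[[first_col]] > 6)

# Base R version from the question, kept for comparison.
base_version <- iris[iris[, 1] > 6, ]

print(nrow(by_position))
print(nrow(base_version))
```

Both versions select the same rows; the dplyr result simply does not keep the original row names.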