Questions tagged [summarize]

A dplyr function (actually named summarise()) that creates a new data frame by summarising data according to the given grouping variables. When using this tag, state the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
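As a rough illustration of the description above, here is a minimal sketch of a grouped and an ungrouped summary. The `sales` data, its column names, and the summary names `total` and `n_obs` are made up for this example; only `tibble()`, `group_by()`, `summarise()`, `n()` and the `.groups` argument (available since dplyr 1.0.0) come from dplyr itself.

```r
library(dplyr)

# Hypothetical toy data: two grouping variables and one numeric column.
sales <- tibble(
  region  = c("north", "north", "south", "south", "south"),
  product = c("a", "b", "a", "a", "b"),
  dollars = c(10, 20, 5, 15, 30)
)

# Grouped: one row per (region, product) combination, with one column per
# grouping variable plus one column per summary statistic.
by_group <- sales %>%
  group_by(region, product) %>%
  summarise(total = sum(dollars), n_obs = n(), .groups = "drop")

# Ungrouped: a single row summarising all observations in the input.
overall <- sales %>%
  summarise(total = sum(dollars), n_obs = n())
```

Here `by_group` should contain the columns `region`, `product`, `total` and `n_obs`, while `overall` collapses the whole input into one row.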

836 questions
2 votes, 2 answers

Find mean of counts within groups

I have a dataframe that looks like this: library(tidyverse) x <- tibble( batch = rep(c(1,2), each=10), exp_id = c(rep('a',3),rep('b',2),rep('c',5),rep('d',6),rep('e',4)) ) I can run the code below to get the count per exp_id: x %>%…
Adam_G
2 votes, 1 answer

Summarize while joining in R

I have two datasets and I want to join the two datasets and apply a summarize command at the same time. Example data: Data 1: We observe three products (id) at three points in time (obs_id) and the number of reviews on this product…
Scijens
2 votes, 3 answers

Find relative frequencies of summarized columns in R

I need to get the relative frequencies of a summarized column in R. I've used dplyr's summarize to find the total of each grouped row, like this: data %>% group_by(x) %>% summarise(total = sum(dollars)) x total
2 votes, 2 answers

dplyr: Why are cases not summarized using summarise()?

I have > head(df,7) date pos cons_week 1 2020-03-30 313 169 2 2020-03-31 255 169 3 2020-04-01 282 169 4 2020-04-02 382 169 5 2020-04-03 473 169 6 2020-04-04 312 169 7 2020-04-05 158 169 pos denotes…
cmirian
2 votes, 3 answers

Is there a more efficient way than dplyr to obtain the variance of lots of columns?

I have a data.frame that is >250,000 columns and 200 rows, so around 50 million individual values. I am trying to get a breakdown of the variance of the columns in order to select the columns with the most variance. I am using dplyr as follows: df…
reubenmcg
2 votes, 2 answers

How to use group_by() and summarize() to count the occurrences of data points?

p <- data.frame(x = c("A", "B", "C", "A", "B"), y = c("A", "B", "D", "A", "B"), z = c("B", "C", "B", "D", "E")) p d <- p %>% group_by(x) %>% summarize(occurance1 = count(x), occurance2 =…
2 votes, 2 answers

Continual error with summarize function dplyr

I am trying to calculate the mean, median, min, and max of all variables, grouped by Site, using the summarize function. In my code, I replace NA with 0, but I am also open to using na.rm=TRUE instead if it is easy to incorporate. I keep…
Adam
2 votes, 1 answer

Is there a way to create a range of sum of multiple columns matching a condition in R?

I have a data frame which contains an experimental CONDITION which has a determined INDEX. Each experiment has an associated NAME-A and a NAME_B corresponding to a specific NAME_A. My main objective is to summarize the total of NAME-A and NAME-B by…
2 votes, 1 answer

dplyr groups not working with dollar sign data$column syntax

I'm looking to find the min and max values of a column for each group: mtcars %>% group_by(mtcars$cyl) %>% summarize( min_mpg = min(mtcars$mpg), max_mpg = max(mtcars$mpg) ) # # A tibble: 3 x 3 # `mtcars$cyl` min_mpg max_mpg # …
Jurgen
2 votes, 3 answers

summarize_all rows by grouping and define which value should be kept

I have a data frame in which several data sources are merged. This creates rows with the same id. Now I want to define which values from which row should be kept. So far I have been using dplyr with group_by and summarize_all to keep the first value…
Axel K
2 votes, 1 answer

Counting the occurrences of a string in dataframe row

I have a data frame (named df) of 144 columns (trial numbers) containing information about trial success (Yes/No) per participant (the rows). A subset would look like this: V1 V2 V3 V4 V5 Yes No Yes Yes …
e. erhan
2 votes, 3 answers

mutate or summarise across rows by variable containing string

I'd like to create a new data table which is the sum across rows from variables which contain a string. I have been trying to keep this within the tidyverse as a noob using new dplyr across. Help much appreciated. dat<- data.frame("Image" =…
JMonk
2 votes, 2 answers

Summarising pandas dataframe in single row

I am hoping for some help in summarizing the dataframe detailed below into a one row summary as shown in desired output further down on the page. Many thanks in advance. employees = {'Name of Employee': ['Mark','Mark','Mark','Mark','Mark','Mark',…
windwalker
2 votes, 4 answers

Get the sum for pair of rows

I have the following dataframe imported in R: product per1 per2 per3 A 10 20 30 B 23 14 21 C 26 95 81 Consider A:C as products listed in rows one after another and their corresponding sales values across…
Kathir
2 votes, 1 answer

Why doesn't the summarize function give an error?

We all know that in base R we can compute any summary that operates on vectors and returns a single value. I want to ask why I don't get an error when I attempt to use quantile, a function which returns more than one value, inside of…
sociolog