Questions tagged [summarize]

A dplyr function (actually named summarise(); summarize() is a synonym) that creates a new data frame by summarising data, optionally according to given grouping variables. When using this tag, state the dplyr version you are using. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
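
The behaviour described above can be sketched with a small example. This is only an illustration, assuming dplyr 1.0.0 or later (for the `.groups` argument); the `sales` data frame and its column names are made up for the purpose and do not come from any question on this page.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
sales <- tibble(
  region = c("north", "north", "south", "south", "south"),
  amount = c(10, 20, 5, 15, 25)
)

# No grouping variables: a single row summarising all observations
overall <- sales %>%
  summarise(total = sum(amount), mean_amount = mean(amount))

# With a grouping variable: one row per group, with one column for the
# grouping variable and one column per summary statistic
by_region <- sales %>%
  group_by(region) %>%
  summarise(
    total = sum(amount),
    mean_amount = mean(amount),
    .groups = "drop"  # return an ungrouped tibble
  )

print(overall)
print(by_region)
```

Here `overall` has one row, while `by_region` has one row for each value of `region` and the columns `region`, `total` and `mean_amount`.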

836 questions
3
votes
3 answers

Divide group sum by total sum

I am using the dplyr package. Let's suppose I have the below table. Group count A 20 A 10 B 30 B 35 C 50 C 60 My goal is to create a summary table that contains the mean per each group, and also, the percentage of the mean of…
GitZine
3
votes
1 answer

R: Combine rows with same ID

Edit: I changed Var4 to a string value, as my question was not precise enough about my data and answers were therefore failing because of invalid types. Sorry for that; this is my first question here and I hope someone can help me. I have the…
Aisberg
3
votes
2 answers

Power BI DAX How to add column to a calculated table that summarizes another

I have a TestTable that summarizes a table Receipts on the Month column and adds a column that counts the number of times (occurrences) that each month appears in the Receipts table. TestTable = SUMMARIZE(Receipts, Receipts[Month],…
Sweepster
3
votes
1 answer

Kusto - Join two tables and count keys from first table and second table on every record from first table

I need to join two tables and count the key from the first table and the second table on every record from the first table: let T = datatable(TId:int, TName:string, Tkey:string) [ 1, "A", "xyz", 2, "B", "xyz", 3, "C", "yza", ]; let u = datatable(UId:int,…
Sahil Raj
3
votes
1 answer

How to apply a function(x,y) with two variables across a set of variables ending with .x and .y using dplyr

Sample data: sampdat <- data.frame(grp=rep(c("a","b","c"),c(2,3,5)), x1=seq(0,.9,0.1),x2=seq(.3,.75,0.05), y1=c(1:10), y2=c(11:20)) I would like to have the following data, but I have 100+ variables for which I'd like to apply a function with two…
Sam
3
votes
3 answers

How to combine multiple summarize calls dplyr?

Given the df ww <- data.frame( GM = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"), stanza = rep(c("Past", "Mid", "End"), 6), change = c(1, 1.1, 1.4, 1, 1.3, 1.5, 1,…
Jacob
3
votes
2 answers

R group columns of return trips data

I have data on train trips and the number of delayed or cancelled trains, which I would like to sum. Start End Delayed Cancelled Paris Rome 1 0 Brussels Berlin 4 6 Berlin Brussels 6 2 Rome …
hug
3
votes
2 answers

How to replace NA in a column with the first non-missing value without dropping cases that only have missing values using R?

I have a long data frame that has many NAs, but I want to condense it so all NAs are filled with the first non-missing value when grouped by a variable, but if the observation only has NAs, it is kept. Until I updated R, I had code that worked…
J.Sabree
3
votes
1 answer

summarise_all with additional parameter that is a vector

Say I have a data frame: df <- data.frame(a = 1:10, b = 1:10, c = 1:10) I'd like to apply several summary functions to each column, so I use dplyr::summarise_all library(dplyr) df %>% summarise_all(.funs =…
Dan
3
votes
4 answers

Sum rows with value larger than n into one in R

I have a data frame: df <- data.frame(count=c(0,1,2,3,4,5,6), value=c(100,50,60,70,2,6,8)) count value 1 0 100 2 1 50 3 2 60 4 3 70 5 4 2 6 5 6 7 6 8 How do I sum value larger than "n" into one…
Algorithman
3
votes
2 answers

"group_by->summarise->mean()" taking way longer than expected

I have a dataset of around 4.2 million observations. My code is below: new_dataframe = original_dataframe %>% group_by(user_id, date) %>% summarise(delay = mean(delay, na.rm=TRUE) ) This pipeline should be taking a 4.2 million x 3…
tvbc
3
votes
2 answers

Perform group by on a column to calculate count of occurrences of another column in R

I have a dataset similar to the sample dataset provided below:

| Name | Response_days | state |
|------|---------------|-------|
| John | 0 | NY |
| John | 6 | NY |
| John | 9 | NY |
| Mike | 3 |…
hk2
3
votes
4 answers

Summary statistics for multiple variables with statistics as rows and variables as columns?

I'm trying to use dplyr::summarize() and dplyr::across() to obtain a tibble with several summary statistics in the rows and the variables in the columns. I was only able to achieve this result by using dplyr::bind_rows(), but I'm wondering if…
Lucas De Abreu Maia
3
votes
3 answers

Summarise multiple columns that have to be grouped tidyverse

I have a data frame containing data that looks something like this: df <- data.frame( group1 = c("High","High","High","Low","Low","Low"), group2 = c("male","female","male","female","male","female"), one =…
Jeff238
3
votes
3 answers

How to aggregate R dataframe of two columns based on values of another

My dataframe is as follows: gender=="1" refers to men and gender=="2" refers to women, occupations go from A to U, and years go from 2010 to 2018 (this is a small example). Gender Occupation Year 1 A 2010 1 …
Ana