Questions tagged [summarize]

A dplyr verb (actually named summarise()) that creates a new data frame by collapsing the data into summary rows according to the given grouping variables. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
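A minimal sketch of the behaviour described above, assuming dplyr 1.0.0 or later. The data frame df and its columns site, year and count are made up for illustration and do not come from any question on this page.

```r
library(dplyr)

# Hypothetical toy data: two grouping variables and one measurement.
df <- tibble(
  site  = c("A", "A", "A", "B", "B"),
  year  = c(2020, 2020, 2021, 2020, 2020),
  count = c(3, 5, 2, 7, 1)
)

# Grouped: one row per (site, year) combination, with one column per
# grouping variable and one column per summary statistic.
by_group <- df %>%
  group_by(site, year) %>%
  summarise(n = n(), total = sum(count), .groups = "drop")

# Ungrouped: a single row summarising all observations in the input.
overall <- df %>%
  summarise(n = n(), total = sum(count))
```

Here by_group has three rows (A/2020, A/2021, B/2020) and four columns, while overall has one row. The .groups = "drop" argument is a choice made for this sketch so that the result comes back ungrouped.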

836 questions
3 votes, 3 answers

r min max dates by id and multiple status changes within ID

I have an animal tracking dataset which is as shown below Id Start Stop Status 78122 10/12/1919 10/12/1919 Birth 78122 1/18/1966 2/2/1972 In 78122 2/3/1972 9/8/1972 In 78122 9/9/1972…
3 votes, 1 answer

Accessing other group_by groups with summarize()

I have a data frame with columns genes, the region of the chromosome they belong to, the cell line the gene expression was measured from, and the gene's expression level in that cell line -- it looks basically something like this: gene region …
cheal
3 votes, 2 answers

Summarize using different grouping variables in dplyr

I would like to summarize a dataframe using different grouping variables for each summary I wish to be carried out. As an example I have three variables (x1, x2, x3). I want to group the dataframe by x1 and get the number of observations in that group,…
H. Kraus
3 votes, 3 answers

How to keep other columns when using dplyr?

I have a problem similar to the one described in "How to aggregate some columns while keeping other columns in R?", but none of the solutions from there that I have tried work. I have a data frame like…
Zizou
3 votes, 1 answer

R sum observations by unique column PAIRS (B-A and A-B) and NOT unique combinations (B-A or A-B)

I have a seemingly simple calculation, where I have a data frame composed of 4 columns as shown below (Date, Origin, Destination, count). I would like to sum the count by Date, and unique pair of ID1 and ID2, meaning that A-B and B-A are ONE…
Roberto
3 votes, 3 answers

dplyr summarise based on order condition with if statement

By group (group_by(id)), I am trying to sum a variable based on a selection of types. However, there is an order of preference of these types. Example: library(tidyverse) df <- data.frame(id = c(rep(1, 6), 2, 2, 2, rep(3, 4), 4, 5), …
user63230
3 votes, 1 answer

How to summarize large dataframes in python pandas (50 columns x 2m rows)

For a project I manipulate a few columns of the dataset, afterwards join these newly created columns back to the entire dataset, and then summarize on the manipulated fields. The manipulation and merging is no problem, but the groupby feature…
Dubblej
3 votes, 2 answers

Summarize with conditions based on ranges in dplyr

Here is an illustration of my example. Sample data: df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202)) df ID A B 1 1 foo 1 2 1 bar 5 3 2 foo 7 4 2 foo …
Vojtěch Kania
3 votes, 3 answers

Power BI/DAX: Filter SUMMARIZE or GROUPBY by added column value

Because of the confidential nature of the data, I'll try to describe what I'm struggling with using some random examples. Let's say I have a fact table with invoice data in Power BI. I need to count the number of distinct product IDs with sales over let's say…
Uzzy
3 votes, 3 answers

count distinct levels of a data frame for groups based on a condition

I have the following DF x = data.frame('grp' = c(1,1,1,2,2,2),'a' = c(1,2,1,1,2,1), 'b'= c(6,5,6,6,2,6), 'c' = c(0.1,0.2,0.4,-1, 0.9,0.7)) grp a b c 1 1 1 6 0.1 2 1 2 5 0.2 3 1 1 6 0.4 4 2 1 6 -1.0 5 2 2 2 0.9 6 2 1 6 0.7 I…
Param
3 votes, 1 answer

Writing a function to filter and summarize data into proportion table

I want to create a large proportion table that involves filtering out certain values based on one column and outputting the proportion of values equal to 0 and those greater than 0 in a table. Here's an example of the data frame (df): ID a b …
Kfin
3 votes, 2 answers

How to add secondary summary of previously grouped/summarized data for purposes of sorting in R with dplyr

I am plotting two groups: before and after. Each group has 2 levels: up and down. For each level I have calculated the summary stat, count. I am trying to create a new summary stat which is the total count of each level in the database, new_count …
E50M
3 votes, 1 answer

sum count across multiple variables

I feel like this should be very easy, but I can't get it to work. Data are the three columns, fourth column is what I am looking for that I can't get to render out: eg_data <- data.frame( id = c(1,1,1,2,2,3,3,3,3,3,3,4,4,5,5,5,5), date = c("11/1",…
Adam_S
3 votes, 2 answers

Summarize data table individually for multiple columns

I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this: Patch Size Achmil Aciarv Aegpod Agrcap A 10 …
Kevin
3 votes, 2 answers

counting the occurrence of substrings in a column in R with group by

I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring in a character column. I have some data e.g. ID String village 1 fd_sec, ht_rm, A 2 NA, ht_rm …