Questions tagged [summarize]

A dplyr function (documented as summarise(); dplyr also exports summarize() as an alias) that creates a new data frame by reducing each group of rows, as defined by the grouping variables, to summary values. Use this tag together with the dplyr version you are using. Mind the spelling of the function name.
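Because the tag is spelled summarize while the function is documented as summarise(), a quick check may help. This is a minimal sketch, assuming dplyr is installed and using the built-in mtcars data purely for illustration: both spellings produce the same result, and the grouping itself is set beforehand with group_by().

```r
library(dplyr)

# British spelling: grouping comes from group_by(), summarise() collapses each group
by_cyl_uk <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# American spelling: summarize() is an alias exported by dplyr
by_cyl_us <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# The two results should be identical tibbles, one row per value of cyl
identical(by_cyl_uk, by_cyl_us)
```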

summarise() creates a new data frame. It has one row for each combination of grouping variables; if there are no grouping variables, the output has a single row summarising all observations in the input. (As of dplyr 1.0.0, a summary may return more than one row per group; dplyr 1.1.0 deprecated that usage in favour of reframe().) The output contains one column for each grouping variable and one column for each summary statistic you specify.
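The shape rules above can be seen on a small example. This is a sketch under stated assumptions: dplyr 1.0.0 or later (for the `.groups` argument) and the built-in mtcars data chosen only for illustration. Grouping by two variables yields one row per observed combination, with the grouping columns followed by the summary columns; with no grouping, a single row summarises every observation.

```r
library(dplyr)

# Grouped: one row per (cyl, am) combination present in the data;
# columns are cyl, am (grouping) then n, mean_mpg (summaries).
grouped <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")

# Ungrouped: a single row summarising all 32 cars
overall <- mtcars %>%
  summarise(n = n(), mean_mpg = mean(mpg))

print(grouped)
print(overall)
```

Passing `.groups = "drop"` returns an ungrouped result, which avoids the message dplyr prints when some grouping remains after summarising.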

836 questions
2 votes, 1 answer

dplyr code "df %>% group_by(date = cut(date, breaks = "1 hour"))" no longer produces the desired result?

I have been using the following dplyr code to generate hourly averages from 1-minute time-series data. The code has been working for months, but has recently been producing some problematic results. Has something changed with any of the following…
philiporlando
2 votes, 1 answer

Summarise and list custom index in dplyr

I am trying to output grouped summary variables with a corresponding list of identifying variables. Using the dplyr::starwars dataset as an example, I would like to calculate number of characters with "light" skin color, grouped by gender, with a…
Raoul Duke
2 votes, 2 answers

Summarize and generate multiple variables in a loop

I am looking for an effective way to manipulate multiple variables within a data frame. Right now I am using dplyr, but this becomes cumbersome with more variables. Suppose I have the following data frame, where brd is a car brand, ye is a year,…
Franzi
2 votes, 2 answers

Summarize data in R

I have a dataset which contains weekly sales of various products by outlet. Here is what the data looks like:

Store ID  Week ID  Item Code  Sales in $
253422    191      41130      2.95
272568    188      41130      2.95
272568    188      41160      2.95
272568    189      41130      …
Rnovice
2 votes, 0 answers

sum() down columns by subject dplyr

I'm trying to use dplyr to summarize some data and can't work out how to sum values from part of a column. Normally I'd use tally(), but in this case I want to add up all of the 1's and 0's so tally() isn't appropriate. My data looks something like…
Catherine Laing
2 votes, 1 answer

group-wise summaries/subsets dplyr

I have a data set of two courses in 2 different semesters that takes the following form:
set.seed(200)
sem <- sample(c("1", "2"), 200, replace = T)
course <- sample(c("1", "2"), 200, replace = T)
d.gender = sample(c(0, 1), 200, replace = T, prob =…
slap-a-da-bias
2 votes, 2 answers

R Table with variables x levels

I have a data frame with multiple variables, each with values TRUE, FALSE, or NA. I'm trying to summarize the data, but can't get anything to work quite the way I want.
names <- c("n1","n2","n3","n4","n5","n6")
groupname <-…
2 votes, 1 answer

Arithmetic on summarized dataframe from dplyr in R

I have a large dataset on which I use dplyr's summarize() to generate some means. Occasionally, I would like to perform arithmetic on that output. For example, I would like to get the mean of means from the output below, say "m.biomass". I've tried this…
derelict
2 votes, 1 answer

how to summarize numeric and factor level values simultaneously in R

I'm trying to summarize a dataset by grouping on one column (F1) and getting the average of the other columns, except that the other columns are split between numeric and factor levels. I can use ddply to summarize F2 numeric values, but I'm not sure how…
val
2 votes, 1 answer

dplyr 'object not found' median only

This problem has me stumped. I have the following data frame:
library(dplyr)
# approximation of data frame
x <- data.frame(doy = sample(c(seq(200, 300)), 20, replace = T),
                year = sample(c("2000", "2005"), 20, replace = T),
                …
Jaywalker
2 votes, 4 answers

Extracting multiple rows for each ID based on a condition

I have a data frame with thousands of rows, but a sample is given below:
  userid event
1    123  view
2    123  view
3    123 order
4    345  view
5    345  view
6    345  view
7    345 order
8…
syebill
2 votes, 1 answer

Aggregation in R to calculate percentage of total by group?

DDD <- summarise(
  group_by(Customers, Last_region, Last_state, Last_city),
  Count = length(Last_city),
  Total = sum(Customer.Value, na.rm = TRUE),
  Percent = sum(Customer.Value * 100 / sum(Customer.Value, na.rm = TRUE)))
I have…
R.Mishra
1 vote, 2 answers

Create count table for specific condition and then add column that creates count by group as a whole in R

I have a table like this:
data1 <- data.frame("State" = c("NJ", "NJ", "PA", "NJ", "TX"),
                    "Filter" = c("Filter", "Filter", "No Filter", "Filter", "Filter"),
                    "Threshold" = c("Exceeds", "Exceeds", NA, "NL", "Exceeds"))
I'd like to create a count table…
Sarah
1 vote, 1 answer

How to use Dplyr's summarize function to summarize specific columns using a list of functions

Problem statement: I am writing a function right now that will aggregate (roll up) data at short time intervals up to longer time intervals. I am currently using dplyr and lubridate to accomplish this. The input dataframe to my function has a time…
Kyle Wolfe
1 vote, 1 answer

Power BI - Show Sum of column in table but show average for each line

I have data for orders which have a row for each line item in the order, and I would like to summarise the data for each order by grouping the total of each order (by order number) and I then have another page to drill down into the line items of a…
gabri