Questions tagged [summarize]

A dplyr function (actually named summarise()) that creates a new data frame by summarising data according to the given grouping variables. Use this tag along with the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
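As a minimal sketch of the behaviour described above, the following compares an ungrouped and a grouped call. The data frame `df` and its columns `group` and `value` are hypothetical and chosen only for illustration; the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical example data (not taken from any question on this page)
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# No grouping variables: a single row summarising all observations
overall <- df %>%
  summarise(n = n(), mean_value = mean(value))

# One grouping variable: one row per group, with one column for the
# grouping variable plus one column per summary statistic
by_group <- df %>%
  group_by(group) %>%
  summarise(n = n(), mean_value = mean(value), .groups = "drop")

print(overall)
print(by_group)
```

Here `.groups = "drop"` returns an ungrouped result, which avoids surprises when further verbs are chained after the summary.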

836 questions
0
votes
1 answer

How to Calculate Summary Statistics (Standard Error, and Upper and Lower Confidence intervals) using the package data.table in R

Problem I have a data frame called FID (see below) and I am attempting to use the package data.table to summarize my data. I want to summarise my data by:- Desired Summarised Data frame Month Total frequency of FID per month over 3 years Mean…
Alice Hobbs
  • 1,021
  • 1
  • 15
  • 31
0
votes
2 answers

Sum different levels of a vector together within id

My data could look like this id <- c('A1','A1','A1','A1','B2','B2','B2','B2','C3','C3','C3','C3') event <- c('a', 'b', 'c', 'd','a', 'b', 'c', 'd','a', 'b', 'c', 'd') value <- c(3,2,5,3,6,5,7,6,4,5,6,7) Dat <- data.frame(id, event, value) Now what…
timothy
  • 17
  • 3
0
votes
1 answer

How to use the summarise function to create a summary in R using dplyr package?

I have the following table which represents a child, his siblings and the case they are assigned under. The resource ids represent the house where they were placed together. child_id|sibling_id|case_id|resource_id|placed_together 1 8 …
Jay
  • 67
  • 4
0
votes
2 answers

Summarise a set of dates contained in a column

I have a dataset with transactions made from 2018/07/01 to 2019/06/30 and I want to find how many unique dates are in the "DATE" column (it has over 260k rows, so a date can be repeated several times). I have tried the following but it just lists…
AbetR
  • 5
  • 3
0
votes
2 answers

Aggregating firm specific data on an industry level based on SIC codes

I have ~250,000 rows of firm-specific annual data (2000-2019) with an industry SIC code for each firm. The aim is to sum the value in each variable column for every individual SIC code based on the year. The data looks like this for the first couple…
Lynne
  • 29
  • 3
0
votes
2 answers

How to print the minimum and maximum of factor level summary statistics (taking minimum and maximum of medians/proportions)?

I have data as follows, including 10 products (a, b, c, ...), and their descriptions (other variables). I need to report how the summary statistics of other variables (median/proportion) range between products (should be printed as a minimum and…
st4co4
  • 445
  • 3
  • 10
0
votes
1 answer

Error using summarize function in R using a custom statistic

I'm trying to calculate some statistic called FAR for each group in my data. I wrote a function for calculating the statistic, and it looks like this: FAR <- function(data){ FAs = sum(data$response %in% c(0,1) & data$correct_response=="No") NF =…
0
votes
1 answer

Group_by and summarize behave strangely and do not provide expected results

While having used dplyr before, I've run into problems that I do not sufficiently understand at the moment. The part of a research data set I am working with has +2500 different rows. These rows are different respondents of 515 houses from a…
Marvin
  • 1
0
votes
1 answer

summarize across -- is it order dependent?

I came across something weird with dplyr and across, or at least something I do not understand. If we use the across function to compute the mean and standard error of the mean across multiple columns, I am tempted to use the following…
vashts85
  • 1,069
  • 3
  • 14
  • 28
0
votes
1 answer

count number of times string appears in a column

Can you think of an intuitive way of calculating the number of times the word space appears in a certain column? Or any other solution that is viable. I basically want to know how many times the space key was pressed, however some participants…
CatM
  • 284
  • 2
  • 12
0
votes
2 answers

Using "summarise" (dplyr) with a function requiring a formula

I am trying to generate a table of regression slopes generated by a custom function based on the mblm package (the function in the example here is a simplified version). The function requires a formula as argument and I would like to use dplyr…
ThomasW
  • 1
  • 3
0
votes
2 answers

Adding totals to a data frame

I would like to add totals to my data frame but am having difficulties because the data is quite messy (as ever!) - some columns are text, some dates, some numeric. I can't post the actual data as it is sensitive but I will show a representative…
0
votes
1 answer

Tidyverse groups function for summarize?

I noticed that when using the group_by statement with summarize, I get a warning that the 'regrouping is being overridden by the .groups argument'. I found one article online that seems to indicate that a group_by statement is no longer necessary --…
wythe4
  • 83
  • 4
0
votes
1 answer

summarize() warning: argument is not numeric or logical: returning NA

I am trying to find the mean by year in my data using dplyr. I can't figure out why my code gives me NAs. Here is my code: S1b$Loans.and.discounts <- gsub(",","",S1b$Loans.and.discounts) S1b$Loans.and.discounts <-…
LouisEcon
  • 13
  • 4
0
votes
1 answer

How to calculate the percentage between two variables for specific observations in R?

I'm trying to calculate the incidence/percentage of a binary variable in relation to a variable that contains 5 (+ one NA) different income brackets. I'm using: afghan %>% group_by(income) %>% summarize(violent.exp.ISAF = n()) %>% …
D C
  • 3
  • 2