Questions tagged [summarize]

A dplyr function (actually named summarise(); summarize() is a synonym) that creates a new data frame by summarising data according to the given grouping variables. Use this tag along with the tag for the dplyr version being used. Mind the spelling of the function name.

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row (or more, as of 1.0.0) summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
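
As a minimal sketch of the behaviour described above, assuming dplyr 1.0.0 or later is installed: the `sales` table and its column names are hypothetical and do not come from any question on this page.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration.
sales <- tibble(
  region = c("north", "north", "south", "south", "south"),
  amount = c(10, 20, 5, 15, 40)
)

# No grouping variables: one row summarising all observations.
overall <- sales %>%
  summarise(n = n(), mean_amount = mean(amount))

# One grouping variable: one row per region, with one column for the
# grouping variable and one column per summary statistic.
by_region <- sales %>%
  group_by(region) %>%
  summarise(n = n(), mean_amount = mean(amount), .groups = "drop")
```

Here `overall` has the columns `n` and `mean_amount`, while `by_region` has `region`, `n` and `mean_amount`. The `.groups = "drop"` argument is one possible choice; it returns an ungrouped result.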

836 questions
0
votes
1 answer

Get Proportion Graph From Summarise and Facet Wrap

I have three categorical variables and one numeric variable; I want to show proportions by segmenting the data based on my categorical variables and getting the proportions of the numeric variable. The data is as follows: ID Brand Color Gear…
0
votes
2 answers

How can I optimize the dplyr code by group if all calculations are the same

I have the following data frame, which is a subset of a much larger one containing over 3 million rows. df <- data.frame(Group = c(1,1,1,2,2,3,3,3,2,2,4,4,1,4,1,3,1,3,2,4,2,1,3,2,4), SubGroup =…
Dfeld
  • 187
  • 9
0
votes
1 answer

Recreate dplyr summarise in data.table

Just out of curiosity, is there a way of recreating the summary output using data.table instead of dplyr? dt1 <- data.table( uid=c("A00111", "A00112","A00113","A00211","A00212","A00213","A00214","A00311","A00312"), area=c("A001",…
Chris
  • 1,197
  • 9
  • 28
0
votes
1 answer

Creating new variable with summary values based on group

I really have two questions. I am quite certain that the second one would help me solve the first one, but I might be on the wrong track altogether and there might be simpler solutions. First question: I would like to make a stacked bar chart using…
Tea Tree
  • 882
  • 11
  • 26
0
votes
0 answers

dplyr summarize using a reference column for values

I'm trying to perform a simple summarize operation using a trimmed mean by referencing a trim-value column. I keep getting length errors and I cannot understand why this is not working. Maybe I'm missing something obvious? I don't need any special…
qab
  • 1
  • 1
0
votes
3 answers

How can I collapse multiple columns and generate new variables from the different levels/values that were collapsed?

I have a dataset (df) similar to this one: df <- data.frame("ID"=c(1, 1, 1, 2, 2), "Method of payment"=c("cash","liabilities", "shares", "cash", NA), "USD"=c(110, 130, 200,…
Esperanta
  • 83
  • 7
0
votes
1 answer

Remove duplicates based on second column

I am trying to write a section of code that does a few things: 1) group dataset by ID 2) count the number of unique months in column data.month 3) remove all IDs that have less than 9 months 4) print distinct IDs based on the company (ie print…
Cae.rich
  • 171
  • 7
0
votes
1 answer

Is there a way to auto summarize bulk data?

I want to be able to take "bulk" material lists and have them automatically summarized to "sum" like-with-like items. For example, would there be an efficient way to accomplish the following? Is there an efficient way to get from: to. . . ? Your…
Bryan M
  • 21
  • 1
  • 7
0
votes
1 answer

Grouping by multiple factors and summarizing counts of factors

I have a bunch of categorical ship "Type" data, e.g. passenger, fishing, cargo etc. within different distances offshore (DOS, e.g. 0-12 nm, 0-25 nm etc.) for different months of the year. Initially I want to get a count of the number of Type, e.g.…
Lmm
  • 403
  • 1
  • 6
  • 24
0
votes
1 answer

Compute the Breteau index with R

Here is a data frame I have and I want to compute the Breteau index ## Here is the table commune container house aegypti albopictus yde4 c1 h1 6 6 yde2 c2 h2 2 3 yde7 c3 h3 …
0
votes
2 answers

R question: shapiro.test function not working in dplyr::summarize while other summary functions do

When I try to use shapiro.test as a summary function on my R DataFrame I get the error: df %>% summarize_all(shapiro.test) Error: Column `A` must be length 1 (a summary value), not 4 Here is my setup: df = data.frame(A=sample(1:10,5),…
abalter
  • 9,663
  • 17
  • 90
  • 145
0
votes
2 answers

Calculations with dplyr based on specific factors and dates and summaries of values

I have a data frame of counts of different classifications of ship on specific dates at certain distances off shore (DOS), e.g. 0-12nm and 0-100nm - I would like to subtract the ships within the 0-12nm DOS from 0-100nm, so that I can calculate how…
Lmm
  • 403
  • 1
  • 6
  • 24
0
votes
0 answers

How to work with dependencies when grouping/summarising over multiple columns?

I'm trying to summarise multiple columns in a data frame using dplyr's group_by/summarise. If there is a dependency on an earlier column in one of the later columns, summarise uses the already summarised values. Is there a way to avoid this…
Tom
  • 532
  • 3
  • 11
0
votes
3 answers

How to summarize leave to without public holidays?

Hi all (before holidays). In this case I have added new leave in this table: +--------+---------+---------+-------------+----------+-------------------------- |ID_LEAVE|ID_WORKER| FNAME | LNAME | BEGIN_DATE | END_DATE |…
Prochu1991
  • 443
  • 5
  • 20
0
votes
2 answers

Gensim summarization returning repeated lines as summary of text documents

I am getting repeated lines in my summarizer output. I am using gensim in Python for summarizing text documents. How can I remove duplicate lines from the output of the summarizer? The output is coming with repeated content. How can I only keep unique…
checkmate
  • 133
  • 1
  • 1
  • 9