Questions tagged [grouping]

The process of grouping entities into collections of associated elements.

Grouping is a form of hierarchical knowledge representation, similar to mind mapping, concept mapping and argument mapping, all of which need to observe at least some of the principles of grouping.

SQL Server: GROUPING indicates whether a specified column expression in a GROUP BY list is aggregated or not.

7381 questions
9
votes
5 answers

Group numeric vector by predefined maximal group sum

I have a numeric vector like this x <- c(1, 23, 7, 10, 9, 2, 4) and I want to group the elements from left to right with the constraint that each group sum must not exceed 25. Thus, here the first group is c(1, 23), the second is c(7, 10) and…
LulY
  • 976
  • 1
  • 9
  • 24
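
One way to read the constraint in this question is as a greedy left-to-right scan that opens a new group whenever the next element would push the running sum past the limit. The following is a minimal sketch of that reading in Python rather than R; the function name `group_by_max_sum` and its parameters are hypothetical, and a single value larger than the limit is assumed to form a group of its own.

```python
def group_by_max_sum(values, limit):
    """Greedily split values, left to right, into runs whose sums stay <= limit."""
    groups = []
    current = []
    current_sum = 0
    for v in values:
        # Open a new group when adding v would push the running sum past the limit.
        if current and current_sum + v > limit:
            groups.append(current)
            current, current_sum = [], 0
        current.append(v)
        current_sum += v
    if current:
        groups.append(current)
    return groups


x = [1, 23, 7, 10, 9, 2, 4]
groups = group_by_max_sum(x, 25)
print(groups)
```

For the vector in the question, the first two groups produced by this sketch match the ones the asker lists, (1, 23) and (7, 10).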
9
votes
4 answers

Aggregating sequential and grouped data in R

I have a dataset that looks like this toy example. The data describes the location a person has moved to and the time since this relocation happened. For example, person 1 started out in a rural area, but moved to a city 463 days ago (2nd row), and…
Joshua
  • 722
  • 12
  • 27
9
votes
3 answers

Count of number of elements between distinct elements in vector

Suppose I have a vector of values, such as: A C A B A C C B B C C A A A B B B B C A I would like to create a new vector that, for each element, contains the number of elements since that element was last seen. So, for the vector above, NA NA 2 NA …
richarddmorey
  • 976
  • 6
  • 19
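
The "elements since last seen" idea can be sketched by remembering the most recent index of each value. This is an illustrative Python version, not the R approach from the answers; `since_last_seen` is a hypothetical name, and `None` stands in for the NA shown in the question.

```python
def since_last_seen(items):
    """For each item, return the distance to its previous occurrence, or None."""
    last_index = {}  # value -> index where it was most recently seen
    result = []
    for i, item in enumerate(items):
        result.append(i - last_index[item] if item in last_index else None)
        last_index[item] = i
    return result


v = "A C A B A C C B B C C A A A B B B B C A".split()
distances = since_last_seen(v)
print(distances[:4])
```

The first four results correspond to the NA NA 2 NA prefix given in the question.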
9
votes
5 answers

MySQL: How to group data per hour and get the latest hour

I'm trying to do a query that fetches data per hour but instead of the normal group by hour I want to narrow it down and only get the latest hour - meaning the newest data within that hour. With the picture shown below what I wanted to get is the…
tradyblix
  • 7,439
  • 3
  • 25
  • 29
9
votes
6 answers

Identify and count spells (Distinctive events within each group)

I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns are what I have; the fourth column, spell, is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too…
Thomas Speidel
  • 1,369
  • 1
  • 14
  • 26
9
votes
5 answers

How to efficiently group pairs based on shared item?

I have a list of pairs (tuples), simplified to something like this: L = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")] Using Python, I want to efficiently split this list into: L1 = [("A","B"), ("B","C"),…
Miro
  • 599
  • 10
  • 29
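
If "grouping pairs by shared item" is read as finding connected components (an assumption, since the expected output in the excerpt is cut off), a small union-find structure is one possible sketch. The name `group_pairs` is hypothetical.

```python
def group_pairs(pairs):
    """Group pairs so that pairs linked through shared items end up together."""
    parent = {}

    def find(x):
        # Return the representative of x's set, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the sets of the two items in every pair.
    for a, b in pairs:
        parent[find(a)] = find(b)

    # Collect pairs under the representative of their first item.
    groups = {}
    for pair in pairs:
        groups.setdefault(find(pair[0]), []).append(pair)
    return list(groups.values())


L = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"),
     ("G", "H"), ("H", "I"), ("G", "I"), ("G", "J")]
components = group_pairs(L)
print(components)
```

Union-find keeps the work close to linear in the number of pairs, which is why it is a common choice when the list is large.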
9
votes
5 answers

Calculate 95th percentile of values with grouping variable

I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example: Watershed WQ 50500101 62.370661 50500101 65.505046 50500101 58.741477 50500105 71.220034 50500105 57.917249 I reviewed…
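
A per-group percentile can be sketched with the standard library alone. The version below is written in Python and assumes linear interpolation between the closest ranks, which is only one of several percentile definitions; the helper name `percentile` and the variable names are hypothetical, and the rows are the five shown in the question.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Percentile (q in [0, 100]) using linear interpolation between closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


rows = [
    (50500101, 62.370661),
    (50500101, 65.505046),
    (50500101, 58.741477),
    (50500105, 71.220034),
    (50500105, 57.917249),
]

# Collect the water quality values for each watershed.
by_watershed = defaultdict(list)
for watershed, wq in rows:
    by_watershed[watershed].append(wq)

p95 = {ws: percentile(vals, 95) for ws, vals in by_watershed.items()}
print(p95)
```

With so few values per group the result depends heavily on the interpolation rule, so other tools may report slightly different numbers for the same data.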
9
votes
4 answers

How to correctly use group_by() and summarise() in a For loop in R

I'm trying to calculate some summary information to help me check for outliers in different groups in a dataset. I can get the sort of output I want using dplyr::group_by() and dplyr::summarise() - a dataframe with summary information for each group…
Mark_1
  • 331
  • 1
  • 3
  • 16
9
votes
1 answer

data.table do not compute NA groups in by

This question has a partial answer here but the question is too specific and I'm not able to apply it to my own problem. I would like to skip a potentially heavy computation of the NA group when using by. library(data.table) DT = data.table(X =…
JRR
  • 3,024
  • 2
  • 13
  • 37
9
votes
2 answers

JS group by month of date values (objects) in an array

My array is like this: myArray = [ {date: "2017-01-01", num: "2"} {date: "2017-01-02", num: "3"} {date: "2017-02-04", num: "6"} {date: "2017-02-05", num: "15"} ] I want to convert this into: myArray = [ {group: "0", data: [ {date:…
miri.copé
  • 91
  • 1
  • 1
  • 2
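
The same grouping idea can be sketched outside JavaScript. The Python version below assumes the group key is the zero-based month index as a string, matching the "0" in the question's target structure, and it ignores the year; the function name `group_by_month` is hypothetical, and the rest of the target structure is inferred from the truncated excerpt.

```python
from datetime import date


def group_by_month(records):
    """Group records by zero-based month index, keeping first-seen order."""
    groups = {}
    for rec in records:
        month_index = date.fromisoformat(rec["date"]).month - 1
        groups.setdefault(str(month_index), []).append(rec)
    return [{"group": key, "data": data} for key, data in groups.items()]


my_array = [
    {"date": "2017-01-01", "num": "2"},
    {"date": "2017-01-02", "num": "3"},
    {"date": "2017-02-04", "num": "6"},
    {"date": "2017-02-05", "num": "15"},
]
grouped = group_by_month(my_array)
print(grouped)
```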
9
votes
2 answers

Python pandas dataframe: find max for each unique value of another column

I have a large dataframe (from 500k to 1M rows) which contains, for example, these 3 numeric columns: ID, A, B. I want to filter the results in order to obtain a table like the one in the image below, where, for each unique value of column id, I have…
TuoCuggino
  • 365
  • 1
  • 4
  • 13
9
votes
4 answers

Python: group a list into sublists by equality of projected value

Is there a nice pythonic way of grouping a list into a list of lists where each of the inner lists contains only those elements that have the same projection, defined by the user as a function? Example: >>> x = [0, 1, 2, 3, 4, 5, 6, 7] >>> groupby(x,…
zegkljan
  • 8,051
  • 5
  • 34
  • 49
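
One common way to group by a projection is to bucket items in a dictionary keyed by the projected value. The sketch below is not taken from the answers; `group_by_projection` is a hypothetical name, and the `n % 3` projection is an illustrative choice because the question's own example is cut off.

```python
def group_by_projection(items, key):
    """Group items whose key(item) values are equal, keeping first-seen order."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return list(groups.values())


x = [0, 1, 2, 3, 4, 5, 6, 7]
by_remainder = group_by_projection(x, lambda n: n % 3)
print(by_remainder)
```

Unlike `itertools.groupby`, which only merges consecutive items with equal keys, this version does not require the input to be sorted by the projection, but it does require the projected values to be hashable.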
9
votes
5 answers

Java 8 Stream API - Selecting only values after Collectors.groupingBy(..)

Say I have the following collection of Student objects, each of which consists of Name(String), Age(int) and City(String). I am trying to use Java's Stream API to achieve the following SQL-like behavior: SELECT MAX(age) FROM Students GROUP BY city Now, I…
Ghost93
  • 175
  • 1
  • 2
  • 12
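
The SQL in this question, MAX(age) grouped by city, can be illustrated independently of the Stream API. The following Python sketch shows only the grouping logic, not the Java solution; the `Student` dataclass mirrors the fields named in the question, while `max_age_by_city` and the sample students are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    age: int
    city: str


def max_age_by_city(students):
    """Return a dict mapping each city to the maximum age seen there."""
    result = {}
    for s in students:
        result[s.city] = max(s.age, result.get(s.city, s.age))
    return result


# Hypothetical sample data for illustration only.
students = [
    Student("Ann", 21, "Oslo"),
    Student("Bob", 25, "Oslo"),
    Student("Cid", 19, "Rome"),
]
oldest = max_age_by_city(students)
print(oldest)
```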
9
votes
0 answers

Openfire contact list sharing

I installed Openfire on CentOS and it uses an external database for authentication and the user list. I managed groups based on a user table and a friend listing so that each user is also a group that is an administrator and populated with users that…
Kassav'
  • 1,130
  • 9
  • 29
9
votes
1 answer

Spark DataFrame: operate on groups

I've got a DataFrame I'm operating on, and I want to group by a set of columns and operate per-group on the rest of the columns. In regular RDD-land I think it would look something like this: rdd.map( tup => ((tup._1, tup._2, tup._3), tup) ). …
Ken Williams
  • 22,756
  • 10
  • 85
  • 147