Questions tagged [aggregate]

Aggregation is the process of summarizing grouped data, a common operation in statistics. Typically this involves replacing each group of data with a single value (e.g. sum, mean, or standard deviation). In SQL databases and in data manipulation libraries, this is accomplished with GROUP BY and aggregate functions.
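
As a rough illustration of the description above, the following sketch groups rows by an id and replaces each group with a few summary values using GROUP BY and SQL aggregate functions. It uses Python's built-in sqlite3 module; the table name `readings`, its columns, and the sample values are assumptions made up for this example.

```python
import sqlite3

# Hypothetical sample data: (id, number) pairs, several rows per id.
rows = [(1, 25), (1, 26), (1, 30), (1, 24), (2, 4), (2, 8), (2, 5)]

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, number INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# GROUP BY collapses all rows sharing an id into one output row;
# each aggregate function reduces that group's numbers to a single value.
summary = conn.execute(
    """
    SELECT id, COUNT(*), SUM(number), MIN(number), MAX(number), AVG(number)
    FROM readings
    GROUP BY id
    ORDER BY id
    """
).fetchall()
conn.close()

for row in summary:
    print(row)
```

Each tuple in `summary` holds one group's id followed by its count, sum, minimum, maximum, and mean. Data manipulation libraries generally offer the same pattern under a different syntax.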

8256 questions
18
votes
4 answers

Merging data in a single SQL table without a Cursor

I have a table with an ID column and another column with a number. One ID can have multiple numbers. For example:

ID | Number
1 | 25
1 | 26
1 | 30
1 | 24
2 | 4
2 | 8
2 | 5

Now, based on this data, in a new table, I want to have this ID…
Andrew Backes
  • 1,884
  • 4
  • 21
  • 37
17
votes
4 answers

Get min and max values of categorical variable in a dataframe

I have a dataframe that looks like this: D X Y Z A 22 16 23 A 21 16 22 A 20 17 21 B 33 50 11 B 34 53 12 B 34 55 13 C 44 34 11 C 45 33 11 C 45 33 10 D 55 35 60 D 57 34 61 E 66 36 13 E 67 38 14 E 67 37 …
IronMaiden
  • 552
  • 4
  • 20
17
votes
3 answers

When and how to use Aggregate Target in Xcode 4

I was trying to find an example of using an Aggregate Target in Xcode 4, including its purpose and why a developer should use it. Do you have any reference link, especially from the Apple Developer website?
Leonardo
  • 9,607
  • 17
  • 49
  • 89
17
votes
2 answers

Pandas GroupBy.agg() throws TypeError: aggregate() missing 1 required positional argument: 'arg'

I’m trying to create multiple aggregations of the same field. I’m working in pandas, in Python 3.7. The syntax seems pretty straightforward based on the…
user3476463
  • 3,967
  • 22
  • 57
  • 117
17
votes
2 answers

Count totals by year and month

I have a table that looks like this:

id,created,action
1,'2011-01-01 04:28:21','signup'
2,'2011-01-05 04:28:21','signup'
3,'2011-02-02 04:28:21','signup'

How do I select and group these so the output is:

year,month,total
2011,1,2
2011,2,1
Tom
  • 33,626
  • 31
  • 85
  • 109
17
votes
1 answer

Aggregate by week in R

In R I frequently aggregate daily data (in a zoo) by month, using something like this: result <- aggregate(x, as.yearmon, "mean", na.rm=TRUE) Is there a way that I can do this by week?
Richard Herron
  • 9,760
  • 12
  • 69
  • 116
17
votes
5 answers

Collapsing rows where some are all NA, others are disjoint with some NAs

I have a simple dataframe as such: ID Col1 Col2 Col3 Col4 1 NA NA NA NA 1 5 10 NA NA 1 NA NA 15 20 2 NA NA NA NA 2 25 30 NA NA 2 NA …
tumultous_rooster
  • 12,150
  • 32
  • 92
  • 149
17
votes
4 answers

ElasticSearch returning only documents with distinct value

Let's say I have this given data { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }, { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }, { "name" :…
user962206
  • 15,637
  • 61
  • 177
  • 270
17
votes
1 answer

ElasticSearch setup for a large cluster with heavy aggregations

Context and current state: We are migrating our cluster from Cassandra to a full ElasticSearch cluster. We are indexing documents at an average of ~250-300 docs per second. In ElasticSearch 1.2.0 this represents ~8 GB per day. { "generic": { …
17
votes
3 answers

Performance of COUNT SQL function

I have two choices when writing an SQL statement with the COUNT function. SELECT COUNT(*) FROM SELECT COUNT(some_column_name) FROM In terms of performance, what is the best SQL statement? Can I obtain some performance…
Upul Bandara
  • 5,973
  • 4
  • 37
  • 60
17
votes
2 answers

Fastest way to count occurrences of each unique element

What is the fastest way to compute the number of occurrences for each unique element in a vector in R? So far, I've tried the following five functions: f1 <- function(x) { aggregate(x, by=list(x), FUN=length) } f2 <- function(x) { r <-…
Ferdinand.kraft
  • 12,579
  • 10
  • 47
  • 69
17
votes
3 answers

How to aggregate some columns while keeping other columns in R?

I have a data frame like this:

  id no age
1  1  7  23
2  1  2  23
3  2  1  25
4  2  4  25
5  3  6  23
6  3  1  23

and I hope to aggregate the data frame by id to a form like this: (just sum the no if they share the…
Nip
  • 359
  • 1
  • 3
  • 9
17
votes
2 answers

aggregating multiple columns in data.table

I have the following sample data.table: dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10)) I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the…
Alex
  • 19,533
  • 37
  • 126
  • 195
16
votes
1 answer

Update an entity inside an aggregate

I was reading a similar question on SO: How update an entity inside Aggregate, but I'm still not sure how a user interface should interact with entities inside an aggregate. Let's say I have a User, with a bunch of Addresses. User is the aggregate…
BenMorel
  • 34,448
  • 50
  • 182
  • 322
16
votes
10 answers

Why are SQL aggregate functions so much slower than Python and Java (or Poor Man's OLAP)

I need a real DBA's opinion. Postgres 8.3 takes 200 ms to execute this query on my MacBook Pro, while Java and Python perform the same calculation in under 20 ms (350,000 rows): SELECT count(id), avg(a), avg(b), avg(c), avg(d) FROM tuples; Is this…
Jacob Rigby
  • 1,323
  • 2
  • 15
  • 20