Questions tagged [aggregate]

Aggregate refers to the process of summarizing grouped data, commonly used in statistics. Typically this involves replacing groups of data with single values (e.g. sum, mean, standard deviation). In SQL databases and data manipulation libraries, this is accomplished with GROUP BY and aggregate functions.
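As a rough illustration of the description above, the following minimal sketch uses Python's standard-library sqlite3 module to collapse each group of rows into single summary values with GROUP BY. The table name `sales`, its columns, and the sample rows are hypothetical and chosen only for this example.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("east", 20.0), ("west", 5.0), ("west", 15.0), ("west", 25.0)],
)

# GROUP BY splits the rows by region; each aggregate function then
# reduces one group to a single value.
rows = conn.execute(
    """
    SELECT region, COUNT(*), SUM(amount), AVG(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, n, total, mean in rows:
    print(region, n, total, mean)

conn.close()
```

Data manipulation libraries generally follow the same split, apply, combine pattern, with a grouping step followed by one or more aggregate functions.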

8256 questions
2
votes
1 answer

How to find ids in an array created by the facet operator

I have a Customer collection in MongoDB with a status field. Documents can have the same Id fields. I need to find the first changed value, like 'Guest', and push its Ids to a specific pipeline named 'guests'. Customers with status 'Member' I need to push tu…
2
votes
2 answers

Python / Pandas - Datetime statistics. How to aggregate means of datetime columns

I am currently writing a "Split - Apply - Combine" pipeline for my data analysis, which also involves dates. Here's some sample data: In [1]: import pandas as pd import numpy as np import datetime as dt startdate =…
flurble
  • 1,086
  • 7
  • 21
2
votes
4 answers

Question about a funky array declaration

I just came across this array declaration: const int nNums= 4; int* nums[nNums] = {0, 0, 0}, d[nNums]; I understand that a pointer to nums is being created, but what is the business on the right? d[] gets initialized, but I am not quite sure what…
woopf
  • 83
  • 1
  • 3
2
votes
1 answer

Combining strings when aggregating in R without creating a list of lists

I have a set of data in the form of strings. They are text responses given to a question, and some people gave multiple responses to the question. I want to combine each individual's text responses, i.e. aggregate by individual. However, when I…
Jessica
  • 23
  • 4
2
votes
1 answer

Using aggregate functions before joining the tables

I have two tables and am joining them on customer_id. The first table is deal, where I store the data of a deal. Every deal has volume, rest, pay, etc. The second table is handle, and it's hard for me to explain the purpose of this table, but…
sajadsholi
  • 173
  • 1
  • 3
  • 12
2
votes
0 answers

K-Means Python Syntax When Records Represented by a Cnt Column (in Aggregate)

Trying to accomplish K-Means in Python using aggregated data files. For example, instead of a data frame with 3 records represented by 3 rows, one row will represent all 3 with a column like cnt (arbitrarily named) representing those 3 unique…
Zach
  • 35
  • 2
  • 6
2
votes
3 answers

Python: Summarizing & Aggregating Groups and Sub-groups in DataFrame

I am trying to build a table that has groups that are divided by subgroups with count and average for each subgroup. For example, I want to convert the following data frame: To a table that looks like this where the interval is a bigger group and…
user9532692
  • 584
  • 7
  • 28
2
votes
0 answers

Programmatically add Aggregate Transform - Count Distinct to SSIS package

I am working on programmatically creating an Aggregate transform with count distinct as the aggregation type. I am able to create other aggregations like min, max, and count, but when it comes to count distinct I get the error below: The component has…
sam
  • 345
  • 2
  • 4
  • 18
2
votes
0 answers

Is there a way to aggregate orders within a certain time span in Python?

I am fairly new to working with Python. There is a table of orders with a specific time flag. However, the "correct" order was split into many rows, since orders may be processed at different times and thus have different order ids. The final goal is…
Stack_Javi
  • 23
  • 4
2
votes
2 answers

Group by, aggregate, include separate column

Here's my data: foo = pd.DataFrame({ 'accnt' : [101, 102, 103, 104, 105, 101, 102, 103, 104, 105], 'gender' : [0, 1 , 0, 1, 0, 0, 1 , 0, 1, 0], 'date' : pd.to_datetime(["2019-01-01 00:10:21", "2019-01-05 00:09:18", "2019-01-05 00:09:30",…
Vishesh Shrivastav
  • 2,079
  • 2
  • 16
  • 34
2
votes
1 answer

SQL Server: how can I get the correct DB size from sys.master_files?

I'm working on a query that gathers some information related to DB restores, and I'm having trouble getting the correct DB size. The following query provides me with the DB name, last restore date, DB size, and user name of the last person who…
2
votes
1 answer

df.groupby() - how to aggregate data where order of grouped data is important?

How can I aggregate data when the order of grouped data is important? (bonus points if this can be done in an elegant vectorized way). If that was clear as mud, let me explain with an example. Let's say I have data in df: id month …
Ben
  • 139
  • 5
2
votes
2 answers

Collapsing a data.frame by group and interval coordinates

I have a data.frame which specifies linear intervals (along chromosomes), where each interval is assigned to a group: df <- data.frame(chr = c(rep("1",5),rep("2",4),rep("3",5)), start = c(seq(1,50,10),seq(1,40,10),seq(1,50,10)), …
dan
  • 6,048
  • 10
  • 57
  • 125
2
votes
3 answers

Calculate means across elements in a list

I have a list like this: (mylist <- list(a = data.frame(x = c(1, 2), y = c(3, 4)), b = data.frame(x = c(2, 3), y = c(4, NA)), c = data.frame(x = c(3, 4), y = c(NA, NA)))) $a x y 1 1 3 2 2 4 $b x y 1 2 4 2 3…
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
2
votes
2 answers

group_by and count number of elements in each column in R

I have a data table like below: city year t_20 t_25 Seattle 2019 82 91 Seattle 2018 0 103 NYC 2010 78 8 DC 2011 71 0 DC 2011 0 0 DC …
OverFlow Police
  • 861
  • 6
  • 23