Determine if a measure is aggregable or not in data-warehouse

Question

Can someone please tell me a 'general use', step-by-step method to find out what are the non-aggregability fields in a datamart. Here an example I found:

Notes: italic means 'key', bold identifies 'shortenings', 'column' is an alias for 'referencing'

Relational schema:

CALL(COD,DATE,FROM:S,TO:S,LEN)

SIM(SIM, USER:USER, TRIFF:T, BONUS)

TARIFF(TARIFF, CARRIER:CAR)

USER(USER, TOWN:TOW, LAST_TARIFF:TAR)

ROAMING_CALL(COD:CAL, FOREIGN_CARRIER:CAR)

PROMO_CALL(COD:CAL, PROMO_TARIFF:P_TA)

PROMO_TARIFF(TARIFF:TAR)

TOWN(TOWN,NATION)

CARRIER(CARRIER, NATION)

REQUESTS: build a fact-schema for 'CALL' with following

dimensions: DATE, SIM_FROM, CALLED_CARRIER, FOREIGN_CARRIER, PROMO_TARIFF and

measures: AVG_CALL_LENGTH, NUM_OUTGOING_SIM (as count distinct FROM), NUM_INCOMING_SIM (as count distinct TO)

Now I can draw the fact-schema, but I'm in trouble finding what measures are aggregable along which dimensions

EDIT: this is the pdf for the fact schema I have (sorry for not using a strict sintax but reading notes are included)

Measures:

Standard [obtained by the operational schema]:  
NUM_INCOMING_CALLS = COUNT DISTINCT (TO)    
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*

Calculated [obtained by the operational schema, need partial data to add properly]:  
AVG_CALL_LENGTH = CL_SUM/CL_COUNT  
where  
CL_SUM = SUM (LENGTH), CL_COUNT = COUNT(LENGTH)  
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*

Derived  [can be found as a dimension]:  
NUM_OUTGOING_CALLS = COUNT DISTINCT ( FROM )  
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*

Please explain what you understand & what you have tried here. Read & act on hits googling 'stackexchange homework'. You are asking for a textbook chapter/section & personalized tutorial. Please find & follow a text & its examples & ask a question about where you first get stuck. — philipxy, Feb 07 '18 at 10:31
I have no suggested texts (only slides with few examples and notes taken at lessons), this exercise is taken from my professor's lecture notes and the procedure isn't very clear. — DDS, Feb 07 '18 at 11:06
I think your professor is getting at "average call length" here. If you have a field holding the 'average call length' in a record in a table, you won't get a sensible figure if you just add it up. — Nick.Mc, Feb 07 '18 at 12:52
No, that may be a wrong solution, as an 'averaged' field get errors if added at different levels of granularity. I really need two fields: eg: i have these length: AB:2', AC:4', BC:3', BD:6'. the average call of lenght is 3,75', if I group by 'caller' I have A: (2+4)/2 = 3, B=(3+6)/2 = 4,5, and (3+4,5)/2= 7,5 that isn't correct as the 'whole mean' should be 3,75 so adding a unique field is wrong. — DDS, Feb 07 '18 at 13:48
An example of what I'm trying figuring out issue is: how can I determine if is correct NOT TO group by promo tariff? -> Promo tariff has ALWAYS to be in the pattern. — DDS, Feb 07 '18 at 13:59
You determine whether it is correct or not by the meaning of the dimension. Think to yourself, 'would it be right to sum or average the measure I'm choosing' by this dimension? But at the moment this question isn't really answerable as it isn't clear what you're asking or what your issue is. — Rich, Feb 07 '18 at 16:07
I may have to deal with data that I don't understand / written in another language, so 'doing what seems to be good' is not an option. My question was on a theoretical point: what is the 'general', if possible, one size-fits all, method to determine if an aggregation is meaningful or not, having some, given or inferred, functional dependancies? My professor wrote something like: because TO -> {CALLED_CARRIER} NA = {DATE, FROM, FOREIGN_CARRIER, PROMO_TARIFF} But, as told, I can't figure out how he found this. — DDS, Feb 07 '18 at 16:18
*Any* aggregate (like any query expression) means something. The question is, do you *want* a *particular* aggregation. An FD says a column set is a function (single-valued) per others. But FDs & other constraints are not needed to query & don't affect query meanings. They reflect the range of inputs & results. You might as well ask whether the product of two numbers always has a meaning. It gives you a value. What that value represents depends on what the inputs represent, which depends on your application. And whether it is useful depends on what interests you about your application. — philipxy, Feb 08 '18 at 04:42
An FD determining a column is relevant when you make a fact table (data (hyper)cube) from a relation by having the determined column be the source of elements of the cube and the determining attributes being among the subsets that form the axes. (Although without a FD you can still make a cube that has relations as elements.) Then you can apply cube aggregation. (Any query you can write on a cube can be expressed in terms of the relation it represents.) PS How come your clarifying comment mentions FDs but your question neither gives nor mentions them? — philipxy, Feb 08 '18 at 06:17
Re my 1st comment: Zillions of presentations including textbook pdfs are a google away. Moreover it is extremely unlikely your course is without reading material. Also, clarify via editing your post, not comments. Also please read & act on [mcve]. Including [using text, not images/links, for text (including code, tables & ERDs)](https://meta.stackoverflow.com/q/285551/3404097). Use an image only for convenience to supplement text and/or for what cannot be given in text. Use edit functions to inline, not link, if you have the rep--make your post self-contained. — philipxy, Feb 08 '18 at 06:25
The listed dimensions & measures just seem to be arbitrary choices your prof made for an example fact table holding data that is in or can be derived from those tables. (Unfortunately you are not clear, either re what you were given or what you do or don't understand about it. "finding what measures are aggregable along which dimensions" is not clear. You have not explained what your "issue" is. Restrict yourself to full sentences instead of fragments. Clearly explain your path to where you are stuck. Also your "schema" is a mess of unexplained boldface, italics & colons--for PKs & FKs?) — philipxy, Feb 10 '18 at 09:24

score -1 · Accepted Answer · edited Jun 20 '20 at 09:12

-1

Ok, I went and asked my teacher: he gave me a easy algorithm:

Given a schema D{D1, D2, D3, ... Dn}, for a Mesaure M= count distinct A n

if A U X -> Di is not trivial, X subset of D

X U A -> D1 (True)
X U A -> D2 (False)
X U A -> D3 (True)
...
X U A -> Dn-1 (False)

I have that NA = {D2, Dn-1}
NA: set of non-aggregabilities

edited Jun 20 '20 at 09:12

Community

1
1

answered Feb 10 '18 at 13:59

DDS

2,340
16
34

Does this tell you which measure is non aggregable? – Nick.Mc Feb 11 '18 at 09:39
Definetly yes: just apply this algorithm for each measure and each dimension. If you find that a dimension cam be determined by 'expansed' measure's argument it's aggregable, if cannot be determined, so it's not-aggregable. – DDS Feb 11 '18 at 12:02
I can say with authority that this has _zero_ real world application – Nick.Mc Feb 12 '18 at 03:20
1

Please explain what is going on here. Also in the question. Also clarifications should be edited into posts, not put in comments. – philipxy Feb 13 '18 at 23:49

Determine if a measure is aggregable or not in data-warehouse

1 Answers1