0

Suppose I have tables as indicated below. I'm trying to create correct fact tables.

If I combine both tables into one fact table, I would be repeating the sales measure since each sale contains at least two items (admin fee, cleaning and so on - number of items can vary) that contribute to the overall sale. For car 1, the overall sale would therefore be 5000 + 50 + 100. I cannot simply aggregate either because sales needs to be itemized.

Do I need to create two fact tables, or do I need to approach this in an altogether different manner? I'm tempted to link Car Sale Table to Charge Table in a 1:n fashion.

Do you have any suggestions?

Car Sale Table

car_id sale currency
1 5000 USD
2 7000 USD

Charge Table

charge_id type amount currency car_id
14 admin fee 50 USD 1
15 cleaning 100 USD 1
22 parking 10 USD 2
25 cleaning 70 USD 2
Fisk
  • 227
  • 1
  • 12

2 Answers2

1

To denormalize the measures (here sale) is a very bad idea (as you noted) this will cause the sum calculation to fail.

But you do not need to combine you two fact tables in this way.

If your main goal is to calculate summaries per car, simple add a new charge_id (say 100 - with type = sale) and add the data to the second table.

So the fact table will contain three rows for the car_id = 1

charge_id   type        amount
14          admin fee   50
15          cleaning    100
100         sales       5000

The first table will be not needed.

The total cost calculation will be a plain aggregation of the amount per car_id.

You'll want to add some other attributes such as booking and validity timestamps.

Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Any sort of dimensional exercise should be denormalized as best practice, not sure what "denormalize the measures" exactly means? Moreover, why would your fact contain a dimension attribute such as type? Quick read: https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/ or https://en.wikipedia.org/wiki/Dimensional_modeling. Also, why are you adding a dimension key such as charge_id = 100, wouldn't this be another use case for a dimension? I think the main goal here is to create a legitimate dimensional data model? – Jatin Morar May 16 '23 at 14:26
  • You're certainly onto something, especially considering there's only one option here which is to keep one fact at the line item of a transaction per OP's requirement. Transaction meaning purchase of car + admin fee + etc + etc to then be aggregated however. What throws me off is why your approach is this one size fit all to a fact table. Why not encourage a modeling effort via conformed dimensions? To add and make up a charge_id, then add it into the Fact seems like bad practice. If the model scales or needs to be maintained this will pose a problem. I think your solution is myopic at best. – Jatin Morar May 16 '23 at 17:12
  • To be fair maybe the OP should provide the underlying model as I do not know how these two "fact" tables were originated. – Jatin Morar May 16 '23 at 17:13
0

If you are creating a Fact_Sales_Table you should denormalize the data and combine. Data modeling is somewhat subjective and based on the requirements.

If you want one sales fact table perhaps think of a Sales Type Dimension that describes the itemized line item sale. Therefore, in your main fact table you would only really include the sales amount of each line item per overall car transaction.

Two fact tables seem unnecessary unless there is a need for different grain or requirement for two different fact tables?

Maybe try this as part of your modeling effort:

enter image description here

Also maybe change dim_sale to another name (I just threw something together)? I'm going this route as I assume there's more to the data than the tables/fields you provided.

Another example with the idea of adding the "car" transaction as a type:

enter image description here

Jatin Morar
  • 164
  • 1
  • 7