Which has a bigger performance impact on the database?
Querying a short table (say, 20 million records) with a filter on two columns,
or
querying a long table with twice as many records as the short table, with a filter on one column?
To give some context:
I have a simple undirected, connected graph of people and their relationships to other people/friends. The BI end user wants to search for a "person", and then use the fact table to identify all the related people/friends.
I am building a fact table (star schema) to store the graph. The grain of my fact table is a single relationship at one degree (a direct relationship), supported by a Date and a Person dimension. The person of interest can appear in either the Node_1_person or the Node_2_Person column.
Table Definitions:
RELATIONSHIP FACT TABLE (~40 Million Rows)
Relationship_ID
Node_1_person
Node_2_Person
Relationship_Strength_Score
Relationship_Counter
Relationship_Created_Date
Person_Dimension (~20 Million Rows)
Person_Surrogate_Key
Person_Natural_key
Person_Name
Person_Address
Person_Email
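To make the schema concrete, the definitions above could be sketched roughly as below (using SQLite for illustration; the data types and foreign keys are my assumptions, not part of the original definitions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person_Dimension (
    Person_Surrogate_Key INTEGER PRIMARY KEY,
    Person_Natural_key   TEXT,
    Person_Name          TEXT,
    Person_Address       TEXT,
    Person_Email         TEXT
);

CREATE TABLE Relationship_Fact (
    Relationship_ID             INTEGER PRIMARY KEY,
    Node_1_person               INTEGER REFERENCES Person_Dimension (Person_Surrogate_Key),
    Node_2_Person               INTEGER REFERENCES Person_Dimension (Person_Surrogate_Key),
    Relationship_Strength_Score REAL,
    Relationship_Counter        INTEGER,
    Relationship_Created_Date   TEXT
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print("tables:", tables)
```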
I have two options to meet this requirement:
Option #1: I also save the reciprocal of each relationship, so the end user can search a single column to find the relationships.
Option #2: I avoid reciprocal relationships, and the end user then has to look in both the Node_1_person and Node_2_Person columns.
I am not sure which option will have the bigger performance impact on the database.
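The two query shapes can be sketched on a toy version of the fact table (SQLite here for illustration; the tiny sample data and the single-column index are my assumptions, and the `UNION` form of Option #2 is one common way to write the two-column search):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE relationship_fact "
            "(Relationship_ID INTEGER, Node_1_person INTEGER, Node_2_Person INTEGER)")

# Toy graph: person 1 knows 2 and 3; 2 knows 3.
base = [(1, 1, 2), (2, 1, 3), (3, 2, 3)]
cur.executemany("INSERT INTO relationship_fact VALUES (?, ?, ?)", base)

# Option #2: each relationship stored once; the search must cover both columns.
opt2 = cur.execute(
    "SELECT Node_2_Person FROM relationship_fact WHERE Node_1_person = ? "
    "UNION "
    "SELECT Node_1_person FROM relationship_fact WHERE Node_2_Person = ?",
    (1, 1),
).fetchall()

# Option #1: the reciprocal rows are also stored, so a filter on one
# (indexable) column is enough -- at the cost of doubling the row count.
cur.executemany("INSERT INTO relationship_fact VALUES (?, ?, ?)",
                [(rid + 100, b, a) for rid, a, b in base])
cur.execute("CREATE INDEX ix_node1 ON relationship_fact (Node_1_person)")
opt1 = cur.execute(
    "SELECT Node_2_Person FROM relationship_fact WHERE Node_1_person = ?",
    (1,),
).fetchall()

print(sorted(opt1) == sorted(opt2))  # both find friends 2 and 3
```

Both options return the same friend set; the trade-off is two index seeks (or one two-column predicate) on ~40 million rows versus one index seek on ~80 million rows.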