0

Which has more database performance impact?

Querying a short table (say 20 Million records) with filter on two columns

or

querying a long table twice as more records than the short table with filter on one column?

To give some context:

I have a simple non-directed, connected graph about people and their relationships to other friends/people. The BI end user wants to search for a "person", and then use the fact table to identify all the related people/friends.
I am building a fact table (Star Schema) to save the information of the graph. The grain of my fact table is a Single relationship at one degree (direct relationship), supported by Date and a Person dimension. The person of interest can be in Node_1_person or Node_2_Person column.

Table Definitions:

RELATIONSHIP FACT TABLE (~40 Million Rows)
 Relationship_ID
 Node_1_person
 Node_2_Person
 Relationship_Strength_Score
 Relationship_Counter
 Relationship_Created_Date


Person_Dimension  (~20 Million Rows)
 Person_Surrogate_Key
 Person_Natural_key
 Person_Name
 Person_Address
 Person_Email

I have two options to facilitate the requirement:

Option #1: I save the reciprocal of the relationship, so the end user can search one column to find the relationship.
Option #2: I avoid reciprocal relationships, and then the end user has to look into both Node_1_person and Node_2_Person column.

I not sure, which option will have more performance impact to the database.

TheGameiswar
  • 27,855
  • 8
  • 56
  • 94

1 Answers1

0

Why not test both queries? Depending on how fast your database is, testing shouldn't take too long..

fstam
  • 669
  • 4
  • 20
  • Unfortunately, I don't have access to the DB environments. Since I have both option#1(Reciprocal) and Option #2 (non-reciprocal), I need know the pros and cons with respect to query performances, to justify the model and my decision. – user2600421 May 15 '17 at 17:26
  • But I imagine you have access to and cooperate with people that do. Why not prepare 2 queries and set up a test? Lets be reasonable, if you can't test, you can't be blamed for performance issues either. – fstam May 15 '17 at 17:31