0

I am working on task which require me to compare every column of Row. There are a number of ways to achieve, I am curious because number of rows is a big number. So here I am explaining it by example.

---------------------------------------------------------------------
ID[P_K] | Name | Address | City | Gender | College
---------------------------------------------------------------------

So above is a basic example of a table, which is hold data of Students from multiple colleges, now I am getting some data from outside source and need to compare it with data in my DB. Below are the possible ways to do it.

  1. I will do select query with where Id = <id> and match it one by one in my code.

  2. Other way I can do a select query with where ID = <id> and name = <name> and so on...

So now my preference is 2nd option, because of lesser complexity.

Now to go ahead there is only thing which is creating conflict in my mind.

Question:

Complexity of query these two queries comparing to each other(considering ID as Primary Key) :

  1. where Id = <id>

  2. where ID = <id> and name = <name> and so on...

I know this total depends on the MySQL algorithm, I've searched lot didn't find Select algorithm of MySql.

It will be helpful if someone can share Select algorithm.

Specific to Algorithm:

There are two ways this algorithm could work:

  1. For number of rows { if(whereCondition1 && whereCondition2 .... && whereCondition<N>)}
    }

  2. for number of rows { if(whereCondition1){ //Result filter according whereCondition1 if(whereCondition2){ //Result filter according whereCondition2 . . and so on... } else { continue; } } else { continue; } }

Now complexity for first one will be O(n). For second assuming ID[P_K], complexity will be reduced. Right?

So from above which algorithm is user? or non of these?

Ishan Dhingra
  • 2,442
  • 5
  • 31
  • 42
  • What are you trying to achieve? Do you want to compare table to itself? In other words check if one student is in more that one place? What is desired output? –  Jul 15 '14 at 12:14

2 Answers2

1

Every RDBMs has your own select algorithm but all of then is based on the SQL ANSI 99 pattern.

The matters here is how the RDBMs treats then and parse it to achieve a better performance. You don't need to worry about that, the only thing you have to worry is if your database is well designed with proper indexes.

That is what will make the difference between using the where Id = <id> or where ID = <id> and name = <name> and so on...

If the ID is the PK of that table and the external source that you talked about is syncronized with your data (meaning: The IDs has the same records) you just need to use where Id = <id> but if those IDs are not in sync you should then define what will make your registries unique and than create your sql condition and make sure that you have proper indexes for it.

Jorge Campos
  • 22,647
  • 7
  • 56
  • 87
  • This thing I know that only ID[PK] is enough to get the desired row. Problem is to check if out side db value matches with db value or not. I will make an edit now to better explain my point. – Ishan Dhingra Jul 15 '14 at 12:29
  • Then you go for my statement: `but if those IDs are not in sync you should then define what will make your registries unique` – Jorge Campos Jul 15 '14 at 12:31
  • There is ID is unique for sure, what I am checking is other columns which will take more time. – Ishan Dhingra Jul 15 '14 at 13:07
0

I don't know how exactly work the algorithm, but in general:

for(int i = 0, j = 0; i < N; ++i; ++j) {
   if (i==j) {
      /* do something */
   }
}

This has complexity O(N)

for(int i = 0, j = 0; i < N; ++i; ++j) {
   if (i==j && i!=k) {
      /* do something */
   }
}

Also this has complexity O(N)

Ultimately, point 1 and point 2 has got the same complexity.

Luca Davanzo
  • 21,000
  • 15
  • 120
  • 146