0

I have 2 tables city_sessions_1 and city_sessions_2 Structure of both table are similar

CREATE TABLE `city_sessions_1` (
  `city_id` int(11),
  `session_date` date,
  `start_time` varchar(12),
  `end_time` varchar(12) ,
  `attendance` int(11) ,  
  KEY `city` (`city_id`),
  KEY `session_date` (`session_date`)
) ENGINE=MyISAM;

Note these tables do not have any primary key, but they have their indexes defined. Both tables have same number of rows. But it is expected that some data would be different.

How can I compare these 2 tables' data?

Imdad
  • 5,942
  • 4
  • 33
  • 53
  • what out put do you want – Muhammad Raheel Sep 04 '12 at 12:10
  • Are you after the [intersection](http://en.wikipedia.org/wiki/Intersection_(set_theory)) (what's the same) or the [symmetric difference](http://en.wikipedia.org/wiki/Symmetric_difference) (what differs)? – eggyal Sep 04 '12 at 12:34

1 Answers1

1
-- We start with the rows in city_session_1, and their fit in city_session_2
SELECT
  * -- or whatever fields you are interested in
FROM city_sessions_1 
  LEFT JOIN city_sessions_2 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE 
  -- Chose only those differences you are intersted in
  city_sessions_1.session_date<>city_session_2.session_date
  OR city_sessions_1.start_time<>city_session_2.start_time
  OR city_sessions_1.end_time<>city_session_2.end_time
  OR city_sessions_1.attendance<>city_session_2.attendance

UNION 

-- We need those rows in city_session_2, that have no fit in city_session_1
SELECT
  * -- or whatever fields you are interested in
FROM city_sessions_2 
  LEFT JOIN city_sessions_1 ON city_sessions_1.city_id=city_sessions_2.city_id
WHERE city_sessions_1.city_id IS NULL
Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
  • You can change `UNION` to `UNION ALL` to speed things up. +1 anyway Yes, a minor tweak, but a tweak nonetheless. – fancyPants Sep 04 '12 at 12:48
  • @tombom The `UNION` is guaranteed to not produce duplicates, the first half having city_sessions_1.city_id allways `not null`, the second half allways `null`. This is found immediately in the deduplication process (city_sessions_1 being the driving table of the first query), resulting in close-to-zero overhead. – Eugen Rieck Sep 04 '12 at 13:46