0

I am currently working on the Maxmind database. However, it contains hundreds of thousands of duplicate entries. For example:

Newcastle Upon Tyne:

369137  GB  I7  Newcastle Upon Tyne NE20    54.9881 -1.6194
369332  GB  I7  Newcastle Upon Tyne NE6 54.9881 -1.6194
369345  GB  I7  Newcastle Upon Tyne NE13    54.9881 -1.6194
369355  GB  I7  Newcastle Upon Tyne NE3 54.9881 -1.6194
369356  GB  I7  Newcastle Upon Tyne NE5 54.9881 -1.6194
369645  GB  I7  Newcastle Upon Tyne NE4 54.9881 -1.6194
369706  GB  I7  Newcastle Upon Tyne NE15    54.9881 -1.6194
369959  GB  I7  Newcastle Upon Tyne NE12    54.9881 -1.6194
370114  GB  I7  Newcastle Upon Tyne NE27    54.9881 -1.6194

Newcastle (I removed some of them because there's too many to paste here):

382 ZA  2   Newcastle       -27.758 29.9318
2279    US  OK  Newcastle   73065   35.2323 -97.6008
26459   US  CA  Newcastle   95658   38.873  -121.1543
22382   CA  ON  Newcastle   l1b1j9  43.9167 -78.5833
38995   AU  2   Newcastle       -32.9278    151.7845
40025   US  ME  Newcastle   4553    44.0438 -69.5675
47937   GB  I7  Newcastle       54.9881 -1.6194
119830  US  ME  Newcastle   4553    44.0438 -69.5675
119982  US  NE  Newcastle   68757   42.6475 -96.9232
115052  US  CA  Newcastle   95658   38.873  -121.1543
120603  US  NE  Newcastle   68757   42.6475 -96.9232
127931  US  OK  Newcastle   73065   35.2323 -97.6008
136726  CA  ON  Newcastle       43.9167 -78.5833
136915  US  TX  Newcastle   76372   33.245  -98.9103
137128  US  WY  Newcastle   82701   43.8396 -104.5681
137130  US  WY  Newcastle   82701   43.8396 -104.5681

Given that there are more than one Newcastle cities in the world and also that it returns all the different postcodes for Newcastle even if it contains the same lat/lon, how can we remove the duplicate entries?

I have taken a look at this Eliminate duplicate cities from database which comes up with this potential solution:

delete from climate.maxmind_city mc where id in (
select
  max(c1.id)
from
  climate.maxmind_city c1,
  climate.maxmind_city c2
where
  c1.id <> c2.id and
  c1.country = c2.country and
  c1.name = c2.name and
  earth_distance(
    ll_to_earth( c1.latitude_decimal, c1.longitude_decimal ),
    ll_to_earth( c2.latitude_decimal, c2.longitude_decimal ) ) <= 35
group by
  c1.country, c1.name
order by
  c1.country, c1.name
)

However, earth_distance is a postgresql function and we are using MySQL. How can I therefore replace the earth_distance function with a similar MySQL approach?

Community
  • 1
  • 1
Johnathan Au
  • 5,244
  • 18
  • 70
  • 128
  • If the only thing you are asking for is calculating distance between 2 points in MySQL then try this: http://www.tech-problems.com/calculating-distance-in-mysql-using-spatial-point-type/ – Michal Gasek Jan 21 '14 at 11:49
  • Thanks for the reply but how can I incorporate SQRT(POW(69.1 * (fld_lat - ( $lat )), 2) + POW(69.1 * (($lon) - fld_lon) * COS(fld_lat / 57.3 ), 2 )) AS distance into my query? I don't want to be creating new tables. – Johnathan Au Jan 21 '14 at 12:20
  • You can create a your own function/procedure in MySQL. – Michal Gasek Jan 21 '14 at 12:21
  • If you only want cities and their coordinates the `Maxmind`database is not suitable as it is for ip addresses and there coordinates. If you remove the duplicates it is of no use for ip's. – david strachan Jan 21 '14 at 13:49

0 Answers0