I am currently working on the Maxmind database. However, it contains hundreds of thousands of duplicate entries. For example:
Newcastle Upon Tyne:
369137 GB I7 Newcastle Upon Tyne NE20 54.9881 -1.6194
369332 GB I7 Newcastle Upon Tyne NE6 54.9881 -1.6194
369345 GB I7 Newcastle Upon Tyne NE13 54.9881 -1.6194
369355 GB I7 Newcastle Upon Tyne NE3 54.9881 -1.6194
369356 GB I7 Newcastle Upon Tyne NE5 54.9881 -1.6194
369645 GB I7 Newcastle Upon Tyne NE4 54.9881 -1.6194
369706 GB I7 Newcastle Upon Tyne NE15 54.9881 -1.6194
369959 GB I7 Newcastle Upon Tyne NE12 54.9881 -1.6194
370114 GB I7 Newcastle Upon Tyne NE27 54.9881 -1.6194
Newcastle (I removed some of them because there's too many to paste here):
382 ZA 2 Newcastle -27.758 29.9318
2279 US OK Newcastle 73065 35.2323 -97.6008
26459 US CA Newcastle 95658 38.873 -121.1543
22382 CA ON Newcastle l1b1j9 43.9167 -78.5833
38995 AU 2 Newcastle -32.9278 151.7845
40025 US ME Newcastle 4553 44.0438 -69.5675
47937 GB I7 Newcastle 54.9881 -1.6194
119830 US ME Newcastle 4553 44.0438 -69.5675
119982 US NE Newcastle 68757 42.6475 -96.9232
115052 US CA Newcastle 95658 38.873 -121.1543
120603 US NE Newcastle 68757 42.6475 -96.9232
127931 US OK Newcastle 73065 35.2323 -97.6008
136726 CA ON Newcastle 43.9167 -78.5833
136915 US TX Newcastle 76372 33.245 -98.9103
137128 US WY Newcastle 82701 43.8396 -104.5681
137130 US WY Newcastle 82701 43.8396 -104.5681
Given that there are more than one Newcastle cities in the world and also that it returns all the different postcodes for Newcastle even if it contains the same lat/lon, how can we remove the duplicate entries?
I have taken a look at this Eliminate duplicate cities from database which comes up with this potential solution:
delete from climate.maxmind_city mc where id in (
select
max(c1.id)
from
climate.maxmind_city c1,
climate.maxmind_city c2
where
c1.id <> c2.id and
c1.country = c2.country and
c1.name = c2.name and
earth_distance(
ll_to_earth( c1.latitude_decimal, c1.longitude_decimal ),
ll_to_earth( c2.latitude_decimal, c2.longitude_decimal ) ) <= 35
group by
c1.country, c1.name
order by
c1.country, c1.name
)
However, earth_distance is a postgresql function and we are using MySQL. How can I therefore replace the earth_distance function with a similar MySQL approach?