0

I have a mysql clients table which stores a bunch of info on the client, for the sake of this question let's say:

firstName
lastName
houseName
streetName
localityName

The system has an address labels generator to send promotional material to my clients' home (Yes, they request this not junk mail :) ) and I would like to avoid sending multiples to any one address. Basically if a husband and a wife are both in the system and having a very similar address, I don't want to print both.

Now obviously since the system relies on the operator, a street name can be inputted as "John Doe Str." or "John Doe Street" or "John Doe str" or the house name can be inputted in a different manner such as "531, House Name" or "House Name 531" etc.

What I'm after is a an idea (If it's even possible) to build either a query, or some php code that would try to figure out as accurately as possible if it's the same address (taking into account all address fields) and possibly use the surname as a possible reference as it might indicate that two people are living in the same household that share a surname and a very similar address.

What do you guys think? I might be way over my head with this one.

M1ke
  • 6,166
  • 4
  • 32
  • 50
Necrospasm
  • 11
  • 8
  • Your question is more about the logic of address syntax than the actual programming required to do it. I presume you're familiar with the `LIKE` operator in SQL? E.g. `road LIKE 'John Doe Str%` would give `John Doe Str` and `John Doe Street`? – M1ke Jan 06 '16 at 12:10
  • Are you asking for sample code or general thoughts about your design and approach? – hazardous Jan 06 '16 at 12:11
  • It's not worth it. What you want is really hard to achieve. And if you think about it, how would you (human) handle two people with the same surname living in the same house? If it's a male and a female you might think of a couple but what if they are two males? Just neighbors? Father and son? – maxhb Jan 06 '16 at 12:11
  • Both. Even about the feasibility of actually doing it. You know like if on 20,000 clients this would be slow as hell and how reliable the approach is. I'd rather send 2 copies to the same address than none. I'm familiar with the LIKE operator but that would be only good for Str and Street. Str and Strt would fail. – Necrospasm Jan 06 '16 at 12:14
  • @maxhb That's why I wanted to compare address. Basically my thought of comparing the surnames and their addresses, if the percentage that this is the same household is 95%, display only 1 of them, if it's less, display both. At least that was my idea. With regards to it being hard to achieve, yes I'm starting to think so too. – Necrospasm Jan 06 '16 at 12:16
  • if they request it why not just use the info they supply, there is no need to find records in your database to prevent duplicates – RST Jan 06 '16 at 12:20
  • What if both people of a couple really want your promotional material? Would you really want to decide they only get it once? – maxhb Jan 06 '16 at 12:22
  • @RST That is true, however many times there are either modifications of the address, the husband doesn't know the wife already made a request etc. Was just thinking about an automated system to minimize duplicates as much as possible whilst maintaining a certain level of "scepticism". If the system is 95% sure that it's the same address, I can live with that 5% margin of error and send 1 copy, if it's less, then send both. – Necrospasm Jan 06 '16 at 12:24
  • @maxhb That is also true however in 13 years, I never received any "complaints" for sending only 1 copy (as at the moment I'm doing this manually. not exactly fool proof). Just like any logic, somewhere you have to draw a line. I'd rather get one or two complaints and send out the material manually then go through 20,000 manually to check if it's the same household. – Necrospasm Jan 06 '16 at 12:27
  • Anyway, as @maxhb said, this might be really hard to achieve and even if it achieved, it would be extremely slow probably as it would have to compare 20,000 records one by one against each other. So yeah, unless any of you guys has some insight (link or otherwise), I'm just gonna give up on this haha – Necrospasm Jan 06 '16 at 12:30
  • The easiest way is to standardize the address with an API (SmartyStreets, Google Street, Melissa Data, etc) in order to compare them. That way you will be comparing equality instead of similarity. – camiblanch Jan 07 '16 at 20:46

0 Answers0