4

I'm trying to figure out how to GROUP BY on multiple columns. I want to group items when the SSN or the address matches. For example, here are three records:

account_number | name         | ssn         | address
---------------+--------------+-------------+----------------------
23952352340    | SMITH INC    | 123-45-6789 | P.O. BOX 123
3459450340     | JOHN SMITH   | 123-45-6789 | 123 EVERGREEN TERRACE
45949459494    | JANE SMITH   | 395-23-1924 | 123 EVERGREEN TERRACE

And here's what I'd like to end up with:

names
----------------------
SMITH INC, JOHN SMITH, JANE SMITH

Any suggestions?

Jason Swett
  • 43,526
  • 67
  • 220
  • 351
  • 1
    Looks like you are grouping by Name. Can you give a better example? – Brig May 06 '11 at 20:37
  • Sure. I edited to include three distinct names. – Jason Swett May 06 '11 at 20:52
  • If you removed `3459450340` from the table, that would leave you with two groups instead of one, right? – Quassnoi May 06 '11 at 21:14
  • Yes. Jane's only relationship to SMITH INC is the fact that she shares an address with its owner. If that relationship goes away, the two customers belong in two separate groups. – Jason Swett May 06 '11 at 21:16
  • 1
    Also, if Quassnoi can't figure it out, I don't think anybody can, based on what I've seen in the past. :) – Jason Swett May 06 '11 at 21:17
  • What would happen if you added another row `9999999999 | JOE BLOGGS | 9876-54-321 | P.O. BOX 123` would that go into the same bucket? At the moment all of the ones in the group at least have a common relationship with `3459450340` – Martin Smith May 06 '11 at 22:08
  • @Martin: it would go into its own bucket since it's reachable by no records from the first bucket. – Quassnoi May 06 '11 at 22:55
  • @Quassnoi - It has the same address as `SMITH INC`. – Martin Smith May 06 '11 at 22:59
  • @Martin: Interesting question. The goal here is to determine which accounts/customers belong in the same household so we aren't sending two pieces of the same mail to the same person. It would be quite odd for Joe Bloggs to share a PO box with Smith, Inc., but I suppose if that were to happen, we would just include Joe Bloggs in that same household. It's pretty safe to say we never encounter that case in practice, but I think the rule to follow is to follow the chain of linking as far as it goes, and put all those people in one group. – Jason Swett May 07 '11 at 12:15

2 Answers2

1

You can't do this easily in MySQL.

The problem is that the relation "is similar to" as you define it is not transitive. In your example, Smith Inc is similar to John Smith (per SSN) and John Smith is similar to Jane Smith (per name), but Smith Inc is not similar to Jane Smith. So there is no single value that all records could be compared with and GROUP BY won't help here.

In other systems which support recursion you could build a transitive closure of this relation which would allow grouping, but this is not an easy task in MySQL.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
0

Like this:

SELECT
    name,
    ssn,
    COUNT(*)
FROM TheTable
GROUP BY
    name,
    ssn
Josh M.
  • 26,437
  • 24
  • 119
  • 200