I'm working on a project with 500,000 participants. We have in our database the precise coordinates of their home, and we want to release this data to someone who needs it to evaluate how close our participants live to one another.
We are very reluctant to release the precise coordinates, because this is an anonymized project and the risk for re-identification would be very high. Rounded coordinates (to something like 100m or 1km) are apparently not precise enough for what they're trying to achieve.
A nice workaround would have been to send them a 500,000 by 500,000 matrix with the absolute distance between each pair of participants, but this means 250 billion entries, or rather 125 billion if we remove half the matrix since |A-B| = |B-A|.
I've never worked with this type of data before, so I was wondering if anyone had a clever idea on how to deal with this? (Something that would not involve sending them 2 TB of data!)
Thanks.