Is spatial search in a P2P network possible?

Question

I want to build a JavaScript/HTML5 geolocation-based social network, and I am wondering which of the possible architectures is the best choice. Client-server is simple to develop, but the drawback is that system resource usage could be very high, especially because the application must handle movement (worst case: a user in a car must see the other users around him, who are also in cars).

Basically, in a client-server architecture, the server's tasks would be:

collect and store the latitude and longitude of the users (there could be thousands of them)
perform a geo-distance search for each user (to get the list of users around him within a given radius)
build and send to the client an XML file with the positions of the users in that list
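A minimal sketch of the second server task (the radius search), assuming positions are kept in memory as a dict of user id to (latitude, longitude) rather than in MySQL, and using the haversine great-circle formula with a mean Earth radius of 6,371 km. The names `haversine_m` and `users_within` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def users_within(positions, user, radius_m):
    """Return the sorted ids of all other users within radius_m meters of `user`."""
    lat, lon = positions[user]
    return sorted(
        other
        for other, (olat, olon) in positions.items()
        if other != user and haversine_m(lat, lon, olat, olon) <= radius_m
    )
```

For example, with `positions = {"alice": (48.8566, 2.3522), "bob": (48.8570, 2.3530), "carol": (52.52, 13.405)}`, `users_within(positions, "alice", 500)` keeps only "bob". This scans every user on every query, which is exactly the cost the spatial-index pre-filter in the question is meant to avoid.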

These 3 operations must be done periodically, every 3 to 5 seconds, because I want a "live" map that shows the users in the list moving around their environment (city, town).

All 3 of these points could be optimized:

the client sends its position only after moving 10 meters, to reduce the amount of data to process
a "spherical rectangle" search in a MyISAM table with a spatial index (using MBRContains), to offload the MySQL database
a common output file: the same XML can be sent to 2 users if they are within x meters of each other (i.e. the 2 users are close to each other)
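The first two optimizations can be sketched as follows, assuming an equirectangular approximation (adequate for radii of a few kilometers, away from the poles and the ±180° meridian). `bounding_box` computes the rectangle whose corners could be used to build the geometry for an `MBRContains` pre-filter, before an exact distance check; `should_send` implements the 10-meter rule on the client. Both function names are hypothetical, and since the real client would be JavaScript, this Python is only illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
MOVE_THRESHOLD_M = 10       # report a new position only after moving this far


def bounding_box(lat, lon, radius_m):
    """Return (min_lat, min_lon, max_lat, max_lon) enclosing a circle of radius_m.

    One degree of latitude is roughly constant; one degree of longitude
    shrinks with cos(latitude), so the box is wider in degrees of longitude."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular distance in meters, accurate enough for a 10 m threshold."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def should_send(last_sent, current):
    """Client side: send the position if none was sent yet or it moved >= threshold."""
    if last_sent is None:
        return True
    return approx_distance_m(*last_sent, *current) >= MOVE_THRESHOLD_M
```

The box over-selects (its corners lie outside the circle), so rows it returns still need an exact distance check; the point is that the index discards most rows cheaply first.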

It is hard to estimate the load at this stage, but I think a client-server architecture is not appropriate for this type of application, and peer-to-peer could be a good answer if 2 clients could communicate when they are near each other.

My question is:

Is there any method that would allow a client to blindly search for other clients located within a certain radius, without the help of a central server? (It is possible with UDP broadcast :-)

Edit: Correction. UDP broadcast allows a client to poll machines within a certain range of IP addresses, wherever they physically are.
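To make the correction concrete: a UDP broadcast is addressed to the broadcast address of an IP subnet, so its reach is defined by network topology rather than physical distance, and routers normally do not forward it beyond the local network segment. A minimal sketch with Python's standard `ipaddress` and `socket` modules; the port 50000 and the `WHO_IS_NEARBY` payload are hypothetical.

```python
import ipaddress
import socket

DISCOVERY_PORT = 50000       # hypothetical discovery port
PROBE = b"WHO_IS_NEARBY"     # hypothetical probe payload


def broadcast_target(subnet, port=DISCOVERY_PORT):
    """Return the (address, port) a probe for `subnet` is sent to.

    Every host in the subnet receives it, wherever that host is on the map."""
    net = ipaddress.ip_network(subnet, strict=False)  # allow host bits, e.g. "10.1.2.3/16"
    return (str(net.broadcast_address), port)


def send_probe(subnet):
    """Send one UDP broadcast probe to every host in `subnet`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(PROBE, broadcast_target(subnet))
```

Two users standing next to each other on different mobile networks will generally not be in the same subnet, which is why broadcast does not answer the geographic question.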

Thank you for your help, Florent

I googled "spatial search P2P" and found these two research papers (they look like very complex spatial algorithms, so I'll try to work with a basic client/server architecture). http://www.springerlink.com/content/9r6wl93g52bg6c75/ http://www.ijcaonline.org/archives/number15/315-483 — Feb 19 '12 at 20:14

score -1 · Answer 1 · answered Apr 08 '12 at 00:45

The answer actually depends on many things, so I'll help out with a basic strategy. To follow it, you'll need to understand how Kademlia works (Kademlia is a DHT-based P2P network that stores information).

In Kademlia, on first startup each node picks a random ID: a 160-bit number that represents a point in the space of all possible 160-bit IDs.

The ID of the information to be stored is obtained with the SHA-1 function (it takes an arbitrary string and outputs a 160-bit number, which is used as the ID of that piece of information).
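A minimal sketch of the two kinds of IDs just described, using Python's standard `secrets` and `hashlib` modules; the function names are hypothetical.

```python
import hashlib
import secrets

ID_BITS = 160  # Kademlia node IDs and SHA-1 digests are both 160 bits


def random_node_id():
    """On first startup a node picks a random point in the 160-bit ID space."""
    return secrets.randbits(ID_BITS)


def info_key(text):
    """ID of a piece of information: SHA-1 of an arbitrary string, read as a 160-bit integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big")
```

Because node IDs and information IDs live in the same 160-bit space, the distance between a node and a piece of information is well defined.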

Once you have the ID of the information, you publish it: the information is physically stored on the node whose ID is closest to the information ID.
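"Closest" here means closest under Kademlia's XOR metric. A minimal sketch of picking the responsible node, assuming the publisher already knows every node ID (a real node only knows a few contacts and finds the closest ones by lookup); the small integer IDs stand in for 160-bit ones.

```python
def xor_distance(a, b):
    """Kademlia's distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def responsible_node(node_ids, key):
    """Node that stores `key`: the one whose ID is XOR-closest to the key."""
    return min(node_ids, key=lambda node: xor_distance(node, key))
```

For example, `responsible_node([0b0001, 0b0110, 0b1100], 0b0111)` returns `0b0110`, at distance 1.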

(Illustration taken from an external source; image not available.)

The information is queried via its ID. Both information lookups and node lookups take O(log(N)) hops. Kademlia uses the XOR metric (in your case it could be the ordinary Euclidean metric).
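A toy simulation of why lookups need few hops, under heavy simplifying assumptions: 16-bit IDs instead of 160, one contact per bucket (real Kademlia keeps up to k contacts per bucket and queries several nodes in parallel), and routing tables built with global knowledge of the network. Greedy routing to the contact closest to the target then shares at least one more leading bit with the closest node on every hop, so it never needs more than 16 hops, and in practice the hop count grows roughly logarithmically with N. All names are hypothetical.

```python
import random

BITS = 16  # small ID space for illustration; Kademlia uses 160


def bucket_index(a, b, bits=BITS):
    """Number of leading bits a and b share (= index of the first differing bit)."""
    return bits - (a ^ b).bit_length()


def build_tables(ids, rng, bits=BITS):
    """Give every node one random contact per non-empty bucket (simplified k = 1)."""
    tables = {}
    for node in ids:
        buckets = {}
        for other in ids:
            if other != node:
                buckets.setdefault(bucket_index(node, other, bits), []).append(other)
        tables[node] = [rng.choice(members) for members in buckets.values()]
    return tables


def lookup(tables, start, target):
    """Greedy XOR routing; returns (closest node reached, number of hops)."""
    current, hops = start, 0
    while True:
        best = min(tables[current], key=lambda n: n ^ target, default=current)
        if best ^ target >= current ^ target:  # no contact is closer: done
            return current, hops
        current, hops = best, hops + 1


rng = random.Random(7)
ids = rng.sample(range(2 ** BITS), 500)
tables = build_tables(ids, rng)
```

Calling `lookup(tables, ids[0], target)` for a random `target` ends at the node XOR-closest to it, and the returned hop count stays well below the number of nodes.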

Each node maintains an array of buckets, and each bucket contains the addresses of the nodes that belong in it. Which bucket a node goes into depends on how close its ID is to the current node's ID. Consider this example:

           0                              160
Node 1 ID: 1101000101011111101110101001010...
Node 2 ID: 1101011101011111101110101001010...
Node 3 ID: 1101000101011001101110101001010...

Applying the XOR metric to Nodes 1 and 2 (i.e. computing the number that represents the virtual distance between these nodes), we get:

index - 012345678901234
xor   - 000001100000000... (first difference at bit 5, counting from the MSB)
order - msb         lsb

Applying the XOR metric to Nodes 1 and 3, we get:

index - 012345678901234
xor   - 000000000000011... (first difference at bit 13, counting from the MSB)
order - msb         lsb

Node 1 is therefore closer to Node 3 than to Node 2, because the first bit in which their IDs differ is less significant. From Node 1's point of view, its neighbor Node 3 goes into bucket 13 (a higher index means closer IDs), and Node 2 goes into bucket 5, which holds the nodes whose IDs share exactly the first 5 most significant bits with the current node's ID.
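The bucket numbering above can be checked with a short sketch, assuming the IDs consist only of the 31 bits printed in the example (the elided remainder is treated as identical). `bucket_index` is a hypothetical helper that returns the position of the first differing bit, counted from the MSB.

```python
def bucket_index(a: str, b: str) -> int:
    """Index of the first differing bit, counting from the MSB (0-based).

    With this convention a higher index means a longer shared prefix, i.e. closer IDs."""
    if len(a) != len(b):
        raise ValueError("IDs must have the same length")
    distance = int(a, 2) ^ int(b, 2)  # the XOR metric
    return len(a) - distance.bit_length()


node1 = "1101000101011111101110101001010"
node2 = "1101011101011111101110101001010"
node3 = "1101000101011001101110101001010"
```

`bucket_index(node1, node2)` gives 5 and `bucket_index(node1, node3)` gives 13, matching the example. Note that many Kademlia descriptions number buckets the other way round (bucket i holds distances in [2^i, 2^(i+1))), so lower indices mean closer IDs there.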

This data structure lets each node know its surroundings at 160 different levels of distance.

Back to your case: to allow efficient geospatial queries, you'll need to replace Kademlia's XOR metric with the ordinary Euclidean metric. Your IDs will then be 2D or 3D vectors. Unfortunately, the Euclidean metric produces floating-point numbers, which are not directly suitable for this type of algorithm, so you will need to convert them into discrete binary numbers in some way that behaves similarly to what the XOR function does. After that, finding a node's neighboring nodes is a trivial task.
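One possible way to do that conversion, offered as an assumption rather than something this answer specifies: quantize latitude and longitude and interleave their bits into a Z-order (Morton) key, the idea behind geohashes. Points in the same small cell then share a long key prefix, which is what Kademlia's prefix and XOR machinery works with. The mapping is only approximate: two points just on either side of a cell boundary can share a short prefix, which is why geohash-based searches also check neighboring cells. The precision of 16 bits per axis and all names are hypothetical.

```python
BITS_PER_AXIS = 16  # hypothetical precision: 2**16 steps per axis


def quantize(value, lo, hi, bits=BITS_PER_AXIS):
    """Map a float in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    return min(int((value - lo) / (hi - lo) * cells), cells - 1)


def geo_key(lat, lon, bits=BITS_PER_AXIS):
    """Interleave quantized longitude and latitude bits into one Z-order key."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    key = 0
    for i in range(bits - 1, -1, -1):  # from most to least significant bit
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key


def shared_prefix(a, b, width=2 * BITS_PER_AXIS):
    """Leading bits two keys have in common: longer prefix means same, smaller cell."""
    return width - (a ^ b).bit_length()
```

For example, two points about 75 m apart in Paris share a much longer key prefix than a point in Paris and one in Berlin.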

Hope this helps. Oh, by the way, take a look at HyperDex, a new searchable distributed datastore closely tied to the Euclidean metric; it might help...

How do you connect geolocation with Kademlia distances? These are completely different. Kademlia distance is an abstract mathematical concept which has nothing to do with real geolocation distances. — Jérôme Verstrynge, Apr 08 '12 at 14:10
The distance between keys is a number; the distance between geolocations is also a number. You should just take the Kademlia concept, replace keys with geolocations and the XOR metric with the Euclidean metric, and you will then be as happy as an elephant. — Lu4, Apr 09 '12 at 10:47
Yoda, show me the algorithm that does the mapping between Kademlia keys and geolocations. Do the actual implementation, and plug your solution into the issue in the original question. Along the way, you'll learn something. So far, you've only been blowing smoke... It makes your head spin for sure!!! Get real!!! — Jérôme Verstrynge, Apr 09 '12 at 12:06
Dear Darth Vader "J" :) I agree to do all of the above in exchange for your implementation; it will be interesting to see and compare the two. — Lu4, Apr 09 '12 at 17:03
I don't believe your solution can be implemented to solve the issue raised in the original question. Prove me wrong... Show us what you've got!!! — Jérôme Verstrynge, Apr 09 '12 at 17:13

score -1 · Answer 2 · answered Mar 11 '12 at 00:28


You will need central peers/servers, because you have to centralize some information to be able to provide your features.

I would go for the following:

Assign square miles (or whatever size you want) to specific servers.
Have devices send an 'I am here' message with their coordinates to a dispatcher that forwards it to the server handling the corresponding square mile.
Have servers register when a device enters a square mile they manage. This could be a central map, to make sure a device is registered to one and only one square.
Forward this message to all other devices in the square.
And/or include the square the message is intended for, and make sure devices check it before displaying it to the user.

Tune the size of the squares and the rate of 'I am here' messages. That's it.
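The steps above can be sketched as a single in-memory process, assuming grid squares measured in degrees rather than miles (0.02° is a hypothetical size) and ignoring the network transport. `server_for` stands in for the dispatcher's routing rule; `CellRegistry` is the central map that keeps each device in exactly one square and returns the devices that should receive the update. All names and constants are hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 0.02  # hypothetical square size in degrees (about 2.2 km of latitude)


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Grid square that contains a position."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def server_for(cell, n_servers):
    """Dispatcher rule: a stable mapping from a square to one of n_servers.

    The multipliers are arbitrary large primes used to mix the two indices."""
    return ((cell[0] * 73856093) ^ (cell[1] * 19349663)) % n_servers


class CellRegistry:
    """Central map that keeps every device registered in exactly one square."""

    def __init__(self, cell_deg=CELL_DEG):
        self.cell_deg = cell_deg
        self.device_cell = {}            # device id -> current square
        self.members = defaultdict(set)  # square -> device ids in it

    def i_am_here(self, device, lat, lon):
        """Register the device's square; return (square, other devices to notify)."""
        cell = cell_of(lat, lon, self.cell_deg)
        old = self.device_cell.get(device)
        if old != cell:
            if old is not None:
                self.members[old].discard(device)  # leave the previous square
            self.device_cell[device] = cell
            self.members[cell].add(device)
        return cell, self.members[cell] - {device}
```

A device near the edge of a square will not see neighbors just across the border; a common refinement is to also notify the adjacent squares, at the cost of more messages.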

Jérôme Verstrynge

Sorry for the criticism, but this is a typical quick-fix answer from a practical developer; it says "why should I spend my time thinking when I can spend my time developing", and unfortunately this is what the industry pays for. By going this way, you will end up with a solution tightly coupled to Oracle or some other general-purpose technology that reduces the complexity of the task in a way that is far from satisfactory. There is a much more elegant solution to this problem that finds a node's neighborhood in O(log(N)) hops – Lu4 Apr 07 '12 at 23:23
@Lu4 Is this a joke? How do you connect geolocation with Kademlia distances? These are completely different. Kademlia distance is an abstract mathematical concept which has nothing to do with real geolocation distances. You are not understanding the question at all. You are not connecting abstraction with reality. – Jérôme Verstrynge Apr 08 '12 at 14:09
@JVestry your reality is not real. The central server solution that you propose is a tight bottleneck, a single point of failure and the root of all evil. Imagine it goes down: the whole P2P network stops working. How is scalability addressed in your architecture, and how much will it cost to scale? What about security: how secure must it be to keep hackers out? JVestry, I don't mean to offend you, but your solution is a pain in the ass, just like the industry solutions that are made to take your money rather than help you solve your problems. – Lu4 Apr 09 '12 at 10:40