I'm creating a simulator for a large-scale P2P system. To make the simulations as realistic as possible, I would like to use real-world data to model each node's behavior (primarily its availability). Is there any availability data that has been recorded from large P2P systems (such as BitTorrent)?
3 Answers
I'm not too sure about other P2P protocols, but here's a stab at answering the question for BitTorrent:
You should be able to glean some stats from a BitTorrent tracker log, provided the torrent used a centralised tracker (as opposed to a decentralised tracker or a distributed hash table).
To wrap your head around the logs, have a look at one of the many log analyzers, like BitTorrent Tracker Log Analyzer.
As for actual data, you can find them all over the web. There's a giant RedHat9 tracker log here ☆, for instance. I'd search Google for "bittorrent tracker log".
☆ The article Dissecting BitTorrent: Five Months in a Torrent's Lifetime on that page also looks interesting.
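As a rough illustration of how tracker data could be turned into availability traces, here is a minimal sketch. It assumes the log has already been parsed into `(peer_id, timestamp)` announce events; that tuple format, the function name `sessions_from_announces`, and the `max_gap` threshold are all hypothetical and not taken from any particular tracker or log analyzer.

```python
from collections import defaultdict


def sessions_from_announces(events, max_gap=3600.0):
    """Group (peer_id, timestamp) announces into per-peer online sessions.

    Assumption: a gap longer than max_gap seconds between two consecutive
    announces from the same peer means the peer went offline in between.
    """
    by_peer = defaultdict(list)
    for peer_id, ts in events:
        by_peer[peer_id].append(ts)

    sessions = {}
    for peer_id, stamps in by_peer.items():
        stamps.sort()
        peer_sessions = []
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev > max_gap:
                # Gap too long: close the current session, start a new one.
                peer_sessions.append((start, prev))
                start = ts
            prev = ts
        peer_sessions.append((start, prev))
        sessions[peer_id] = peer_sessions
    return sessions


# Hypothetical, hand-made events purely for illustration.
example = [("a", 0), ("a", 1800), ("a", 10000), ("b", 500)]
result = sessions_from_announces(example)
```

The right `max_gap` depends on the announce interval the tracker asked clients to use, so treat the default as a placeholder.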

Another way of approaching this is to simulate availability mathematically. Availability will follow some power-law distribution, i.e. the vast majority of nodes are available very rarely and for short periods of time, and a very few nodes are available nearly always over long periods.
Real world networks will of course have many other types of patterns in the data so this is not a perfect simulation, but I figure it's pretty good.
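One possible way to sketch this in code is to draw each node's online and offline durations from a Pareto (power-law) distribution. The shape parameter `alpha`, the minimum durations, and the function names below are illustrative assumptions, not values measured from any real network.

```python
import random


def sample_pareto(rng, alpha, x_min):
    """Draw one Pareto-distributed value (>= x_min) by inverse-transform sampling."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids division by zero
    return x_min / (u ** (1.0 / alpha))


def simulate_node(rng, horizon, alpha=1.5, min_online=60.0, min_offline=300.0):
    """Return non-overlapping (start, end) online intervals within [0, horizon].

    Assumption: online and offline durations are both heavy-tailed, with
    hypothetical minimums of 60 s online and 300 s offline.
    """
    sessions = []
    t = 0.0
    while t < horizon:
        online = sample_pareto(rng, alpha, min_online)
        end = min(t + online, horizon)  # clip the last session to the horizon
        sessions.append((t, end))
        t = end + sample_pareto(rng, alpha, min_offline)
    return sessions


def availability(sessions, horizon):
    """Fraction of the horizon during which the node was online."""
    return sum(end - start for start, end in sessions) / horizon


rng = random.Random(0)  # fixed seed so runs are repeatable
day = 86400.0
node_sessions = simulate_node(rng, day)
node_availability = availability(node_sessions, day)
```

Fitting `alpha` and the minimum durations to a recorded trace, rather than guessing them, would bring the synthetic nodes closer to real behavior.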

I've found two websites that have what I was looking for: http://p2pta.ewi.tudelft.nl/pmwiki/?n=Main.Home and http://www.cs.uiuc.edu/homes/pbg/availability/
