15

I am looking for a Twitter (or other social networking site) dataset for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that shows the number of friends, followers, and so on.

It does not have to be Twitter, but I would prefer Twitter or Facebook. I already tried Infochimps, but apparently the Twitter file is no longer downloadable.

Can someone recommend good websites for finding this kind of dataset? I am going to feed the dataset to Hadoop.

Brian Tompsett - 汤莱恩
denniss

4 Answers

7

Try the following three datasets:

Contains around 97 million tweets:

http://demeter.inf.ed.ac.uk/index.php?option=com_content&view=article&id=2:test-post-for-twitter&catid=1:twitter&Itemid=2

Editor's note: the dataset previously linked above is no longer available because Twitter requested its removal.

Contains the user graph of 47 million users:

http://an.kaist.ac.kr/traces/WWW2010.html

The following dataset contains the network as well as tweets; however, the data was collected by snowball sampling or something similar, so the friends network is not uniform. It has around 10 million tweets, and you can email the researcher for even more data.

http://www.public.asu.edu/~mdechoud/datasets.html

Do have a look at the license the data is distributed under, though.
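
To see why snowball sampling tends to give a non-uniform friends network, here is a minimal sketch of the idea in Python. The toy graph, the `snowball_sample` name, and the seed choice are all made up for illustration; this is not a description of how the dataset above was actually collected.

```python
from collections import deque


def snowball_sample(graph, seeds, max_users):
    """Breadth-first 'snowball' crawl: start from seed users and keep
    following friend links until max_users have been collected."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        for friend in graph.get(user, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return sampled


# Hypothetical toy graph: user -> list of friends.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
    "e": ["a"],  # nobody reachable from seed "a" links to "e"
}

sample = snowball_sample(toy_graph, seeds=["a"], max_users=3)
print(sample)
```

Users close to the seeds are over-represented, and anyone unreachable from them never appears at all, so friend and follower counts taken from such a sample should be treated with care.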

Hope this helps. Also, can you tell me what kind of work you are planning with this dataset? I have a few Hadoop / Pig scripts to use with the dataset.
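
For the friend and follower counts the question asks about, a user graph can be reduced to per-user counts. Here is a minimal sketch in Python, assuming a tab-separated edge list where each line is `user<TAB>follower`. That layout, the function name, and the sample edges are assumptions for illustration, so check the actual file format before relying on it.

```python
from collections import Counter


def count_friends_and_followers(lines):
    """Count followers and friends per user from edge-list lines.

    Assumed (hypothetical) line format: user<TAB>follower,
    meaning that `follower` follows `user`.
    """
    followers = Counter()
    friends = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, follower = line.split("\t")
        followers[user] += 1    # `user` gained one follower
        friends[follower] += 1  # `follower` follows one more account
    return followers, friends


# Made-up sample edges: users 2 and 3 follow user 1, user 3 follows user 2.
sample_edges = ["1\t2", "1\t3", "2\t3"]
followers, friends = count_friends_and_followers(sample_edges)
print(followers["1"], friends["3"])
```

The per-line counting is the same shape as a word count, so it should translate fairly directly into a Hadoop Streaming mapper and reducer.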

Mark Elliot
  • 1
    @Akshay Bhat: They seem to have removed the datasets as of today. Would you happen to know any other datasets that might be available? Thank You! – Legend Jul 18 '11 at 18:51
5

100 million pages were extracted from Facebook: http://it.slashdot.org/story/10/07/28/1350222/100-Million-Facebook-Pages-Leaked-On-Torrent-Site?art_pos=6

I don't know what they contain, but you could have a look; it seems to be easy to find on torrent sites.

You could also use the Facebook API, but if you want a big enough dataset, you would have to ask Facebook for the rights to access it. It contains links to friends, likes, groups, ...

Scharron
2

Facebook social graph, application installations, and Last.fm users, events, and groups, collected by researchers at UC Irvine: http://odysseas.calit2.uci.edu/research/

pbx
1

I think the best tool for gathering Twitter data is http://www.followthehashtag.com. It can get historical or future data and has advanced data-exporting features.

It also has a section where we add big datasets (about 200,000 tweets) once a week:

http://followthehashtag.com/datasets/