I am trying to reconstruct the actual path of Hurricane Sandy from Twitter data. My approach is as follows:

I collect all tweets with the hashtag "Hurricane Sandy" posted between October 28, 2012 and October 31, 2012. (Hurricane Sandy made landfall on October 29, 2012 near Brigantine, New Jersey, and affected many neighboring states over the next two days.) I sort the collected tweets by time and divide them into fixed-size time windows. Then, in each time window, I identify the relevant tweets, i.e. the tweets that point to the position of the hurricane track. Finally, I take the origin locations of the relevant tweets and connect them to get the hurricane track.
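To make the pipeline concrete, here is a minimal Python sketch of the windowing steps described above. It is only an illustration under assumptions: the function name `build_track`, the `(timestamp, lat, lon, text)` tuple layout, and the `is_relevant` predicate are all hypothetical, and using the per-window median coordinate as the track point is my own placeholder choice. The relevance test itself is exactly the open part of this question, so it is passed in as a parameter.

```python
from collections import defaultdict
from statistics import median


def build_track(tweets, start, window, is_relevant):
    """Return one (window_start, lat, lon) point per non-empty time window.

    tweets      -- iterable of (timestamp, lat, lon, text) tuples (assumed layout)
    start       -- datetime at which the first window begins
    window      -- timedelta giving the fixed window size
    is_relevant -- hypothetical predicate on the tweet text; how to define
                   it is the unsolved part
    """
    buckets = defaultdict(list)
    for ts, lat, lon, text in tweets:
        if ts < start or not is_relevant(text):
            continue
        # timedelta // timedelta yields the integer index of the window
        index = (ts - start) // window
        buckets[index].append((lat, lon))

    track = []
    for index in sorted(buckets):  # connect windows in time order
        lats = [p[0] for p in buckets[index]]
        lons = [p[1] for p in buckets[index]]
        # Assumption: summarise each window by its median coordinate
        track.append((start + index * window, median(lats), median(lons)))
    return track
```

Calling it with `window=timedelta(hours=1)` and a simple keyword predicate would give one point per hour that contains at least one matching geotagged tweet.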

The problem I am facing is how to determine the relevance of a tweet to the track taken by the hurricane, i.e. how to determine whether a tweet originated from an area that lies on the hurricane's track. What features or algorithms could be used to do this?

1 Answer

Have you had a look at the data?

Twitter data is 99% mess, and 1% signal.

I doubt you can achieve your goals from this data. In particular, the network may have been down where the real hurricane was...

Has QUIT--Anony-Mousse