I am trying to reconstruct the actual path of Hurricane Sandy from Twitter data. My approach is as follows:
I collect all the tweets with the hashtag "Hurricane Sandy" posted between October 28, 2012 and October 31, 2012. (Hurricane Sandy made landfall on October 29, 2012 near Brigantine, New Jersey, and affected many neighboring states over the next two days.) I sort the collected tweets by timestamp and divide them into fixed-size time windows. Then, in each time window, I identify the relevant tweets, i.e. the tweets that point to the position of the hurricane along its track. Finally, I take the locations from which the relevant tweets originated and connect them across windows to get the hurricane track.
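To make the windowing and track-building steps concrete, here is a minimal sketch in Python. It is only an illustration under several assumptions that are not part of my actual setup: each tweet is a `(timestamp, lat, lon, text)` tuple, the function name `build_track` is hypothetical, each window is summarized by the plain average of its relevant tweets' coordinates, and `is_relevant` is a placeholder for the relevance test I am asking about.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_track(tweets, start, window, is_relevant):
    """Return one (window_start, mean_lat, mean_lon) point per non-empty window.

    tweets      -- iterable of (timestamp, lat, lon, text) tuples (assumed layout)
    start       -- datetime at which the first window begins
    window      -- timedelta giving the fixed window size
    is_relevant -- placeholder predicate on the tweet text (the open question)
    """
    buckets = defaultdict(list)
    for ts, lat, lon, text in tweets:
        # Skip tweets before the study period and tweets judged irrelevant.
        if ts < start or not is_relevant(text):
            continue
        # Integer index of the fixed-size window this tweet falls into.
        index = (ts - start) // window
        buckets[index].append((lat, lon))

    track = []
    for index in sorted(buckets):
        points = buckets[index]
        # Plain coordinate average: a simplification that is only reasonable
        # for a small region that does not cross the antimeridian.
        mean_lat = sum(p[0] for p in points) / len(points)
        mean_lon = sum(p[1] for p in points) / len(points)
        track.append((start + index * window, mean_lat, mean_lon))
    return track
```

Connecting the returned points in order gives the estimated track. The quality of the result depends entirely on `is_relevant`, which is the part I do not know how to define.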
The problem I am facing is how to determine the relevance of a tweet to the track taken by the hurricane, i.e. how to determine whether a tweet originated from an area that lies on the hurricane's track. What features or algorithms could be used for this?