I am working on finding statistical outliers in weather-related data. More specifically, I have the temperature and the location (longitude and latitude) of 10,000 data points where the temperature was recorded at a specific time. What would be the best method to locate geographical, weather-related outliers and to visualize the data so that the outliers stand out clearly? For the visualization, a Python tool would be most appreciated; for locating the outliers, an algorithm or technique would be most useful. (I am thinking of clustering.)

Adam
  • The 10,000 data points are distributed over how many locations? How far apart are these locations from each other? – Tarik Jun 18 '21 at 04:48
  • Hey @Tarik, the locations are mostly in America but are technically spread across the world. There is no fixed distance between them. – Adam Jun 18 '21 at 04:52

1 Answer

It really depends on how you would use it. Since you mention outliers, DBSCAN could be used: it essentially creates clusters, and any point not assigned to a cluster is considered an outlier.
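
A minimal sketch of the DBSCAN idea with scikit-learn, run on synthetic data standing in for the real readings. The helper name `flag_outliers_dbscan`, the `eps` and `min_samples` values, and the choice to standardize latitude, longitude and temperature into one feature space are all assumptions for illustration; treating latitude and longitude as plain Euclidean coordinates is only a rough approximation for data spread across the world.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def flag_outliers_dbscan(lat, lon, temp, eps=0.5, min_samples=10):
    """Return a boolean mask that is True for points DBSCAN labels as noise.

    Hypothetical helper: eps and min_samples are illustrative defaults
    and need tuning on real data.
    """
    # Put all three features on a comparable scale before clustering.
    X = StandardScaler().fit_transform(np.column_stack([lat, lon, temp]))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    # DBSCAN assigns the label -1 to points that belong to no cluster.
    return labels == -1


# Synthetic stand-in for the real data: two regions with typical temperatures.
rng = np.random.default_rng(0)
n = 500
lat = np.concatenate([rng.normal(40.0, 1.0, n), rng.normal(30.0, 1.0, n)])
lon = np.concatenate([rng.normal(-100.0, 1.0, n), rng.normal(-85.0, 1.0, n)])
temp = np.concatenate([rng.normal(15.0, 1.0, n), rng.normal(25.0, 1.0, n)])

# Inject a few readings that are implausible for their location.
bad_idx = np.array([0, 1, 2])
temp[bad_idx] = 60.0

outlier_mask = flag_outliers_dbscan(lat, lon, temp)
print("flagged points:", np.flatnonzero(outlier_mask))
```

Points on the fringe of a cluster can also end up labeled as noise, so `eps` and `min_samples` control how aggressive the flagging is.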

If all you care about is which points are outliers, and not how the points are clustered, you can use e.g. Isolation Forest.
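
A minimal sketch of the Isolation Forest route with scikit-learn, again on synthetic data. The latitude-dependent temperature model, the `contamination=0.01` setting (the expected share of outliers) and the number of trees are assumptions for illustration, not values taken from the question.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the real data: temperature falls with latitude.
rng = np.random.default_rng(1)
n = 1000
lat = rng.uniform(25.0, 50.0, n)
lon = rng.uniform(-125.0, -70.0, n)
temp = 40.0 - 0.6 * lat + rng.normal(0.0, 1.0, n)

# Inject a few readings that are far outside the plausible range.
bad_idx = np.array([10, 20, 30])
temp[bad_idx] = 80.0

X = np.column_stack([lat, lon, temp])

# contamination is an assumed guess at the fraction of outliers.
forest = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
pred = forest.fit_predict(X)        # -1 for outliers, 1 for inliers
scores = forest.score_samples(X)    # lower score means more anomalous

outlier_mask = pred == -1
print("flagged points:", np.flatnonzero(outlier_mask))
```

Unlike DBSCAN, this yields a continuous anomaly score per point, which can be useful for ranking suspicious readings instead of only flagging them.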

CutePoison
  • Thanks for the help. How would you recommend showcasing the outliers so that an ordinary person looking at the graph can immediately see where each outlier is on the map? I tried a simple scatterplot whose colors map directly to how hot or cold each data point is, using Python data visualization tools such as Mapbox. – Adam Jun 18 '21 at 05:56
  • It depends on what you want to show with the plots. If you just want the outliers to be shown, then set their temperature to 100 and all "correct" points to 0; the heatmap will then show only them. But if you want to show the correct temperature for all points, then you'll need to mark the outliers in some way (maybe making them black?). – CutePoison Jun 18 '21 at 07:40
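
The second option in the last comment (keep the real temperatures but frame the outliers) could look roughly like the sketch below. It uses plain matplotlib rather than Mapbox, so there is no base map; the function name `plot_outliers`, the marker sizes and the colormap are assumptions, and `outlier_mask` is assumed to be a boolean array produced by whichever detection method was used.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt


def plot_outliers(lon, lat, temp, outlier_mask):
    """Scatter all points colored by temperature, with outliers ringed in black."""
    inliers = ~outlier_mask
    # Scale the colors on the inliers only, so extreme outliers do not
    # compress the color range of the normal readings.
    vmin, vmax = temp[inliers].min(), temp[inliers].max()

    fig, ax = plt.subplots(figsize=(8, 5))
    sc = ax.scatter(lon[inliers], lat[inliers], c=temp[inliers],
                    cmap="coolwarm", vmin=vmin, vmax=vmax, s=12)
    # Outliers: larger markers with a thick black edge, drawn on top.
    ax.scatter(lon[outlier_mask], lat[outlier_mask], c=temp[outlier_mask],
               cmap="coolwarm", vmin=vmin, vmax=vmax, s=90,
               edgecolors="black", linewidths=2, label="outlier")
    fig.colorbar(sc, ax=ax, label="Temperature")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.legend()
    return fig, ax


# Synthetic demo data with three hand-marked outliers.
rng = np.random.default_rng(2)
lon = rng.uniform(-125.0, -70.0, 300)
lat = rng.uniform(25.0, 50.0, 300)
temp = 40.0 - 0.6 * lat + rng.normal(0.0, 1.0, 300)
outlier_mask = np.zeros(300, dtype=bool)
outlier_mask[[5, 50, 150]] = True
temp[outlier_mask] = 80.0

fig, ax = plot_outliers(lon, lat, temp, outlier_mask)
fig.savefig("outliers.png", dpi=150)
```

The same two-layer idea (one layer for normal points, one emphasized layer for outliers) carries over to map-based plotting tools.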