I need to cluster a data set of lat,long coordinates. I am using Python and plan to use DBSCAN because I don't want to have to specify the number of clusters.
The goal is to input a large data set of lat,long coordinates, each with many attached features, and get back an assigned cluster group for each point. The original database, whose entries have the form [lat long feature1, feature2 ....], needs to be amended with a new field called "cluster group": [lat long clustergroup feature1, feature2 ....]. This will let me identify which data points are grouped closely together without having to plot them on a map. I am hoping that outliers will be given separate group IDs and that points that are tightly clustered together will be given the same group ID.
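A minimal sketch of amending each record with a cluster group, assuming the records are Python lists of the form [lat, long, feature1, ...]. The sample records, the helper name add_cluster_group, and the parameter values are hypothetical. Instead of converting to x,y first, this swaps in scikit-learn's "haversine" metric, which expects [lat, long] in radians, so eps becomes a great-circle distance (a distance in km divided by an assumed mean Earth radius of 6371 km). Note that scikit-learn labels every noise point -1, so outliers all share one ID rather than getting separate ones; the optional separate_outliers flag renumbers them afterwards.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical records in the form [lat, long, feature1, feature2]
records = [
    [40.7128, -74.0060, "a", 1],
    [40.7130, -74.0055, "b", 2],
    [40.7125, -74.0062, "c", 3],
    [34.0522, -118.2437, "d", 4],
    [34.0525, -118.2440, "e", 5],
    [34.0519, -118.2433, "f", 6],
    [51.5074, -0.1278, "g", 7],
]

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def add_cluster_group(records, eps_km=1.0, min_samples=2, separate_outliers=False):
    """Return new records of the form [lat, long, cluster_group, feature1, ...]."""
    # The haversine metric expects [lat, long] pairs in radians
    coords = np.radians([[r[0], r[1]] for r in records])
    labels = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # convert km to radians on the sphere
        min_samples=min_samples,       # neighbors needed (including the point itself)
        metric="haversine",
        algorithm="ball_tree",
    ).fit_predict(coords)

    if separate_outliers:
        # DBSCAN gives every noise point the same label (-1); give each its own ID
        next_id = labels.max() + 1
        for i in np.flatnonzero(labels == -1):
            labels[i] = next_id
            next_id += 1

    # Insert the cluster group right after lat, long
    return [r[:2] + [int(label)] + r[2:] for r, label in zip(records, labels)]
```

With these sample values the three New York points share one group, the three Los Angeles points share another, and the single London point is noise; eps_km and min_samples would need tuning for real data.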
My input to DBSCAN would be x,y coordinates, obtained by converting lat,long to x,y and discarding the z coordinate. I am using:
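A sketch of the lat,long to x,y conversion described above, assuming a spherical Earth of radius 6371 km; the function names are hypothetical. It computes Earth-centered x, y, z and then drops z. Be aware that dropping z projects every point onto the equatorial plane: east-west distances are kept, but north-south distances shrink toward the equator, and points mirrored across the equator (for example 10°N and 10°S at the same longitude) land on the same x,y. Keeping all three coordinates, or using the haversine metric, avoids this.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def latlon_to_xyz(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to Earth-centered x, y, z (km)."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    x = radius * np.cos(lat) * np.cos(lon)
    y = radius * np.cos(lat) * np.sin(lon)
    z = radius * np.sin(lat)
    return x, y, z


def latlon_to_xy(lat_deg, lon_deg):
    """Return an (n, 2) array of x, y values, discarding z."""
    x, y, _z = latlon_to_xyz(lat_deg, lon_deg)
    return np.column_stack([x, y])
```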
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN
http://scikit-learn.org/stable/auto_examples/index.html
I am having difficulty understanding how to set up the input for this function. Can I pass x,y coordinates directly? Would this be a list of tuples? If someone could help me visualize this, it would be a great help.
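A small sketch of the input shape, using hypothetical toy points in km. scikit-learn's DBSCAN expects an array-like of shape (n_samples, n_features), so for x,y data that is one row per point with two columns. A plain list of (x, y) tuples is also accepted, because it is converted to a NumPy array internally. eps is in the same units as the coordinates; the eps and min_samples values here are assumptions to tune for real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical x,y coordinates (km), one tuple per point
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # a tight group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # a second tight group
    (20.0, 20.0),                         # an isolated point
]

X = np.asarray(points)  # shape (7, 2): rows are points, columns are x and y
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
labels = db.labels_     # one label per row of X; -1 marks noise
# labels -> [0, 0, 0, 1, 1, 1, -1]
```

The list of tuples can be passed straight to DBSCAN(...).fit(points) with the same result; labels_[i] is the cluster group for the i-th input point, in the original order.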
Also, can you explain how DBSCAN differs from hierarchical clustering?
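One practical difference can be sketched by running both on the same hypothetical points. DBSCAN is density-based: points with too few neighbors within eps are labeled noise (-1), all sharing that one label. Agglomerative (hierarchical) clustering builds a tree of merges and, when cut at a distance threshold instead of a fixed cluster count, assigns every point to some cluster, so each isolated point ends up as its own singleton cluster. This assumes scikit-learn 0.21 or later for distance_threshold; single linkage and the threshold value are illustrative choices, not the only options.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN

# Hypothetical x,y points: two dense groups plus two isolated points
X = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2],
    [20.0, 0.0],
    [0.0, 20.0],
])

# Density-based: both isolated points become noise with the shared label -1
dbscan_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)

# Hierarchical: no fixed number of clusters; cut the merge tree at distance 0.5
hier = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.5,
    linkage="single",
)
hier_labels = hier.fit_predict(X)  # every point gets a cluster, outliers as singletons
```

Label numbers from the hierarchical model are arbitrary, so compare which points share a label rather than the label values themselves.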