
I searched for ways to implement dimensionality reduction in Python and found this page: http://scikit-learn.org/stable/modules/unsupervised_reduction.html. The last method shown there is feature agglomeration. I followed the link to the documentation for that Python class, but I am still unsure how to use it.

If anyone has worked with scikit-learn's feature agglomeration class (FeatureAgglomeration) before, could you explain how it works (input, output, etc.)? Thanks!

Cynthia

1 Answer


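You can pass a NumPy array or a pandas DataFrame as input to sklearn.cluster.FeatureAgglomeration, with one row per sample and one column per feature. It runs hierarchical clustering (Ward linkage by default) on the features rather than on the samples, so similar columns are merged into groups.

The output of fit_transform is a NumPy array with as many rows as the input dataset and as many columns as the n_clusters parameter you set in FeatureAgglomeration. By default, each output column is the mean of the original features in that group.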
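To make the input/output shapes concrete, here is a minimal sketch on synthetic data. The toy matrix X is hypothetical, made up for illustration: six features built from two underlying signals, so we expect the features to fall into two groups. It also checks two documented behaviours of FeatureAgglomeration: the fitted labels_ attribute records which cluster each original feature was assigned to, and with the default pooling_func=np.mean each output column is the row-wise average of its cluster's features.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
# Hypothetical toy data: 100 samples, 6 features.
# Features 0-2 are noisy copies of one signal, features 3-5 of another.
base = rng.normal(size=(100, 2))
X = np.hstack([
    base[:, [0]] + 0.01 * rng.normal(size=(100, 3)),
    base[:, [1]] + 0.01 * rng.normal(size=(100, 3)),
])

agglo = FeatureAgglomeration(n_clusters=2)
X_reduced = agglo.fit_transform(X)

print(X.shape)          # (100, 6): n_samples x n_features
print(X_reduced.shape)  # (100, 2): n_samples x n_clusters
print(agglo.labels_)    # cluster index assigned to each of the 6 features

# With the default pooling_func (np.mean), each output column equals the
# row-wise mean of the original features assigned to that cluster.
for k in np.unique(agglo.labels_):
    assert np.allclose(X_reduced[:, k], X[:, agglo.labels_ == k].mean(axis=1))
```

Which group gets label 0 and which gets label 1 is arbitrary; what matters is that features built from the same signal share a label.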

```python
from sklearn.cluster import FeatureAgglomeration
import pandas as pd
import matplotlib.pyplot as plt

# iris.data from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
iris = pd.read_csv('iris.data', sep=',', header=None)
# store the class labels (column 4), then drop them so only the 4 numeric features remain
label = iris[4]
iris = iris.drop(columns=[4])

# set n_clusters to 2: the output will be two columns of agglomerated features (iris has 4 features)
agglo = FeatureAgglomeration(n_clusters=2).fit_transform(iris)

# plotting: one color per species
colors = {'Iris-setosa': 'g', 'Iris-versicolor': 'b', 'Iris-virginica': 'r'}
plt.scatter(agglo[:, 0], agglo[:, 1], c=label.map(colors).tolist())
plt.show()
```
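If you would rather not download iris.data, roughly the same experiment can be run on the copy of the dataset bundled with scikit-learn via sklearn.datasets.load_iris (a swap from the UCI file used above). This sketch also prints labels_ to show which of the four iris features were merged, and calls inverse_transform, which maps the reduced array back to the original number of columns.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_iris

# load_iris ships with scikit-learn, so no file download is needed
iris = load_iris()
X = iris.data  # shape (150, 4)

agglo = FeatureAgglomeration(n_clusters=2)
X_reduced = agglo.fit_transform(X)  # shape (150, 2)

# labels_ maps each of the 4 original features to one of the 2 clusters
for name, cluster in zip(iris.feature_names, agglo.labels_):
    print(f"{name}: cluster {cluster}")

# inverse_transform expands back to 4 columns: every feature gets the pooled
# value of its cluster, so the variation within a cluster is not recovered
X_restored = agglo.inverse_transform(X_reduced)
print(X_restored.shape)  # (150, 4)
```

The round trip is lossy by design: features that were merged become identical columns in X_restored.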


Vivek-Ananth