What does affinity='precomputed' mean in Feature Agglomeration dimensionality reduction?

Question

What does affinity='precomputed' mean in feature agglomeration dimensionality reduction (scikit-learn), and how is it used? I got much better results with it than with the other affinity options (such as 'euclidean', 'l1', 'l2' or 'manhattan'). However, I'm not sure what 'precomputed' actually means, or whether I have to provide something "precomputed" to the feature agglomeration algorithm.

I haven't passed anything other than the original data, preprocessed (scaled), as a numpy array. After fit_transform with feature agglomeration, the result was passed to the Birch clustering algorithm, and I got much better results than with the other affinities mentioned. The results are comparable with PCA, but with much lower memory overhead, so I would like to use feature agglomeration for dimensionality reduction, but I'm concerned that I did it wrong.

Yes, you can provide a precomputed distance matrix: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/cluster/hierarchical.py#L769 — hellpanderr, Sep 27 '18 at 14:47
I haven't passed any "precomputed distance matrix", just scaled data (numpy array) and got pretty good results with clustering. What was used as "precomputed"? — zlatko, Sep 28 '18 at 07:51
@seralouk, so the distance matrix is the actual data itself? – zlatko, Sep 30 '18 at 08:01

seralouk · Accepted Answer · 2018-09-30T14:42:40.567

Nice question.

`affinity == 'precomputed'` means that the input is treated as the `distance matrix` of the original data, and the flattened array containing its upper triangle is what gets passed on to the linkage function.

Reference (source code):

    if affinity == 'precomputed':
        # for the linkage function of hierarchy to work on precomputed
        # data, provide as first argument an ndarray of the shape returned
        # by pdist: it is a flat array containing the upper triangular of
        # the distance matrix.
        i, j = np.triu_indices(X.shape[0], k=1)
        X = X[i, j]
    elif affinity == 'l2':
        # Translate to something understood by scipy
        affinity = 'euclidean'
    elif affinity in ('l1', 'manhattan'):
        affinity = 'cityblock'
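
To make the "flat array containing the upper triangular" part concrete, here is a minimal sketch using only NumPy. The toy data and the variable names (`D`, `condensed`) are my own for illustration and are not part of scikit-learn. It builds a square Euclidean distance matrix by hand and then applies the same `np.triu_indices` step as the source above:

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0],
              [0.0, 1.0]])

# Full square Euclidean distance matrix, shape (4, 4).
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# Same step as in the quoted source: keep only the entries above the diagonal.
i, j = np.triu_indices(D.shape[0], k=1)
condensed = D[i, j]

print(D.shape)          # (4, 4)
print(condensed.shape)  # (6,), i.e. n * (n - 1) / 2 pairwise distances
print(condensed)
```

The point is that a symmetric n x n distance matrix with a zero diagonal carries only n * (n - 1) / 2 distinct values, so the flat array loses nothing as long as the input really is such a matrix.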

edited Sep 30 '18 at 14:42

answered Sep 29 '18 at 22:53

seralouk


And is the "distance matrix" the data itself, or what? – zlatko Sep 30 '18 at 08:02
It's the distance matrix of the original data. See here for details: https://en.wikipedia.org/wiki/Distance_matrix – seralouk Sep 30 '18 at 10:01
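
As a rough illustration of why passing plain scaled data with 'precomputed' may still run, the sketch below mimics only the indexing step from the quoted source. It does not call scikit-learn. The array shapes and names are hypothetical, and it assumes that feature agglomeration clusters columns and therefore works on the transposed array. Whether a given scikit-learn version validates the input before this step is not something the quoted code shows.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3))   # hypothetical scaled data: 5 samples, 3 features

# Assumption: feature agglomeration clusters columns, so it sees the transpose.
X = data.T                       # shape (3, 5), not a square distance matrix

# The step from the quoted source runs without complaint ...
i, j = np.triu_indices(X.shape[0], k=1)
as_if_distances = X[i, j]

# ... but the values are just raw entries of the data, not pairwise distances.
print(as_if_distances)
print(np.array_equal(as_if_distances, [data[1, 0], data[2, 0], data[2, 1]]))
```

So in this sketch the data is not turned into a distance matrix. A few of its raw entries are simply read as if they were distances, which is why a real distance matrix between the features should be computed and passed in when using 'precomputed'.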
