What does affinity='precomputed' mean in feature agglomeration dimensionality reduction (scikit-learn), and how is it used? I got much better results with it than with the other affinity options (such as 'euclidean', 'l1', 'l2' or 'manhattan'), but I'm not sure what 'precomputed' actually means, or whether I have to provide something "precomputed" to the feature agglomeration algorithm.

I haven't passed anything other than the original data, preprocessed (scaled), as a numpy array. After fit_transform with feature agglomeration, the result was passed to the Birch clustering algorithm, and I got much better results than with the other affinities mentioned. The results are comparable with PCA but with much lower memory overhead, so I would like to use feature agglomeration for dimensionality reduction, but I'm concerned that I did it wrong.

zlatko
  • Yes, you can provide a precomputed distance matrix: https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/cluster/hierarchical.py#L769 – hellpanderr Sep 27 '18 at 14:47
  • I haven't passed any "precomputed distance matrix", just scaled data (numpy array) and got pretty good results with clustering. What was used as "precomputed"? – zlatko Sep 28 '18 at 07:51
  • let me know if my answer helps – seralouk Sep 29 '18 at 23:00
  • https://stackoverflow.com/users/5025009/seralouk, so the distance matrix is the actual data itself? – zlatko Sep 30 '18 at 08:01

1 Answer

Nice question.

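affinity == 'precomputed' means that the array you pass in is treated as an already-computed distance matrix rather than as raw data. No distances are computed from it: the upper triangle of that matrix is flattened into a 1-D array (the shape returned by pdist) and handed to the linkage function. So if you pass raw scaled data, its entries are read as if they were pairwise distances.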

Reference (source code):

    if affinity == 'precomputed':
        # for the linkage function of hierarchy to work on precomputed
        # data, provide as first argument an ndarray of the shape returned
        # by pdist: it is a flat array containing the upper triangular of
        # the distance matrix.
        i, j = np.triu_indices(X.shape[0], k=1)
        X = X[i, j]
    elif affinity == 'l2':
        # Translate to something understood by scipy
        affinity = 'euclidean'
    elif affinity in ('l1', 'manhattan'):
        affinity = 'cityblock'
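
To make the comment in that snippet concrete, here is a minimal sketch (not scikit-learn's own code) showing that indexing a square distance matrix with np.triu_indices gives the same flat array that scipy's pdist returns, and that scipy's linkage accepts it. The toy array `data` and the choice of 'average' linkage are assumptions made only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: 5 items described by 3 coordinates each.
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3))

# Condensed (flat) pairwise distances, as returned by pdist: 5 * 4 / 2 = 10 values.
condensed = pdist(data, metric="euclidean")

# The same distances as a full symmetric 5 x 5 matrix.
D = squareform(condensed)

# What the 'precomputed' branch does: keep only the entries above the diagonal.
i, j = np.triu_indices(D.shape[0], k=1)
flat = D[i, j]

# The flattened upper triangle matches the pdist output, in the same order.
same = np.allclose(flat, condensed)

# The flat array can be passed straight to scipy's hierarchical linkage.
Z = linkage(flat, method="average")
```

Here `D` plays the role of the X that the 'precomputed' branch expects: a square matrix of pairwise distances, not the original data array.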
seralouk