I have a machine learning model (an autoencoder) that learns a sparse representation of an input signal via a simple L1 penalty term added to the objective function. This does promote sparsity, in the sense that most elements of the learned representation are zero. However, I need the sparsity to be structured, such that the non-zero elements are "spread out" roughly uniformly over the vector. Concretely, for a given input signal, my model currently produces a sparse representation that looks like this:
Current sparse code:
[..., 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.3, 0.5, 0.9, 0.3, 0.2, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
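To be explicit about the setup: the penalty I am currently using is just the L1 norm of the code added to the reconstruction loss, roughly like the following minimal PyTorch sketch (the architecture, dimensions, and penalty weight here are placeholders, not my exact model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=512):
        super().__init__()
        self.encoder = nn.Linear(input_dim, code_dim)
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        z = torch.relu(self.encoder(x))   # sparse code (the vector shown above)
        x_hat = self.decoder(z)
        return x_hat, z

model = SparseAutoencoder()
x = torch.randn(32, 784)                  # dummy input batch
x_hat, z = model(x)

l1_weight = 1e-3                          # placeholder penalty strength
loss = F.mse_loss(x_hat, x) + l1_weight * z.abs().mean()
loss.backward()
```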
As you can see in the current sparse code above, most elements are zero, but the non-zero elements appear in small clusters. Instead, I want the non-zero elements to "repel" each other, so that each non-zero element is surrounded by one or more zeros and few or no non-zero elements are adjacent in the vector. Concretely, the code should look more like this:
Desired sparse code:
[..., 0, 0, 0, 0, 0, 0.2, 0, 0, 0, 0, 0.9, 0, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0.7, 0, 0, 0, 0.4, 0, 0, 0.6, ...]
In the desired sparse code, the number of non-zero elements may be similar to the current one, but each non-zero element is separated from its neighbours by some number of zeros.
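For concreteness, the property I want could be quantified by counting adjacent non-zero pairs, as in this small NumPy sketch (the two arrays are made-up stand-ins for the codes shown above, not actual model output):

```python
import numpy as np

def adjacent_nonzero_pairs(code, tol=1e-8):
    """Count positions where two consecutive elements are both non-zero."""
    nz = np.abs(code) > tol
    return int(np.sum(nz[:-1] & nz[1:]))

current = np.array([0, 0, 0, 0.2, 0.3, 0.5, 0.9, 0.3, 0.2, 0.1, 0, 0, 0])
desired = np.array([0, 0.2, 0, 0, 0, 0.9, 0, 0, 0.5, 0, 0, 0.7, 0])

print(adjacent_nonzero_pairs(current))  # 6 -> non-zeros are clustered
print(adjacent_nonzero_pairs(desired))  # 0 -> non-zeros are isolated
```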
Is there a straightforward objective function penalty I can use to induce this form of sparsity?