
What is the best method for finding all the modes of a continuous variable? I'm trying to develop a Java or Python algorithm for doing this.

I was thinking about using kernel density estimation to estimate the probability density function of the variable. The idea was then to identify the peaks of that estimated density. But I don't know whether this makes sense, or how to implement it in concrete Java or Python code.
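A minimal sketch of the idea described above, using only the Python standard library. The function name `kde_modes`, the rule-of-thumb bandwidth, and the synthetic two-cluster sample are assumptions made for illustration; the number of peaks found depends strongly on the bandwidth, so treat the output as exploratory rather than definitive.

```python
import math
import random


def kde_modes(data, grid_size=512):
    """Return approximate mode locations of a 1-D sample via a Gaussian KDE."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Rule-of-thumb bandwidth (an assumption; other choices give other answers).
    h = 1.06 * sd * n ** (-1 / 5)
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # The unnormalized density is enough for locating peaks.
    dens = [sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in data) for g in grid]
    # A grid point counts as a peak if it is strictly higher than both neighbours.
    return [grid[i] for i in range(1, grid_size - 1)
            if dens[i - 1] < dens[i] > dens[i + 1]]


# Synthetic example data (hypothetical): two clusters centred near 0 and 6.
rng = random.Random(0)
sample = ([rng.gauss(0, 1) for _ in range(300)]
          + [rng.gauss(6, 1) for _ in range(300)])
modes = kde_modes(sample)
print(modes)
```

With SciPy available, `scipy.stats.gaussian_kde` and `scipy.signal.find_peaks` could replace the hand-written density and peak search.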

Paulo

1 Answer


Any answer to the question "how many modes" must involve some prior information about what you consider a likely answer, and any result must be of the form "p(number of modes = k | data) = nnn". Given such a result, you can figure out how to use it; there are at least three possibilities: pick the one with greatest probability, pick the one that minimizes some cost function, or average any other results over these probabilities.

With that prologue, I'll recommend a mixture density model with a varying number of components: fit a mixture with 1 component, then with 2 components, then 3, 4, 5, and so on. Note that with k components, the maximum possible number of modes is k, although, depending on the locations and scales of the components, there might be fewer modes.
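To illustrate the point that k components can yield fewer than k modes, here is a small sketch that counts the local maxima of a one-dimensional Gaussian mixture density on a grid. The helper names `mixture_pdf` and `count_modes`, the grid resolution, and the example parameters are assumptions chosen for illustration.

```python
import math


def mixture_pdf(x, weights, means, sds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
               for w, m, s in zip(weights, means, sds))


def count_modes(weights, means, sds, grid_size=2001):
    """Count strict local maxima of the mixture density on an evenly spaced grid."""
    lo = min(m - 4 * s for m, s in zip(means, sds))
    hi = max(m + 4 * s for m, s in zip(means, sds))
    step = (hi - lo) / (grid_size - 1)
    dens = [mixture_pdf(lo + i * step, weights, means, sds) for i in range(grid_size)]
    return sum(1 for i in range(1, grid_size - 1)
               if dens[i - 1] < dens[i] > dens[i + 1])


# Two components far apart: both peaks survive.
print(count_modes([0.5, 0.5], [0.0, 4.0], [1.0, 1.0]))
# Two components close together: they merge into a single peak.
print(count_modes([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]))
```

So a fitted two-component mixture does not by itself imply two modes; the fitted density still has to be inspected.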

There are probably many libraries which can find parameters for a mixture density with a fixed number of components. My guess is that you will need to bolt on the stuff to work with the posterior probability of the number of components. Without looking, I don't know a formula for the posterior probability of the number of modes, although it is probably straightforward to work it out.
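One possible way to "bolt on" a posterior over the number of components is sketched below. It is not an exact Bayesian computation: it fits each mixture by a plain EM loop and approximates the log marginal likelihood with the BIC penalty under a uniform prior on k. The function names, the deterministic initialization, the variance floor, and the synthetic sample are all assumptions, and the result is a posterior over the number of components, not over the number of modes.

```python
import math
import random


def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def fit_mixture(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(data)
    xs = sorted(data)
    overall_mean = sum(xs) / n
    sd0 = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    min_sd = 0.05 * sd0  # floor that keeps components from collapsing
    # Deterministic start: means at evenly spaced quantiles, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [max(sd0 / k, min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) + 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return sum(math.log(sum(w * normal_pdf(x, m, s)
                            for w, m, s in zip(weights, means, sds)) + 1e-300)
               for x in data)


def component_posterior(data, max_k=4):
    """Approximate p(k | data) for k = 1..max_k with a uniform prior on k."""
    n = len(data)
    # BIC-style score: log-likelihood minus (free parameters / 2) * ln(n),
    # where a k-component mixture has 3k - 1 free parameters.
    scores = {k: fit_mixture(data, k) - 0.5 * (3 * k - 1) * math.log(n)
              for k in range(1, max_k + 1)}
    top = max(scores.values())
    unnorm = {k: math.exp(s - top) for k, s in scores.items()}
    z = sum(unnorm.values())
    return {k: u / z for k, u in unnorm.items()}


# Synthetic example data (hypothetical): two clusters centred near 0 and 6.
rng = random.Random(1)
sample = ([rng.gauss(0, 1) for _ in range(150)]
          + [rng.gauss(6, 1) for _ in range(150)])
posterior = component_posterior(sample)
print(posterior)
```

The resulting probabilities can then be used in any of the ways listed earlier: take the most probable k, minimize a cost function, or average downstream results over k.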

I wrote some Java code for mixture distributions; see: http://riso.sourceforge.net and look for the source code. No doubt there are many others.

Follow-up questions are best directed to stats.stackexchange.com.

Robert Dodier
  • My concern with this approach would be that mixture distributions are not necessarily multimodal. Try looking at an equal-weight mixture of N(0,1) and N(2,1) - these distributions barely overlap and yet the mixture is unimodal. – Aniko Jul 11 '18 at 17:16
  • @Aniko Yes, that's right: the number of components is the maximum number of modes, but the actual number of modes might be less. – Robert Dodier Jul 11 '18 at 17:31