I am writing code to perform kernel K-Means (i.e. https://en.wikipedia.org/wiki/K-means_clustering, but with the kernel trick). I need to generate data, and as a first simple generator I tried to implement a Gaussian mixture model. Here is my code:

package p02kmeans

import breeze.linalg._
import breeze.stats.distributions._

/**
 * First data generation is simple, gaussian mixture model.
 */
object Data {
  class GaussianClassParam (
      val mean: Double,
      val sd: Double)

  /**
   * @param proportion marginal probability for each label
   * @param param param(j)(k) is the GaussianClassParam for the k-th class of the j-th variable
   * @param nObs number of observations to be generated
   * @return DenseMatrix_ij where i is the observation index and j is the variable index
   */
  def gaussianMixture(
      proportion: DenseVector[Double],
      param: Vector[Vector[GaussianClassParam]],
      nObs: Int)
  : DenseMatrix[Double] = {
    val nVar = param.size
    val multiSampler = Multinomial(proportion) // sampler for the latent class
    val varSamplerVec = param.map(v => v.map(c => Gaussian(c.mean, c.sd)))
    val zi = DenseVector.fill[Int](nObs)(multiSampler.sample)

    val data = DenseMatrix.tabulate[Double](nObs, nVar)((i, j) => varSamplerVec(j)(zi(i)).sample)

    data
  }
}

When I try to compile my code (I use Scala-Ide and sbt eclipse on Windows 10) I get 2 errors:

  • Error in Scala compiler: assertion failed: List(method apply$mcI$sp, method apply$mcI$sp)
  • SBT builder crashed while compiling. The error message is 'assertion failed: List(method apply$mcI$sp, method apply$mcI$sp)'. Check Error Log for details.

The error is triggered by the line:

val data = DenseMatrix.tabulate[Double](nObs, nVar)((i, j) => varSamplerVec(j)(zi(i)).sample)

And it disappears with:

val data = DenseMatrix.tabulate[Double](nObs, nVar)((i, j) => 12.0)

Could you help me debug this?

My sbt configuration:

name := "Sernel"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies  ++= Seq(
  "org.scalanlp" %% "breeze" % "0.13.1",
  "org.scalanlp" %% "breeze-natives" % "0.13.1",
  "org.scalanlp" %% "breeze-viz" % "0.13.1"
)

I have the same errors on my OSX setup.

If you want to test the whole package (i.e. to reproduce the error), the code is available on GitHub: https://github.com/vkubicki/sernel, and I am available to provide directions :).

vkubicki
  • It seems like a compiler bug (I suppose in Scala macros, as Breeze uses those). You could try a total clean of the project (maybe even including the `.ivy2` folder; this could be a difference between your macOS and Windows setups) and also update Scala to 2.11.11 (or maybe even 2.12.x). – dk14 Aug 05 '17 at 14:37
  • It was, thank you! I had to edit the update site in Eclipse, because the Scala IDE offered here is limited to Scala 2.11.8. Now, using the latest version, which ships Scala 2.12.2, the error has disappeared :). – vkubicki Aug 05 '17 at 15:59
  • If you want to propose this comment as an answer I will select it and that will close the question :). – vkubicki Aug 05 '17 at 16:00
  • Oh wait, it is very strange. I was able to compile it one time, but now the errors are back after a slight modification in the code. Actually it crashes when I change a numerical constant in another file. – vkubicki Aug 05 '17 at 16:02
  • However, cleaning the project each time the problem occurs "solves" the issue. So, the other part of your answer works :). – vkubicki Aug 05 '17 at 16:14
  • published as an answer with some updates – dk14 Aug 05 '17 at 20:25

1 Answer

It seems like a compiler bug (I suppose in Scala macros, as Breeze uses those). You could try a total clean of the project (maybe even including the .ivy2 folder; this could be a difference between your macOS and Windows setups) and also update Scala to 2.11.11 (or maybe even 2.12.x).

However, a similar issue reported against Scala 2.11.6 was never fixed (and I suspect it carried over to later Scala versions): https://issues.scala-lang.org/browse/SI-9284

So you will probably have to clean the project again whenever the problem reappears, or try another NumPy-like library such as Scalala or ND4J/ND4S.

It could also help to try another IDE (IDEA/Atom), or to use "bare" SBT, as Eclipse is probably interfering by calling Scala's compiler front-end itself.
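
If cleaning gets tedious, another thing that might be worth trying is keeping Breeze-specialized calls out of the lambda passed to tabulate. The mangled name apply$mcI$sp is how scalac names an Int-specialized apply, which here plausibly corresponds to zi(i) on a DenseVector[Int]; that is a guess on my part, not something I have verified. Note also that with import breeze.linalg._ the name Vector resolves to breeze.linalg.Vector rather than the standard-library one. The sketch below is only an untested workaround under those assumptions: DataWorkaround is a hypothetical name, and it swaps in a plain Array[Int] for the labels and IndexedSeq for the parameters. It may or may not sidestep the crash.

```scala
import breeze.linalg._
import breeze.stats.distributions._

// Hypothetical variant of the question's Data object (name is illustrative).
object DataWorkaround {
  case class GaussianClassParam(mean: Double, sd: Double)

  /**
   * @param proportion marginal probability for each label
   * @param param param(j)(k) is the parameter for the k-th class of the j-th variable
   * @param nObs number of observations to be generated
   * @return matrix with one row per observation and one column per variable
   */
  def gaussianMixture(
      proportion: DenseVector[Double],
      // IndexedSeq instead of Vector: under `import breeze.linalg._`,
      // `Vector` would mean breeze.linalg.Vector.
      param: IndexedSeq[IndexedSeq[GaussianClassParam]],
      nObs: Int): DenseMatrix[Double] = {
    val nVar = param.size
    val multiSampler = Multinomial(proportion) // sampler for the latent class
    val varSamplers = param.map(_.map(c => Gaussian(c.mean, c.sd)))

    // Plain Array[Int] for the latent labels, so that zi(i) inside the lambda
    // is an ordinary array read rather than a specialized DenseVector[Int].apply.
    val zi: Array[Int] = Array.fill(nObs)(multiSampler.sample())

    DenseMatrix.tabulate[Double](nObs, nVar)((i, j) => varSamplers(j)(zi(i)).sample())
  }
}
```

Called as DataWorkaround.gaussianMixture(DenseVector(0.5, 0.5), IndexedSeq(IndexedSeq(DataWorkaround.GaussianClassParam(0.0, 1.0), DataWorkaround.GaussianClassParam(10.0, 1.0))), 20), this should return a 20 x 1 matrix. The generated distribution is the same as in the question; only the container types differ.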

dk14