I'm a data scientist and still relatively new to Scala. I'm trying to understand the API documentation and run a t-test from any existing package. I'm looking for working sample Scala code on a dummy data set, plus some insight into how to read the documentation.
I'm working in an EMR Notebook (basically a Jupyter notebook) in an AWS EMR cluster environment. I tried referring to this documentation, but I'm not able to make sense of it: https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/inference/TTest.html#TTest()
Here's what I've tried: several import statements for two different packages that have t-test functions. I have multiple lines for the math3.stat.inference package because I'm not entirely certain of the differences between them and wanted to make sure this part wasn't the problem.
```scala
import org.apache.commons.math3.stat.inference
import org.apache.commons.math3.stat.inference._ // not sure if this means "import all classes/methods/functions"
import org.apache.commons.math3.stat.inference.TTest._
import org.apache.commons.math3.stat.inference.TTest
import org.apache.spark.mllib.stat.test
```
No errors there.
```scala
import org.apache.asdf
```
This returns an error, as expected.
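Since the comment on the wildcard line asks what `_` does, here is a minimal sketch of the three import forms used above. It uses Scala standard-library packages (`scala.math`, `scala.collection.mutable`) as stand-ins for the commons-math3 packages, and the object name `ImportDemo` is hypothetical. Roughly: `import a.b.C` brings one name into scope, `import a.b._` brings every public member of `a.b` into scope (for a package, all of its classes; for a Java class such as `TTest`, only its static members, and as far as I can tell `TTest`'s methods are all instance methods), and `import a.b` brings in just the name `b`, so you then write `b.C`.

```scala
// Stand-in packages from the Scala standard library, used only to illustrate import forms.
object ImportDemo {
  // 1. Import a single member: only `sqrt` comes into scope
  //    (like `import org.apache.commons.math3.stat.inference.TTest`).
  import scala.math.sqrt
  val a: Double = sqrt(16.0)

  // 2. Wildcard import: `_` brings every public member of scala.math into scope
  //    (like `import org.apache.commons.math3.stat.inference._`).
  import scala.math._
  val b: Double = pow(2.0, 10.0)

  // 3. Import only the package name, then qualify members with it
  //    (like `import org.apache.commons.math3.stat.inference` followed by `inference.TTest`).
  import scala.collection.mutable
  val buf: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer(1, 2, 3)
}
```

One consequence of form 3: after `import org.apache.spark.mllib.stat.test`, the bare name `test` refers to the Spark package, so anything written as `test.Something` is looked up there.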
The documentation for TTest in math3.stat.inference says there is a TTest() constructor and then lists a number of methods. How does this tell me how to use these classes and methods? I can see that the following method does what I'm looking for:
t(double m, double mu, double v, double n)
Computes t test statistic for 1-sample t-test.
But I don't know how to call it. Here are several things I've tried:
```scala
inference.t
inference.StudentTTest
test.student
test.TTest
TTest.t
// etc.
```
But I get errors like the following:
```
An error was encountered:
<console>:42: error: object t is not a member of package org.apache.spark.mllib.stat.test
test.t
An error was encountered:
<console>:42: error: object TTest is not a member of package org.apache.spark.mllib.stat.test
test.TTest
```
...etc.
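Reading the 3.6 Javadoc, two things seem to be going on. First, `TTest` is a class whose methods are instance methods, so `TTest.t` (which would require a static method) cannot work; you create an instance with `new TTest()` and call methods on it, or use the static wrappers in `TestUtils` from the same package. Second, the left column of the Method Summary shows `t(double m, double mu, double v, double n)` as `protected`, meaning it is meant for subclasses and cannot be called from notebook code; the public one-sample overload is `t(double mu, double[] observed)`, where Java's `double[]` corresponds to Scala's `Array[Double]`. The `test.*` attempts fail because `test` resolves to Spark's `org.apache.spark.mllib.stat.test` package, which, as far as I know, exposes t-tests only through `StreamingTest` for streaming data. The sketch below assumes commons-math3 3.x is on the notebook's classpath (the imports above compiled, which suggests it is); the sample values and the object name `OneSampleTTest` are made up for illustration.

```scala
import org.apache.commons.math3.stat.inference.{TTest, TestUtils}

object OneSampleTTest {
  // Dummy data set (hypothetical values). The Java API expects Array[Double] (double[] in Java).
  val sample: Array[Double] = Array(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)

  // Hypothesized population mean to test against (mu in the Javadoc).
  val mu: Double = 4.0

  // TTest is a class: create an instance, then call its public instance methods.
  val tester = new TTest()

  // Public one-sample overload t(double mu, double[] observed): the t statistic.
  val tStat: Double = tester.t(mu, sample)

  // tTest(double mu, double[] sample): the two-sided p-value.
  val pValue: Double = tester.tTest(mu, sample)

  // TestUtils exposes the same tests as static methods, so no instance is needed.
  val tStatStatic: Double = TestUtils.t(mu, sample)
}
```

In a notebook cell you could then print `OneSampleTTest.tStat` and `OneSampleTTest.pValue`. For this dummy sample (mean 5, sample variance 32/7, n = 8) the t statistic works out to sqrt(7)/2 ≈ 1.3229.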
So how do I fix these errors and calculate a simple one-sample t-statistic in Scala with a Spark kernel? Any guidance on how to read this kind of documentation would also be helpful in the long term.
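The four parameters of the protected `t(m, mu, v, n)` method map directly onto the textbook one-sample formula t = (m - mu) / sqrt(v / n), with m the sample mean, mu the hypothesized mean, v the sample variance and n the sample size. To check a library's output, or to avoid the dependency altogether, the statistic can be computed in plain Scala. This is a minimal sketch assuming the bias-corrected (n - 1) sample variance, which is what I understand commons-math3 uses for its one-sample test; the helper name `oneSampleT` and the dummy data are hypothetical. If the data lives in a Spark DataFrame and is small, one option is to collect the column to the driver as a sequence of doubles first.

```scala
object ManualTStat {
  // One-sample t statistic: t = (m - mu) / sqrt(v / n), where
  // m = sample mean, v = (n - 1)-corrected sample variance, n = sample size.
  def oneSampleT(sample: Seq[Double], mu: Double): Double = {
    require(sample.size >= 2, "need at least two observations to estimate the variance")
    val n = sample.size.toDouble
    val m = sample.sum / n
    val v = sample.map(x => (x - m) * (x - m)).sum / (n - 1)
    (m - mu) / math.sqrt(v / n)
  }

  // Hypothetical dummy data and hypothesized mean.
  val sample: Seq[Double] = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
  val tStat: Double = oneSampleT(sample, mu = 4.0)
}
```

With this data, m = 5, v = 32/7 and n = 8, so t = 1 / sqrt(4/7) = sqrt(7)/2 ≈ 1.3229.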