I'm trying to use the Anderson-Darling test on some data that I am pretty sure is normally distributed. I have tried many tests but keep running into trouble because lots of the values are zero and many of the non-zero ones are small. When I run the code on my array, I get the warning "RuntimeWarning: divide by zero encountered in log", pointing at the line "S = sum((2 * i-1.0)/ N * (log(z)+log(1-z[::-1])),axis=0)", and the test statistic comes out as inf. Got any tips?

Edit: here is some code with some sample data

import numpy as np
from scipy import stats

# Sample data as a flat 1-D array: a long run of zeros, one sharp peak, then a slowly decaying tail
data = np.array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
   0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
   0.00000000e+00, 9.90000010e-01, 0.00000000e+00, 9.90000010e-01,
   1.00000000e+00, 9.30000007e-01, 1.00000000e+00, 1.00000000e+00,
   2.00000000e+00, 1.99000001e+00, 9.30000007e-01, 9.90000010e-01,
   0.00000000e+00, 1.00000000e+00, 0.00000000e+00, 1.93000001e+00,
   1.00000000e+00, 1.99000001e+00, 2.94000000e+00, 1.00000000e+00,
   5.83000004e+00, 5.83000004e+00, 6.91000003e+00, 6.82999998e+00,
   1.07800000e+01, 9.77000004e+00, 1.34700000e+01, 1.77100000e+01,
   2.76000001e+01, 5.09000002e+01, 9.92300003e+01, 1.99720001e+02,
   3.94330001e+02, 8.10660002e+02, 1.60540001e+03, 2.83691001e+03,
   4.46896001e+03, 5.48025002e+03, 5.21601002e+03, 3.94428001e+03,
   2.48591001e+03, 1.56996000e+03, 9.49610003e+02, 5.92650002e+02,
   4.02490001e+02, 2.93620001e+02, 2.16200001e+02, 1.43060001e+02,
   1.22550000e+02, 9.20400003e+01, 9.50000004e+01, 6.97500002e+01,
   6.26600003e+01, 5.37800002e+01, 5.31300001e+01, 5.01200002e+01,
   4.25000002e+01, 3.14000001e+01, 2.94300001e+01, 3.41700001e+01,
   3.16100001e+01, 2.83400001e+01, 1.86500001e+01, 1.76200001e+01,
   2.06500001e+01, 1.38100001e+01, 1.37600001e+01, 1.26400000e+01,
   1.17600001e+01, 5.85000002e+00, 1.29200000e+01, 1.09100000e+01,
   5.97000003e+00, 3.99000001e+00, 4.92000002e+00, 8.84000003e+00,
   5.80000001e+00, 3.91000003e+00, 5.96000004e+00, 2.88000000e+00])

# stats.anderson tests against the normal distribution by default (dist='norm')
print(stats.anderson(data))

As is, this gives a test statistic in the 20s, which is ridiculous. But when I use my actual data, which has lots of small values like this followed by about 1000 zeros, the test statistic is inf and I get the warning I pasted above. I can't work out what I'm doing wrong for the test to fail. Any advice would be appreciated.

VerityR
  • Please make a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example), and make sure it's minimal, so that we can easily try and reproduce the problem. You can [edit] the question. – AlexK Jul 17 '22 at 21:06
  • Okay thanks, I just added some code to illustrate my problem. – VerityR Jul 17 '22 at 22:43
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 18 '22 at 00:20
  • Well, if you have all these zeros in your data then the data is not Gaussian and the test statistic should be large. – thomaskeefe Jul 21 '22 at 19:42
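
To illustrate the comment above, and one plausible source of the inf, here is a minimal sketch. It assumes the statistic is computed exactly as in the line quoted in the warning, with z being the normal CDF of the sorted, standardized data; I have not checked which SciPy versions do it this way. The function anderson_normal and the synthetic array spiky are hypothetical, made up for this example. The idea is that when a value sits far enough above the mean in standard-deviation units, its CDF rounds to exactly 1.0 in double precision, so log(1 - z) becomes log(0). Working with norm.logcdf and norm.logsf avoids the rounding and gives a finite (and large) statistic.

```python
import numpy as np
from scipy.stats import norm


def anderson_normal(x, stable=True):
    """A^2 statistic against a normal with estimated mean and std.

    Hypothetical helper for illustration: no small-sample correction
    and no critical values, unlike scipy.stats.anderson.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    w = (x - x.mean()) / x.std(ddof=1)  # standardized, sorted values
    i = np.arange(1, n + 1)
    if stable:
        # Log-probabilities computed directly, so extreme tails stay finite
        log_cdf = norm.logcdf(w)
        log_sf = norm.logsf(w)
    else:
        # Same form as the line in the warning: log(z) and log(1 - z)
        z = norm.cdf(w)
        with np.errstate(divide="ignore"):
            log_cdf = np.log(z)
            log_sf = np.log(1.0 - z)  # -inf wherever z rounded to 1.0
    s = np.sum((2 * i - 1.0) / n * (log_cdf + log_sf[::-1]))
    return -n - s


# Assumed stand-in for the real data: many zeros plus one large value
spiky = np.concatenate([np.zeros(1000), [5000.0]])

print(anderson_normal(spiky, stable=False))  # infinite: log(0) enters the sum
print(anderson_normal(spiky, stable=True))   # finite, but far from normal
```

Either way, a finite statistic on data like this should still be large, since an array dominated by zeros with a single sharp peak is not Gaussian as an empirical distribution. If the array is really a curve (counts or intensities sampled along some axis), a normality test on the raw values may not be asking the intended question.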
