
What is the best way to run a fit test (test for normality) for each unique ilitm in the dataset below? Thanks

dbtbl

Rick

1 Answer


As you know (it is visible in the edit history), Oracle provides the Shapiro-Wilk test of normality (I link to [R] here, as you will find far more reference material for that implementation).

The important thing to know is that the OUT parameter sig corresponds to what statisticians call the p-value.

Example

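DECLARE
   sig     NUMBER;        -- OUT: significance of the fit (the p-value)
   mean    NUMBER := 0;   -- IN OUT: mean of the distribution to compare against
   stdev   NUMBER := 1;   -- IN OUT: standard deviation of that distribution
BEGIN
   -- Test column DIST1 of table DIST in the current schema
   DBMS_STAT_FUNCS.normal_dist_fit (USER,
                                    'DIST',
                                    'DIST1',
                                    'SHAPIRO_WILKS',
                                    mean,
                                    stdev,
                                    sig);
   DBMS_OUTPUT.put_line (sig);
END;
/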

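You get the following output. The W value line is printed by the procedure itself, the second line is sig, and the comma is the decimal separator of the session: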

W value : ,9997023261540432791888281834378157820514
,7136528702727722659486194469256296703232

For comparison, here is the same test in R with the same data:

> shapiro.test(df$DIST1)

        Shapiro-Wilk normality test

data:  df$DIST1
W = 0.9997, p-value = 0.7137

The rest is statistics:)

My interpretation: this test is useful if you need to rule out the coarsest deviations from the normal distribution.

If sig < .05 you may reject the data as not normally distributed, but a high value of sig doesn't prove the opposite. You only know that you can't reject the hypothesis that the data are normal.

In any case, a plot of the distribution can provide better insight than a simple true/false test. R is a good resource here as well.
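The decision rule above could be sketched in PL/SQL roughly as follows. This is only an illustration: it reuses the DIST table and DIST1 column from the example, and the constant alpha is a name introduced here for the 0.05 threshold. In SQL*Plus or SQL Developer you need SET SERVEROUTPUT ON to see the messages.

```sql
DECLARE
   sig     NUMBER;
   mean    NUMBER := 0;
   stdev   NUMBER := 1;
   alpha   CONSTANT NUMBER := 0.05;   -- hypothetical name for the threshold used in the text
BEGIN
   -- Same call as in the example; sig receives the p-value
   DBMS_STAT_FUNCS.normal_dist_fit (USER,
                                    'DIST',
                                    'DIST1',
                                    'SHAPIRO_WILKS',
                                    mean,
                                    stdev,
                                    sig);

   IF sig < alpha
   THEN
      -- Low p-value: evidence against normality
      DBMS_OUTPUT.put_line ('Normality rejected, sig = ' || sig);
   ELSE
      -- High p-value: no evidence against normality, which is not proof of normality
      DBMS_OUTPUT.put_line ('Normality not rejected, sig = ' || sig);
   END IF;
END;
/
```

With the data from the example (sig of about 0.7137) the ELSE branch would be taken.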

Some other useful discussions on this topic.

Marmite Bomber