Normality Tests and Data Science

Question

I have a dataset and want to check whether some variables follow a normal distribution. I have read about normality tests, but they seem to be overly sensitive when the amount of data is large, and when I plot the histogram and the QQ-plot, the variable appears to be normal. Should I rely only on the histogram plus QQ-plot, or is it best practice to run a normality test as well?

I have produced histograms, QQ-plots, kurtosis and skewness values, and a dataframe with the results of 3 normality tests:

1 - Shapiro-Wilk. 2 - Lilliefors. 3 - D'Agostino_K2

My conclusion was that the variable is normal, based on the QQ-plot, the histogram, and kurtosis and skewness both falling within (-1, +1).

score 0 · Answer 1 · answered Jun 09 '23 at 11:01

When assessing whether a variable follows a normal distribution, it is recommended to use a combination of methods to make a more informed decision. Relying solely on the histogram and QQ-plot may not provide a comprehensive analysis of the data's normality. While visual inspections can provide useful insights, they are not definitive tests for normality.

Using additional normality tests, such as the Shapiro-Wilk, Lilliefors, or D'Agostino_K2 tests, can offer statistical evidence to support or refute the assumption of normality.

These tests are specifically designed to evaluate whether a dataset significantly deviates from a normal distribution.
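As a rough sketch of how such tests can be run, the snippet below applies SciPy's `shapiro` and `normaltest` (D'Agostino's K²) to synthetic data. The data, the seed, and the sample sizes are assumptions chosen for illustration, not taken from the question; the sample is drawn from a Student's t distribution with 30 degrees of freedom, which is very close to normal. Lilliefors is not in SciPy; statsmodels provides it as `lilliefors` in `statsmodels.stats.diagnostic`.

```python
import numpy as np
from scipy import stats

# Synthetic, nearly normal data (assumption for illustration only):
# a t distribution with 30 degrees of freedom has slightly heavier tails.
rng = np.random.default_rng(42)
large = rng.standard_t(df=30, size=200_000)
small = large[:500]  # a small subsample of the same data

# D'Agostino's K^2 test on both sample sizes.
_, p_k2_small = stats.normaltest(small)
_, p_k2_large = stats.normaltest(large)

# Shapiro-Wilk on the small sample only; SciPy warns that its
# p-value may be inaccurate for samples larger than 5000.
_, p_sw_small = stats.shapiro(small)

print(f"K^2 p-value, n={small.size}: {p_k2_small:.4f}")
print(f"K^2 p-value, n={large.size}: {p_k2_large:.3g}")
print(f"Shapiro-Wilk p-value, n={small.size}: {p_sw_small:.4f}")
```

With a sample this large, the K² test rejects normality even though the deviation is tiny, which is the sensitivity described in the question. A small p-value on a very large dataset therefore says that a deviation is detectable, not that it is large enough to matter.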

An excess kurtosis between -1 and +1 and a skewness between -1 and +1 suggest a roughly symmetric distribution without heavy tails, which is consistent with approximate normality but does not by itself establish it.
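For reference, here is a minimal sketch of how skewness and excess kurtosis are computed from sample moments, using only the standard library. The helper names `skewness` and `excess_kurtosis` and the synthetic samples are hypothetical, introduced for illustration. The second sample comes from a right-skewed triangular distribution, which is clearly not normal yet still lands inside the (-1, +1) band, so the band is best read as a loose screen.

```python
import random
import statistics


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (0 for a normal distribution)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


# Synthetic samples (assumptions for illustration only).
rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
# Triangular on [0, 1] with its mode at 0: right-skewed, not normal.
triangular_sample = [rng.triangular(0.0, 1.0, 0.0) for _ in range(50_000)]

for name, sample in [("normal", normal_sample), ("triangular", triangular_sample)]:
    print(name, round(skewness(sample), 3), round(excess_kurtosis(sample), 3))
```

SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis` compute the same quantities with their default arguments, so the hand-written helpers are only there to make the definitions explicit.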

In summary, it is recommended to combine visual inspections (histogram and QQ-plot) with statistical tests (Shapiro-Wilk, Lilliefors, D'Agostino_K2) when assessing the normality of your variables.

Ok, but I have also read that all of these tests can be very sensitive when the dataset is large. So, should I really rely on them even though my dataset is very large? — Caio Sóter, Jun 09 '23 at 11:14
This answer looks like it was generated by an AI (like ChatGPT), not by an actual human being. You should be aware that [posting AI-generated output is officially **BANNED** on Stack Overflow](https://meta.stackoverflow.com/q/421831). If this answer was indeed generated by an AI, then I strongly suggest you delete it before you get yourself into even bigger trouble: **WE TAKE PLAGIARISM SERIOUSLY HERE.** Please read: [Why posting GPT and ChatGPT generated answers is not currently acceptable](https://stackoverflow.com/help/gpt-policy). — tchrist, Jul 07 '23 at 02:03
