Visual inspection of the distribution of values in the dataset is the quickest way to establish whether the data is approximately normally distributed:
df <- data.frame(columnA = c(0.333, 0.235, -0.111),
                 columnB = c(0.231, -0.114, -0.234),
                 columnC = c(-0.123, -0.034, 0.442))
# Convert the data frame to a vector (the data structure information is lost)
vec <- as.vector(t(df))
vec
# [1]  0.333  0.231 -0.123  0.235 -0.114 -0.034 -0.111 -0.234  0.442
hist(vec)
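If it helps the visual comparison, you can overlay the normal density implied by the sample mean and standard deviation; this is a minimal base-R sketch, not part of the original example:
# Plot on the density scale so the overlaid curve is comparable
hist(vec, freq = FALSE)
# Overlay the normal density with the sample mean and sd as parameters
curve(dnorm(x, mean = mean(vec), sd = sd(vec)), add = TRUE)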
An alternative is to use data.table to reshape the data into long format before plotting, which retains the information about which column each value came from:
library(data.table)
df <- data.frame(columnA = c(0.333, 0.235, -0.111),
                 columnB = c(0.231, -0.114, -0.234),
                 columnC = c(-0.123, -0.034, 0.442))
# Convert to data.table
dt <- as.data.table(df)
# Pivot long (columns to rows); naming the measure columns explicitly
# avoids melt's "no id variables" message
dt <- melt(dt, measure.vars = names(dt))
dt
#    variable  value
# 1:  columnA  0.333
# 2:  columnA  0.235
# 3:  columnA -0.111
# 4:  columnB  0.231
# 5:  columnB -0.114
# 6:  columnB -0.234
# 7:  columnC -0.123
# 8:  columnC -0.034
# 9:  columnC  0.442
hist(dt$value)
Alternatively, descriptive statistics can inform your interpretation of whether the data is normally distributed. For example, when data is normally distributed we expect the mean, median and mode to be approximately equal:
# Values are continuous, so the data must be binned to calculate a mode;
# the hist function does this for us
plt <- hist(dt$value)
# The mode is a bin range; which.max picks the first bin with the highest
# count if several bins are tied
modal_bin <- which.max(plt$counts)
mode <- paste0("(", plt$breaks[modal_bin], ", ", plt$breaks[modal_bin + 1], "]")
# Summarise the mean and median, round to 3 decimal places, and add the mode
dt[, .(mean = mean(value), median = median(value))
   ][, lapply(.SD, round, 3)
     ][, .(mean, median, mode)]
#     mean median         mode
# 1: 0.069 -0.034 (-0.2, -0.1]
You then need to interpret these numbers, i.e. how close the mean, median and mode are to each other, to judge whether the data is plausibly normally distributed.
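As a rough illustration (the 0.25 cut-off below is an arbitrary assumption, not a formal rule), you could express the gap between the mean and the median in units of the standard deviation:
# Gap between mean and median in standard-deviation units
gap <- abs(mean(dt$value) - median(dt$value)) / sd(dt$value)
gap
# A small gap is consistent with a roughly symmetric distribution;
# 0.25 is an arbitrary threshold used for illustration only
gap < 0.25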
In theory you could use a chi-square goodness-of-fit test to compare your empirical data with data simulated from a normal distribution using the parameters of your sample. However, you would need to think through several questions first, e.g. how many breaks to bin the data into, and how many records are enough to keep the bins from being empty at low sample sizes without being so many that the chi-square test becomes over-sensitive.
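A minimal sketch of that idea follows, assuming (arbitrarily) the bins chosen by hist and a large simulated sample; with only 9 records the expected counts are tiny, so chisq.test will warn that its approximation may be inaccurate, which is exactly the sample-size problem described above:
set.seed(1)  # make the simulation reproducible
x <- dt$value
# Bin the empirical data using the breaks chosen by hist
h <- hist(x, plot = FALSE)
obs <- h$counts
# Simulate a large normal sample (1e5 is an arbitrary choice) with the
# parameters of the data, binned with the same breaks; values outside
# the break range become NA and are dropped by table
sim <- rnorm(1e5, mean = mean(x), sd = sd(x))
p <- table(cut(sim, breaks = h$breaks))
# Goodness-of-fit test of the observed counts against the simulated proportions
chisq.test(obs, p = as.vector(p), rescale.p = TRUE)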
There are other measures (e.g. skew, kurtosis, overdispersion) that you could also consider.
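For example, skewness and excess kurtosis can be computed from the standardised moments in base R, assuming the long-format dt from above (packages such as moments or e1071 apply slightly different small-sample corrections); for a normal distribution both should be close to 0:
x <- dt$value
z <- (x - mean(x)) / sd(x)  # standardised values
# Third standardised moment: 0 for a symmetric distribution
skewness <- sum(z^3) / length(x)
# Fourth standardised moment minus 3: 0 for a normal distribution
excess_kurtosis <- sum(z^4) / length(x) - 3
c(skewness = skewness, excess_kurtosis = excess_kurtosis)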