I have just started to learn data science.

This is the link to my first project: https://www.kaggle.com/code/madhavdass/divvy-bikes-chicago/notebook

Can someone help me visualize whether tripdata_clean$ride_duration_min and tripdata_clean$ride_distance_km are normally distributed?

Also, is there a non-visual way to confirm whether they are approximately normally distributed?

Thanks in advance.

Adam Quek

1 Answer

You should provide data using dput(), not a web link. As noted, your web link does not work. Instead, we can use data that comes with R to see whether sepal length in irises is normally distributed. We are combining all three species in this example:

data(iris)
x <- iris$Sepal.Length

I'll use the qqPlot() function in the car package:

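library(car)
# Q-Q plot with a confidence envelope around the reference line
qqPlot(x)
# Shapiro-Wilk test: the null hypothesis is that x is normally distributed
shapiro.test(x)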
# 
#   Shapiro-Wilk normality test
# 
# data:  x
# W = 0.97609, p-value = 0.01018

The plot shows that the observed values fall outside the expected range for small lengths. The Shapiro-Wilk test indicates that the values are significantly different from a normal distribution (p = 0.01).

[Figure: Q-Q plot]
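For columns like the ones in the question, the same two checks might look like the sketch below. The real trip data is not available here, so ride_duration_min is simulated from a log-normal distribution purely as a stand-in (an assumption, not the Divvy data). One detail to be aware of: shapiro.test() only accepts between 3 and 5000 non-missing values, so a large trip table has to be subsampled first. The plot uses base R's qqnorm() and qqline() in case the car package is not installed.

```r
# Stand-in for tripdata_clean$ride_duration_min (simulated, NOT the real data)
set.seed(42)
ride_duration_min <- rlnorm(20000, meanlog = 2.5, sdlog = 0.8)

# shapiro.test() accepts 3 to 5000 values, so draw a random subsample
sub <- sample(ride_duration_min, 5000)

# Visual check with base R: points follow the line if the data are normal
qqnorm(sub)
qqline(sub, col = "red")

# Non-visual check: a small p-value suggests a departure from normality
res <- shapiro.test(sub)
res$p.value
```

With samples this large, even small departures from normality tend to come out as significant, so the Q-Q plot is often the more informative of the two.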

What happens if we use only one species:

shapiro.test(iris$Sepal.Length[iris$Species=="virginica"])
# 
#   Shapiro-Wilk normality test
# 
# data:  iris$Sepal.Length[iris$Species == "virginica"]
# W = 0.97118, p-value = 0.2583

The lengths for species virginica are not significantly different from a normal distribution (p = 0.26).
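To run the same test for every species at once rather than one at a time, something like the following should work. It is only a sketch: the name by_species is made up for illustration, and it relies on base R's split() and sapply().

```r
data(iris)

# Run Shapiro-Wilk within each species and keep W and the p-value
by_species <- sapply(split(iris$Sepal.Length, iris$Species), function(v) {
  res <- shapiro.test(v)
  c(W = unname(res$statistic), p.value = res$p.value)
})

# One row per species, one column per statistic
round(t(by_species), 4)
```

The virginica row should match the W and p-value shown above.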

dcarlson