I have a large dataset with original and imputed values and the proportional differences between the two. The quantiles of the proportional differences are:
> quantile(p$prdif, probs=c(0, 0.1, 0.2, 0.3, .4, .5,0.6, 0.7, 0.8, 0.9, 1))
         0%         10%         20%         30%         40%         50%         60%         70%         80%
-0.99269227 -0.43367924 -0.22983182 -0.07498240  0.06285345  0.20829226  0.39253900  0.65837197  1.18619469
        90%        100%
11.25010211         Inf
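As a cross-check on the quantiles, the share of values that fall between -1 and 2 can be computed directly. The sketch below uses a made-up vector, `prdif_demo`, as a hypothetical stand-in for `p$prdif`, since the real data are not reproduced here.

```r
# Hypothetical stand-in for p$prdif: right-skewed values bounded below by -1,
# plus one infinite value like the 100% quantile above.
set.seed(1)
prdif_demo <- c(rlnorm(1000, meanlog = 0, sdlog = 1.5) - 1, Inf)

# Logical vector: TRUE where a value lies inside the window of interest.
in_window <- prdif_demo >= -1 & prdif_demo <= 2

# The mean of a logical vector is the proportion of TRUE values.
share <- mean(in_window)
share
```

On the real data, the same two lines with `p$prdif` in place of `prdif_demo` would give the proportion of observations the zoomed plot is expected to show.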
I plot the histogram of the proportional differences with the following command:
library(dplyr)
library(ggplot2)

# Calculate means
mu <- p %>%
  filter(orig != 0) %>%
  summarise(mu1 = mean(orig), mu2 = mean(imp), mu3 = mean(dif), mu4 = mean(prdif))

ggplot(p %>% filter(orig != 0), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue') +
  geom_density(alpha = 0.6, size = 2) +
  geom_vline(data = mu, aes(xintercept = mu4, color = "red"),
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density")
Which results in the following plot:
To focus on the region where the majority of the values lie (roughly -1 to +2), I use the coord_cartesian function as follows:
ggplot(p %>% filter(orig != 0), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue') +
  geom_density(alpha = 0.6, size = 2) +
  geom_vline(data = mu, aes(xintercept = mu4, color = "red"),
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density") +
  coord_cartesian(xlim = c(-1, 2))
Which results in this plot:
I cannot understand why the plot is empty. According to the quantiles above, at least 80% of the values lie in the range -1 to +2.
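One thing that can be checked here is how wide the histogram bins are compared with the zoom window. coord_cartesian only zooms the view and does not recompute the histogram, and geom_histogram defaults to 30 bins spread over the full range of the data. The sketch below is a rough diagnostic on made-up data (`x_demo` is hypothetical, and the equal-width binning only approximates how ggplot2 places its bins).

```r
# Hypothetical heavy-tailed data: 90% of values in [-1, 2], 10% far to the right.
set.seed(42)
x_demo <- c(runif(900, min = -1, max = 2), runif(100, min = 2, max = 5000))

n_bins <- 30                              # geom_histogram's default number of bins
full_range <- range(x_demo)
approx_binwidth <- diff(full_range) / n_bins

# Count values per bin with base R, using 30 equal-width bins over the full range.
breaks <- seq(full_range[1], full_range[2], length.out = n_bins + 1)
counts <- as.integer(table(cut(x_demo, breaks = breaks, include.lowest = TRUE)))

approx_binwidth   # compare with the 3-unit width of the window from -1 to 2
counts[1]         # number of values that land in the first bin
```

If the bin width turns out to be much larger than the window, the whole zoomed region sits inside a single bin, which would be consistent with a plot that looks flat or empty.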
Edit:
Following the comments below, I changed the code to keep only values below 2 and increased the number of bins to 300. The code and output are the following:
ggplot(p %>% filter(orig != 0 & prdif < 2), aes(x = prdif)) +
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.4, fill = 'blue', bins = 300) +
  geom_density(alpha = 0.6, size = 2, color = "yellow") +
  geom_vline(data = mu, aes(xintercept = mu4), color = "red",
             linetype = "dashed", size = 1.5) +
  labs(title = "Differences between imputed and original values", x = "Proportional Difference", y = "Density") +
  coord_cartesian(xlim = c(-1, 2))
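Note that `mu` in the code above is still computed from all rows with orig != 0, not from the subset with prdif < 2, so the dashed mean line may fall outside the zoomed window. A minimal sketch of recomputing the mean on the same subset follows. The data frame `p_demo` is hypothetical, and the formula used for `prdif` is an assumption about how the proportional difference is defined.

```r
# Hypothetical data frame with the same column names as p.
set.seed(7)
p_demo <- data.frame(orig = c(0, runif(499, min = 1, max = 100)))
p_demo$imp <- p_demo$orig * rlnorm(500, meanlog = 0, sdlog = 1)

# Assumed definition of the proportional difference (NaN where orig == 0).
p_demo$prdif <- (p_demo$imp - p_demo$orig) / p_demo$orig

# Keep the same rows the plot uses, then take the mean of that subset only.
p_sub <- subset(p_demo, orig != 0 & prdif < 2)
mu_sub <- mean(p_sub$prdif)
mu_sub
```

A mean computed this way is guaranteed to lie below 2, so a vertical line drawn at `mu_sub` stays inside the window set by coord_cartesian.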
Again, the output looks odd to me. I would expect to see something like this:
Your advice will be appreciated.