This may be a basic question about the 'dnorm' function in R. Let's say I create some z-scores through a z transformation and then sum the values returned by 'dnorm'.

 data=c(232323,4444,22,2220929,22323,13)
 z=(data-mean(data))/sd(data)
 result=dnorm(z,0,1)
 sum(result)
 [1] 1.879131

As shown above, the sum of the 'dnorm' values is neither 1 nor 0.

Then let's say I use a mean of zero and a standard deviation of one in the z transformation itself.

 data=c(232323,4444,22,2220929,22323,13)
 z=(data-0)/1
 result=dnorm(z,0,1)
 sum(result)
 [1] 7.998828e-38

The sum is still neither 0 nor 1.

I need probabilities that sum to one for later use. What method do you recommend, using 'dnorm' or any other PDF function?

Eric
  • The area under the density is 1; that is, the integral of the density function from -Inf to +Inf. By calculating the sum you are doing something else. – jogo Sep 30 '18 at 09:46
  • If I want to verify for myself that the density from dnorm integrates to 1, what would I need? – Eric Sep 30 '18 at 09:48
  • Read the answer of @AndersEllernBilgrau – jogo Sep 30 '18 at 09:49

1 Answer

dnorm returns values of the normal probability density function. It does not return probabilities. What is your reasoning that the density values at your transformed data points should sum to one or zero? Your data are realizations of a random variable, so there is no reason the sum should ever equal exactly zero or one.
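As a small illustration of the point that density values are not probabilities, here is a sketch in which the standard deviation of 0.1 and the interval endpoints are arbitrary choices of mine, not values from the question. A density value can exceed 1, while the probability of an interval (obtained from pnorm) always stays between 0 and 1:

```r
# Density at the mean of a normal distribution with a small sd
# (sd = 0.1 is an arbitrary illustrative value)
peak <- dnorm(0, mean = 0, sd = 0.1)
peak  # larger than 1, so it cannot be a probability

# Probability of falling in the interval [-0.05, 0.05],
# computed as a difference of cumulative probabilities
p <- pnorm(0.05, mean = 0, sd = 0.1) - pnorm(-0.05, mean = 0, sd = 0.1)
p     # a genuine probability, between 0 and 1
```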

Integrating dnorm yields a probability. Integrating dnorm over the entire support of the random variable yields a probability of one:

integrate(dnorm, -Inf, Inf)
#1 with absolute error < 9.4e-05 

In fact, integrate(dnorm, -Inf, x) conceptually equals pnorm(x) for all x.
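A quick way to check this numerically, as a sketch with x = 1.5 chosen arbitrarily for illustration:

```r
# Compare the numerically integrated density with the CDF at one point
x <- 1.5  # arbitrary illustrative value
area <- integrate(dnorm, -Inf, x)$value

# The two agree up to the numerical error of integrate()
abs(area - pnorm(x))
```

The difference should be tiny but not necessarily exactly zero, since integrate only approximates the integral.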

Edit: In light of your comment.

The same applies to the probability density functions (PDFs) of other continuous distributions:

integrate(dexp, 0, Inf, rate = 57)
#1 with absolute error < 1.3e-05

Note that the ... arguments of integrate (see ?integrate) are passed on to the integrand.

Recall also that the Poisson distribution, say, is a discrete probability distribution, so integrating it (in the conventional sense) makes no sense. A discrete distribution has a probability mass function (PMF) rather than a PDF, and a PMF does return probabilities. In that case, the values should sum to one over the support.

Consider a non-integer argument, where the PMF is zero:

dpois(0.5, lambda = 2)
#[1] 0
#Warning message:
#In dpois(0.5, lambda = 2) : non-integer x = 0.500000

Summing from 0 to a 'very' large number (i.e. over the support of the Poisson distribution):

sum(dpois(0:1000000, lambda = 2)) 
#[1] 1
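
The same check works for other discrete distributions that take extra parameters, such as the binomial mentioned in the comments. A minimal sketch, where size = 10 and prob = 0.3 are arbitrary illustrative values; the support is finite here, so the sum runs over 0:size:

```r
# Sum the binomial PMF over its whole support 0, 1, ..., size
size <- 10   # illustrative number of trials
prob <- 0.3  # illustrative success probability
pmf <- dbinom(0:size, size = size, prob = prob)

total <- sum(pmf)
all.equal(total, 1)  # compares up to floating-point tolerance
```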
Anders Ellern Bilgrau
  • Thank you. If I try to do the same thing with other distributions such as Poisson, exponential, binomial, etc., they seem to require additional arguments such as lambda, size, and so on. How can I use integrate in this general situation? – Eric Sep 30 '18 at 12:10
  • @Eric I have edited the answer. You can simply add those as arguments. – Anders Ellern Bilgrau Sep 30 '18 at 12:31