1

I am trying to replicate the t-test with mathematical formulas and it somehow wont work with may current attempt below:

# PDF t-distribution:
t_distr = function(x,df){
  t_1 = gamma((df+1)/2)/(sqrt(df*pi)*gamma(df/2))
  t_2 = (1 + (seq^2/df))^(-(df+1)/2)
  t_distr = t_1*t_2
  return(t_distr)
} # End of function

# Initialize Sequence: 
seq = seq(-4,4,by = .1)

# Integrating up to t-value of 1 (lower tail t-test):
integrate(t_distr, df = length(seq)-1, lower = -Inf, upper = 1 )

# For df = length(seq)-2 (e.g. for a lin. reg):
integrate(t_distr, df = length(seq)-2, lower = -Inf, upper = 1 )

The above code will result in the following error message:

Error in integrate(t_distr, df = length(seq) - 1, lower = -Inf, upper = 1) : 
  evaluation of function gave a result of wrong length

Any suggestions how to make it possible to integrate the pdf of a t-distribution using the integrate function? It also does not work with a "df" of just n...

mkl
  • 90,588
  • 15
  • 125
  • 265
  • In your t function, what is `seq`? – Onyambu Mar 27 '23 at 06:48
  • Use t.test function. This has nothing to do with the t distribution unless you want the pvalue – Onyambu Mar 27 '23 at 06:53
  • You have `dt` for density, `pt` for probability and `qt` for quantile. You could easily use hose. Or do you want to implement your own? – Onyambu Mar 27 '23 at 07:03
  • I know that these functions exist, but I want to recreate it via using the actual formulas. And what I want is the p-value that is why I am trying to integrate.... I did the same already for the z-test.... Now I want to do it via the PDF of a t-distribution as well. Apart from that: a t-test does not result in a t-distribution but a t-value on a PDF of a t-distribution. However, thanks anyway.... – Steffen Schwerdtfeger Mar 27 '23 at 07:29

1 Answers1

1

A couple of issues:

  1. Your t_distr function uses seq which is not defined (actually it is defined to be function, later you overwrite it with a vector, avoid that). You want your t_distr (the density of the t distribution) to take one argument x:

    t_distr <- function(x, df){
      t_1 <- gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
      t_2 <- (1 + (x ^ 2 / df)) ^ (-(df + 1) / 2)
      t_1 * t_2
    }
    

    (you can check that your t density is correctly defined by all.equal(t_distr(2.123, 12), dt(2.123, 12)).

  2. For integrate (to move from the desnity to the cdf) you need to provide your function and limits. The function will then be integrated in its first argument:

    integrate(t_distr, df = 12, lower = -Inf, upper = 1)
    

    this will integrate your density from -Inf to 1 with 12 degrees of freedom. Again we can check whether we got the correct results by comparing that to the built-in cdf of the t-distribution:

    all.equal(integrate(t_distr, df = 12, lower = -Inf, upper = 1)$value, pt(1, 12))
    
  3. The degrees of freedom have actually nothing to do with the integration itself and should be seen as an additional parameter. Maybe that's were the confusion came from.

thothal
  • 16,690
  • 3
  • 36
  • 71
  • Thanks so much! "seq" within the function was the problem indeed!! In a former version I used seq instead of x. – Steffen Schwerdtfeger Mar 27 '23 at 07:47
  • `R` lexical scoping is a bless and a curse. As `seq` is not defined in the body of `t_distr` it will be looked up in the global environment and all the way up to the base namespace (where the function is defined). However, as you overwrote `seq` by a vector this error is even more difficult to spot, because now we use at least some numerics instead of a function. Bototm line: avoid at all costs to re-use pre-defined names (`c`, `data` are classics). – thothal Mar 27 '23 at 07:52