3

I try to compute the variance of ratio between 2 sum of Gamma distribution with shape and scale different.

Numerator :

numerator

Denominator :

Denominator

To compute this variance, I am using coga library of R. Here's below the code snippet :

library(coga)

options(max.print=100000000)
my_data <- read.delim("Array_total.txt", header = FALSE, sep = " ")
array_2D <- array(my_data)

z_ph <- c(0.9595, 1.087, 1.2395, 1.45, 1.688)
b_sp <- c(1.42904922, 1.52601862, 1.63866958, 1.78259615, 1.91956918)
b_ph <- c(sqrt(1+z_ph))

y0 <- array(1000, dim=c(5,36))
y1 <- array(0, dim=c(5,36))
y2 <- array(0, dim=c(5,36))

y2_sp <- array(0, dim=c(5,36))
y2_ph <- array(0, dim=c(5,36))

y3 <- array(0, dim=c(5,36))
y4 <- array(0, dim=c(5,36))

y1 <- ((array_2D[,1])+1)/2
y2 <- 2*array_2D[,2]

for (i in 1:5) {
y2_sp[i] <- 2*array_2D[,2]*(b_sp[i]/b_ph[i])
y2_ph[i] <- 2*array_2D[,2]
}

for (i in 1:5) {
  y3[i] <- mapply(rcoga, y0[i], y1[i], y2_sp[i]) / mapply(rcoga, y0[i], y1[i], y2_ph[i])
  y4[i] <- as.double(unlist(y3[i]))
}

# do grid
grid <- seq(0, 15, length.out=36000)
## calculate pdf and cdf
pdf <- array(0, dim=c(5))
for (i in 1:5) {
pdf[i] <- mapply(dcoga, grid, shape=y1[i], rate= y2_sp[i]) / mapply(dcoga, grid, shape=y1[i], rate= y2_ph[i])
}
## plot pdf
plot(density(y4[1,]), col="blue")
for (i in 1:5) {
  print('var(y4) = ', var(y4[i,]))
}
  1. The first issue is that print('var(y4) = ', var(y4[i,])) doesn't display anything and I don't understand why : below the result in R shell :
> source("exact_example.R")
[1] "var(y4) = "
[1] "var(y4) = "
[1] "var(y4) = "
[1] "var(y4) = "
[1] "var(y4) = "

This first issue has been solved by just using paste0 function (see first comment below).

  1. The second main issue is that I get this same following error during execution :
Warning messages:
1: In y2_sp[i] <- 2 * array_2D[, 2] * (b_sp[i]/b_ph[i]) :
   number of items to replace is not a multiple of replacement length
2: In y2_ph[i] <- 2 * array_2D[, 2] :
   number of items to replace is not a multiple of replacement length

Warning messages:
1: In y3[i] <- mapply(rcoga, y0[i], y1[i], y2_sp[i])/mapply(rcoga,  :
      number of items to replace is not a multiple of replacement length
2: In y3[i] <- mapply(rcoga, y0[i], y1[i], y2_sp[i])/mapply(rcoga,  :
   number of items to replace is not a multiple of replacement length

Warning messages:
1: In pdf[i] <- mapply(dcoga, grid, shape = y1[i], rate = y2_sp[i])/mapply(dcoga,  :
   number of items to replace is not a multiple of replacement length
2: In cdf[i] <- mapply(pcoga, grid, shape = y1[i], rate = y2_sp[i])/mapply(pcoga,  :
   number of items to replace is not a multiple of replacement length

However, I can plot a density function which looks like :

density function for first bin : y4[i,]

What is wrong with this systematical error about multiple of replacement length ? Sorry, I begin with R language.

halfer
  • 19,824
  • 17
  • 99
  • 186
  • 3
    You should try `print(paste0('var(y4) = ', var(y4[i,])))`. – Martin Gal Aug 31 '21 at 19:40
  • @MartinGal . Thanks for your quick answer , it works ! but a question : why a simple `print('var(y4) = ', var(y4[i,]))` in a loop on index `i` doesn't work ? –  Aug 31 '21 at 19:56
  • 1
    `print` prints the (first) argument, other arguments (the ones after the first `,`) are passed to other methods. So more or less: you have to pack everything you want to be `print`ed in the first argument. – Martin Gal Aug 31 '21 at 20:16
  • Please show an example of `my_data`, best using `dput(head(my_data))`. Edit your question and put the `structure()` output into it. – Martin Gal Aug 31 '21 at 20:46
  • @MartinGal . The input file `Array_total.txt` is available on http://31.207.36.11/Array_total.txt . It is a 36 rows with 6 columns. Thanks for your support –  Aug 31 '21 at 20:55
  • What are you trying to do? `mapply(rcoga, y0[i], y1[i], y2_sp[i]) / mapply(rcoga, y0[i], y1[i], y2_ph[i])` creates a 1000 times 36 matrix (if the indices are corrected). This won't fit into a 5 x 36 array. – Martin Gal Aug 31 '21 at 21:11
  • @MartinGal . I try to pass arguments vectors for the function `rcoga` and not just one value for `y1`, `y2_sp` and `y2_ph` . The reason is that I want to compute the ratio of the sum of Gamma distributions. How could I compute this ratio numerically ? Indeed, I need to compute after this computation the variance of this ratio. –  Aug 31 '21 at 21:19
  • @MartinGal . I want to do this computation of ratio for each couple (column1,column2), (column1, column3), (column1,column4), (column1,column5) and (column1,column6) of Array_total.txt –  Aug 31 '21 at 21:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236625/discussion-between-martin-gal-and-youpilat13). – Martin Gal Aug 31 '21 at 21:25
  • I have rolled this question back to the version that was answered in the accepted/bountied solution below by @MartinGal. The subsequent edit was so major it would have to be a new and separate question. – halfer Sep 23 '21 at 21:19

1 Answers1

1

A first part without the grid and pdf/cdf part

library(coga)
library(dplyr)

# options(max.print = 100000000)

# read data ----
my_data <- read.delim("Array_total.txt", header = FALSE, sep = " ")

data <- my_data %>% 
  transmute(shape = V1,
            across(2:6, ~ .x, .names = "rate_{.col}"))


# set parameters ----
z_ph <- c(0.9595, 1.087, 1.2395, 1.45, 1.688)
b_sp <- c(1.42904922, 1.52601862, 1.63866958, 1.78259615, 1.91956918)
b_ph <- sqrt(1+z_ph)

y2_sp <- array(0, dim=c(5,36))
y2_ph <- array(0, dim=c(5,36))

# calculate l ----
y1 <- (data$shape + 1) / 2

# prepare output array ----
y4 <- matrix(0, nrow = 5, ncol = 1000)


# loop over different rate buckets ----
for (i in 1:5) {
  y2_sp[i,] <- 2 * data[, i + 1] * (b_sp[i] / b_ph[i])^2
  
  y2_ph[i,] <- 2 * data[, i + 1]
  
  # y3 <- rcoga(1000, y1, y2_sp[i,]) / rcoga(1000, y1, y2_ph[i,])
  
  y4[i,] <- rcoga(1000, y1, y2_sp[i,]) / rcoga(1000, y1, y2_ph[i,])
  
  # plot graph ----
  plot(density(y4[i,]), col="blue")
  
  # print variance ----
  print(paste0('var(y4) = ', var(y4[i,])))
}

This returns

[1] "var(y4) = 6.71357918252685e-05"
[1] "var(y4) = 6.64394691819802e-05"
[1] "var(y4) = 5.82709872201527e-05"
[1] "var(y4) = 5.13592254057074e-05"
[1] "var(y4) = 4.43354125364343e-05"

and five plots similar to this one

enter image description here

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • Thanks, could you show me please how to create a legend when we have multiple curves to plot like me ? –  Sep 06 '21 at 08:23
  • If you want to plot multiple curves into one plot, you could use `plot(density(y4[,1]), col="blue", main="z = 0.9595", xlab=paste0('sigma = ',format(round(sd(y4[,1]), 5), nsmall = 5)))` for your first plot and than add multiple other plots using `lines(density(y4[,2]), col="red", main="z = 1.087", xlab=paste0('sigma = ',format(round(sd(y4[,2]), 5), nsmall = 5)))`. To add a legend, you could use for example `legend(-3.5, 0.3, legend=c("Plot 1", "Plot 2"), col=c("blue", "red"), lty=c(1, 4), title="My Different Plots", text.font=4, bg='lightgrey')`. The first two value give the coordinates... – Martin Gal Sep 06 '21 at 08:58
  • ... of the legend, the other arguments are used to create the linetypes, color etc. – Martin Gal Sep 06 '21 at 08:59