I have two datasets stored in tables: one is a set of [a, b] and the other is [x, Sx, y, Sy, rho]. I have a probability function f that requires (a, b, x, Sx, y, Sy, rho). In the end I want to find the sum of the probability results over all [x, Sx, y, Sy, rho] for the first [a, b], then the sum over all [x, Sx, y, Sy, rho] for the second [a, b], and so on.
I would like to have a few hundred rows in the [x, Sx, y, Sy, rho] file and a few hundred thousand rows in the [a, b] file.
I'm wondering if there is a way to do this without using two loops. I've tried the following; it doesn't quite work the way I want it to, and I know it will be far too slow.
I don't know if it will help but I've added the function in the code. Sorry that the function itself is a mess and not formatted properly.
# data file with (a, b)
data <- matrix( c(1, 0, 1, 1, 0.5, 0), nrow=3, ncol=2)
colnames(data) <- c("a", "b")
Ndat <- dim(data)
Ndata <- Ndat[1]
# data2 file with (x, Sx, y, Sy, rho), one row per observation
data2 <- matrix(c(1, 0.1, 1, 0.1, 0.002,
                  2, 0.1, 2, 0.1, 0.000001,
                  2, 0.1, 1, 0.1, 0.002),
                nrow = 3, ncol = 5, byrow = TRUE)  # fill row by row
colnames(data2) <- c("x", "Sx", "y", "Sy", "rho")
Ndat2 <- dim(data2)
Ndata2 <- Ndat2[1]
# function requires variables (a, b, x, Sx, y, Sy, rho)
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  # term shared by the exponent and the normalisation
  v <- a ^ 2 * sX ^ 2 - 2 * a * rho * sX * sY + sY ^ 2
  sqrt(1 + a ^ 2) * (
    exp(-((b + a * Xi - Yi) ^ 2 / (2 * v))) *
      sqrt((1 - rho ^ 2) / v) /
      (sqrt(2 * pi) * sqrt(1 - rho ^ 2)))
}
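# Here is my weak attempt: one sum per row of data, using two loops
Table <- numeric(Ndata)
for (j in 1:Ndata) {
  total <- 0
  for (i in 1:Ndata2) {
    # column names must be quoted when indexing a matrix
    total <- total + Prob(data[j, "a"], data[j, "b"], data2[i, "x"],
                          data2[i, "Sx"], data2[i, "y"], data2[i, "Sy"],
                          data2[i, "rho"])
  }
  Table[j] <- total
}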
I am having a very hard time wrapping my head around the apply functions and when they can/should be used. I know that I've probably not added enough information, so any suggestions that can help me out would be great. I'm pretty new to programming as well as R, so please forgive any inappropriate vocabulary or formatting.
There is probably a better way to get the number of rows in data (stored in Ndata as a global), but this is the first one I stumbled across.
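For instance, I suspect base R's nrow() and ncol() would do the same job more directly. A minimal sketch, assuming the same toy matrix as in my code above (the name Nvars is just for illustration):

```r
# Same toy (a, b) matrix as in the question
data <- matrix(c(1, 0, 1, 1, 0.5, 0), nrow = 3, ncol = 2)
colnames(data) <- c("a", "b")

# nrow()/ncol() return the first/second element of dim()
Ndata <- nrow(data)
Nvars <- ncol(data)
```

This gives the same value as dim(data)[1] without the intermediate Ndat variable.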
The function should not be recursive, but I see now that it is as I've written it. I have spent many hours on intro tutorials to R and am still having a very hard time understanding how the apply suite of functions is best used.
I would like one iteration to apply this function to each row in data2 using a, b from the first row of data, then sum the probabilities for all of those. The next iteration should then sum all of the probabilities for row 2 of data, using its a, b applied to every row of data2, and so on.
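To make the intended result concrete, here is a rough sketch of the kind of thing I'm after. It is only a guess at an approach: the outer loop is hidden in sapply(), and the inner sum relies on Prob working element-wise on whole columns of data2, which I am assuming holds because it only uses arithmetic, exp() and sqrt(). I am also assuming data2 is meant to be filled row by row; the names v and row_sums are just for illustration.

```r
# Toy inputs; byrow = TRUE is an assumption about the intended layout
data <- matrix(c(1, 0, 1, 1, 0.5, 0), nrow = 3, ncol = 2)
colnames(data) <- c("a", "b")
data2 <- matrix(c(1, 0.1, 1, 0.1, 0.002,
                  2, 0.1, 2, 0.1, 0.000001,
                  2, 0.1, 1, 0.1, 0.002),
                nrow = 3, ncol = 5, byrow = TRUE)
colnames(data2) <- c("x", "Sx", "y", "Sy", "rho")

# Same formula as in the question, with the shared term pulled out as v
Prob <- function(a, b, Xi, sX, Yi, sY, rho) {
  v <- a^2 * sX^2 - 2 * a * rho * sX * sY + sY^2
  sqrt(1 + a^2) * exp(-(b + a * Xi - Yi)^2 / (2 * v)) *
    sqrt((1 - rho^2) / v) / (sqrt(2 * pi) * sqrt(1 - rho^2))
}

# One value per row of data: the sum of Prob over every row of data2.
# The scalar a, b are recycled against the full columns of data2.
row_sums <- sapply(seq_len(nrow(data)), function(j) {
  sum(Prob(data[j, "a"], data[j, "b"],
           data2[, "x"], data2[, "Sx"], data2[, "y"],
           data2[, "Sy"], data2[, "rho"]))
})
```

Here row_sums[1] would be the sum for the first [a, b], row_sums[2] for the second, and so on. I don't know whether this is fast enough for a few hundred thousand rows in data, or whether something like outer() would be a better fit.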