I'm in the process of converting my existing R code to Python as a way to teach myself, but I've run into something that I can't seem to crack.
Here's an example of the R code, which works as expected:
library(data.table)  # needed for data.table() and := below
var <- 0.08
a <- data.frame(a = runif(10, 0, 1),
                b = runif(10, 0, 1),
                c = runif(10, 0, 1),
                d = runif(10, 0, 1))
b <- data.frame(a = c(0,4,6,8,10,12,12,14,16,18),
                b = c(2,6,8,10,12,14,14,16,18,20),
                c = c(4,8,10,12,14,16,16,18,20,22),
                d = c(6,10,12,14,16,18,18,20,22,24))
output <- data.table(total = seq(0, 10))
output[total%%2==0, prob:= apply(output[total%%2==0], 1, function(x) { sum(a[, 1:4] * (b[, 1:4]==x[1]))})]
output[total%%2==1, prob:= apply(output[total%%2==1], 1, function(x) { sum(a[, 1:4] * (b[, 1:4]==(x[1]-1))) * var/(1-var)})]
And here's what I tried in Python, which returns NaN values in the 'prob' column:
import numpy as np
import pandas as pd
var = 0.08
a = pd.DataFrame(np.random.uniform(0, 1, size=(10, 4)), columns=['a', 'b', 'c', 'd'])
b = pd.DataFrame({'a': [0, 4, 6, 8, 10, 12, 12, 14, 16, 18],
                  'b': [2, 6, 8, 10, 12, 14, 14, 16, 18, 20],
                  'c': [4, 8, 10, 12, 14, 16, 16, 18, 20, 22],
                  'd': [6, 10, 12, 14, 16, 18, 18, 20, 22, 24]})
output = pd.DataFrame({'total': range(0, 11)})
output.loc[output['total'] % 2 == 0, 'prob'] = output[output['total'] % 2 == 0].apply(lambda x: np.sum(a.iloc[:, 0:4] * (b.iloc[:, 0:4] == x[0])), axis=1)
output.loc[output['total'] % 2 == 1, 'prob'] = output[output['total'] % 2 == 1].apply(lambda x: np.sum(a.iloc[:, 0:4] * (b.iloc[:, 0:4] == (x[0] - 1))) * var / (1 - var), axis=1)
Any help would be appreciated!
Thanks
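
For reference, here is one possible explanation and a minimal sketch of a fix, offered as a guess rather than a definitive diagnosis. In the pandas versions I know of, calling `np.sum` on a DataFrame returns per-column sums (a Series with labels 'a' to 'd') rather than a single number, whereas R's `sum` collapses everything to one scalar. `apply(..., axis=1)` then produces a four-column DataFrame, and assigning that to the 'prob' column aligns on column labels, finds no 'prob', and fills in NaN. The sketch below sidesteps this by summing plain NumPy arrays. The helper name `weighted_match`, the seeded generator, and the use of `Series.map` are my own choices for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

var = 0.08
rng = np.random.default_rng(0)  # fixed seed is an assumption, only for reproducibility
a = pd.DataFrame(rng.uniform(0, 1, size=(10, 4)), columns=['a', 'b', 'c', 'd'])
b = pd.DataFrame({'a': [0, 4, 6, 8, 10, 12, 12, 14, 16, 18],
                  'b': [2, 6, 8, 10, 12, 14, 14, 16, 18, 20],
                  'c': [4, 8, 10, 12, 14, 16, 16, 18, 20, 22],
                  'd': [6, 10, 12, 14, 16, 18, 18, 20, 22, 24]})
output = pd.DataFrame({'total': range(0, 11)})

# Work on raw arrays so that .sum() reduces over every cell, like R's sum().
a_vals = a.to_numpy()
b_vals = b.to_numpy()

def weighted_match(target):
    """Hypothetical helper: sum the entries of `a` where `b` equals `target`."""
    return float((a_vals * (b_vals == target)).sum())

even = output['total'] % 2 == 0

# Even totals: match on the total itself.
output.loc[even, 'prob'] = output.loc[even, 'total'].map(weighted_match)

# Odd totals: match on total - 1, then scale by var / (1 - var).
output.loc[~even, 'prob'] = output.loc[~even, 'total'].map(
    lambda t: weighted_match(t - 1) * var / (1 - var)
)

print(output)
```

Mapping over the 'total' Series returns one scalar per row, so the result aligns with 'prob' by row index and no column-label alignment is involved. If the `apply` form is preferred, replacing `np.sum(...)` with `(...).to_numpy().sum()` inside the lambda should have the same effect.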