I want to compute the mutual information I(x, y) for every pair of features x, y. To do that I need P(x), P(y) and P(x, y) from the data. For example:
| X   | Y |
|-----|---|
| yes | 2 |
| no  | 2 |
| yes | 2 |
| no  | 1 |
| yes | 1 |
P(X = yes) = 3/5, P(Y = 2) = 3/5, P(X = yes, Y = 2) = 2/5
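As a sanity check for these numbers, here is a minimal base-R sketch (no Hadoop involved) that computes the marginals, the joint distribution and the mutual information I(X; Y) = Σ p(x, y) log(p(x, y) / (p(x) p(y))) for the example table, using the natural log (nats). The names `df` and `mutual_information` are only illustrative.

```r
# Toy data from the example table: two features X and Y, five rows.
df <- data.frame(
  X = c("yes", "no", "yes", "no", "yes"),
  Y = c("2", "2", "2", "1", "1"),
  stringsAsFactors = FALSE
)

# Empirical marginals and joint distribution (relative frequencies).
p_x  <- table(df$X) / nrow(df)
p_y  <- table(df$Y) / nrow(df)
p_xy <- table(df$X, df$Y) / nrow(df)

# Mutual information in nats: sum over cells with p(x, y) > 0 of
# p(x, y) * log(p(x, y) / (p(x) * p(y))).
mutual_information <- function(p_xy, p_x, p_y) {
  mi <- 0
  for (x in rownames(p_xy)) {
    for (y in colnames(p_xy)) {
      pxy <- p_xy[x, y]
      if (pxy > 0) {
        mi <- mi + pxy * log(pxy / (p_x[[x]] * p_y[[y]]))
      }
    }
  }
  mi
}

p_x[["yes"]]      # 0.6
p_y[["2"]]        # 0.6
p_xy["yes", "2"]  # 0.4
mutual_information(p_xy, p_x, p_y)
```

On a real data set the same three distributions would come from the MapReduce counts divided by the number of rows.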
Counting in MapReduce is easy, and I have already done it for P(x) and P(y). For P(x, y), however, I need to count how often two feature values co-occur within the same object (row).
I wrote this map function:
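```r
library(rmr2)

mapper <- function(key, line) {
  # Split the input line into its 56 feature values
  fvec <- unlist(strsplit(line, split = " "))
  # Append every pair "value_i,value_j" with i < j
  for (i in 1:55) {
    for (j in (i + 1):56) {
      fvec <- c(fvec, paste0(fvec[i], ",", fvec[j]))
    }
  }
  # Emit each single value and each pair with a count of 1
  keyval(fvec, 1)
}
```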
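One possible cause of the loop problem, stated as an assumption: as far as I understand, rmr2 map functions are vectorized and can receive a whole chunk of lines at once instead of a single line. In that case `unlist(strsplit(line, " "))` merges all lines of the chunk into one long vector, and the nested loops only pair `fvec[1]` to `fvec[56]`, i.e. the first row. The base-R sketch below (hypothetical helper `map_keys`, runnable without Hadoop) builds the keys line by line and uses `combn()` instead of the hard-coded 55/56 bounds.

```r
# Build the map-side keys for a chunk of input lines: every single feature
# value plus every unordered pair of values that co-occur in the same line.
# Pairs are built per line, so values from different rows are never mixed.
map_keys <- function(lines) {
  keys_per_line <- lapply(lines, function(line) {
    fvec <- strsplit(line, split = " ", fixed = TRUE)[[1]]
    if (length(fvec) < 2) {
      return(fvec)  # no pairs possible
    }
    # combn() enumerates all i < j pairs, replacing the nested loops
    pairs <- combn(fvec, 2, FUN = paste, collapse = ",")
    c(fvec, pairs)
  })
  unlist(keys_per_line)
}

# Inside rmr2 the mapper could then look like this (untested sketch):
# mapper <- function(key, lines) {
#   keys <- map_keys(lines)
#   keyval(keys, rep(1L, length(keys)))
# }

map_keys("yes 2")             # "yes" "2" "yes,2"
map_keys(c("yes 2", "no 1"))  # "yes" "2" "yes,2" "no" "1" "no,1"
```

With 56 features each line yields 56 single keys plus choose(56, 2) = 1540 pair keys.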
`fvec` holds the feature values of one row; for the first row of the example it is `c("yes", "2")`. The loops append every pairwise concatenation, so that `fvec` becomes `c("yes", "2", "yes,2")`. The reduce function can then count how often `yes` and `2` occur together:
```r
reduce <- function(k, v) {
  # v holds one 1 per occurrence of key k, so its length is the count
  keyval(k, length(v))
}
```
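To check the whole pipeline independently of Hadoop, here is a small base-R simulation of map, shuffle and reduce on the five example rows. `split()` stands in for the shuffle and `length()` plays the role of the reduce function; the probabilities should match the ones computed by hand above.

```r
# Simulate map -> shuffle -> reduce locally on the five example rows.
lines <- c("yes 2", "no 2", "yes 2", "no 1", "yes 1")

# Map: one key per value and per co-occurring pair of values in a row.
map_one <- function(line) {
  fvec <- strsplit(line, split = " ", fixed = TRUE)[[1]]
  c(fvec, combn(fvec, 2, FUN = paste, collapse = ","))
}
keys <- unlist(lapply(lines, map_one))
values <- rep(1L, length(keys))

# Shuffle: group the values by key; reduce: count them, like length(v).
grouped <- split(values, keys)
counts <- vapply(grouped, length, integer(1))

# Divide by the number of rows to turn counts into probabilities.
probs <- counts / length(lines)
probs[c("yes", "2", "yes,2")]  # yes = 0.6, 2 = 0.6, yes,2 = 0.4
```

One design caveat: these keys do not record which column a value came from, so the same value appearing in two different features would be counted together. Prefixing each value with its column index (for example `"1:yes"`) would keep the features apart.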
However, because of the "LOOP PROBLEM" it does not work properly on Hadoop. Could someone help me with an R solution for this? :)