
I want to compute the mutual information I(x, y) for every pair of features x and y. To do that I need P(x), P(y) and P(x, y) from the data. For example:

    X    Y
    ---  -
    yes  2
    no   2
    yes  2
    no   1
    yes  1

p(yes) = 3/5, p(2) = 3/5, p(yes, 2) = 2/5
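To make the target concrete: these three probabilities are the inputs to the standard definition of mutual information, I(X; Y) = sum over all (x, y) of p(x, y) * log( p(x, y) / (p(x) * p(y)) ). Below is a minimal base-R sketch on the five example rows only, with no Hadoop involved. The variable names (`p_xy`, `p_x`, `p_y`, `mi`) are illustrative, and using the natural logarithm (so the result is in nats) is an assumption.

```r
# The five example rows from the table above
x <- c("yes", "no", "yes", "no", "yes")
y <- c("2", "2", "2", "1", "1")

# Joint distribution from co-occurrence counts, then the marginals
p_xy <- table(x, y) / length(x)
p_x <- rowSums(p_xy)
p_y <- colSums(p_xy)

p_x[["yes"]]      # 0.6
p_y[["2"]]        # 0.6
p_xy["yes", "2"]  # 0.4

# Mutual information in nats; cells with zero probability contribute 0
p <- as.vector(p_xy)              # joint probabilities, column-major
q <- as.vector(outer(p_x, p_y))   # product of marginals, same cell order
nz <- p > 0
mi <- sum(p[nz] * log(p[nz] / q[nz]))
```

On a cluster the same quantities would come from the MapReduce counts instead of `table`, but the final arithmetic is the same.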

Counting in MapReduce is easy, and I have already done it for P(x) and P(y). For P(x, y), however, I need to count how often two feature values co-occur in the same object (row).

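I wrote this map function:

    mapper <- function(key, line) {
      # Split one input line into its feature values
      fvec <- unlist(strsplit(line, split = " "))

      # Append an "a,b" key for every pair of features
      # (the bounds assume each row has 56 features)
      for (i in 1:55) {
        for (j in (i + 1):56) {
          fvec <- c(fvec, paste0(fvec[i], ",", fvec[j]))
        }
      }

      # Emit every single feature and every pair with a count of 1
      keyval(fvec, 1)
    }

`fvec` is the vector of features in one row. For the first row of our example:

    fvec = "yes" "2"

In the for loops I want to append the concatenated pair of features, so that:

    fvec = "yes" "2" "yes,2"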
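One loop-free way to produce the same keys, shown only as a sketch: base R's `combn` can enumerate the pairs, which also avoids hard-coding the number of features. `pair_keys` is a hypothetical helper name, and the function assumes space-separated input lines as in the mapper. Whether this addresses the Hadoop issue is not established here.

```r
# Hypothetical helper: returns the single features followed by all
# "a,b" pair keys for one space-separated input line.
pair_keys <- function(line) {
  fvec <- unlist(strsplit(line, split = " "))

  # combn() needs at least two elements to form a pair
  if (length(fvec) < 2) {
    return(fvec)
  }

  # Paste each 2-element combination into one "a,b" key
  pairs <- combn(fvec, 2, FUN = paste, collapse = ",")

  # c() drops the array dimensions that combn() attaches
  c(fvec, pairs)
}

pair_keys("yes 2")  # "yes" "2" "yes,2"
```

Inside the mapper, the result could be emitted with `keyval(pair_keys(line), 1)` in place of the loop-built `fvec`.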

The reduce function then counts how often "yes" and "2" occur together:

    reduce = function(k, v) {
      # Every emitted value is 1, so the number of values is the count for key k
      return(keyval(k, length(v)))
    }

However, because of the "loop problem" in Hadoop, this does not work properly. Please help me with an R solution to handle it :)
