I want to calculate the Levenshtein distance between the row names and column names of a matrix using the mapply function, because my matrix is very large and a nested for loop takes a very long time to produce the result.

Here's the old code with nested loop:

# levenshteinDist() is assumed to come from the RecordLinkage package
mymatrix <- matrix(NA, nrow = ncol(dataframe), ncol = ncol(dataframe),
                   dimnames = list(colnames(dataframe), colnames(dataframe)))
distfunction <- function(text1, text2) {
  # similarity in [0, 1]: 1 minus the distance divided by the longer string's length
  1 - levenshteinDist(text1, text2) / max(nchar(text1), nchar(text2))
}
for (i in seq_len(nrow(mymatrix))) {
  for (j in seq_len(ncol(mymatrix))) {
    mymatrix[i, j] <- distfunction(rownames(mymatrix)[i], colnames(mymatrix)[j]) * 100
  }
}

I tried to replace the nested loop with mapply:

   mapply(distfunction,mymatrix)

It gave me this error:

   Error in typeof(str2) : argument "text2" is missing, with no default

My plan was to first apply the Levenshtein distance to the matrix and then work out how to apply my own function.

Is it possible?

Thank you.

Sarah

1 Answer

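The function mapply does not fit here the way you called it. It walks over its vector arguments in parallel: the function is applied to the first elements of each, then to the second elements, and so on. You passed only one argument (the matrix), which is why text2 is reported as missing. And in any case you want the function applied to all combinations of row and column names, not only to matching pairs.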
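
To make the difference concrete, here is a small sketch. The vectors x and y and the use of paste0 are just stand-ins for illustration, not part of your data:

```r
# Hypothetical inputs, used only to illustrate the two behaviours
x <- c("a", "b", "c")
y <- c("A", "B", "C")

# mapply pairs elements up by position: (a, A), (b, B), (c, C)
pairwise <- unname(mapply(function(s, t) paste0(s, t), x, y))

# outer evaluates every combination and returns a 3 x 3 matrix
all_pairs <- outer(x, y, paste0)
```

Here pairwise has three elements, while all_pairs has nine, one per (row, column) combination, which is the shape you are after.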

You could try a nested sapply:

sapply(colnames(mymatrix), function(col) 
  sapply(rownames(mymatrix), function(row) 
    distfunction(row, col)))*100

Simple usage example

sapply(1:3, function(x) sapply(1:4, function(y) x*y))

Output:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    4    6
[3,]    3    6    9
[4,]    4    8   12

Update

Even better is to use outer, but I think your distfunction is not vectorized (because of the max). So wrap it with Vectorize:

distfunction_vec <- Vectorize(distfunction)
outer(rownames(mymatrix), colnames(mymatrix), distfunction_vec) * 100

I'm not sure how much overhead Vectorize adds, though. It would be better to vectorize the function directly (probably by replacing max with pmax).
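
A minimal sketch of what a directly vectorized version might look like. Note that I am swapping in adist from base R's utils package instead of levenshteinDist, on the assumption that plain Levenshtein distance with default costs is what you need. The names in nms are made up and stand in for colnames(dataframe):

```r
# Hypothetical names standing in for colnames(dataframe)
nms <- c("kitten", "sitting", "mitten")

# adist() (utils, base R) returns the full matrix of pairwise
# Levenshtein distances in a single call
d <- adist(nms, nms)

# pmax is vectorized, so outer can build the matrix of the
# longer string length for each pair without Vectorize
len <- outer(nchar(nms), nchar(nms), pmax)

# same formula as distfunction, applied to whole matrices at once
sim <- (1 - d / len) * 100
dimnames(sim) <- list(nms, nms)
```

No explicit loop or Vectorize wrapper is involved here, so this is the variant where I would actually expect a speed-up on a large matrix, though I have not measured it on your data.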

DeltaKappa
  • Thank you @DeltaKappa. I'll try this solution. I hope it reduces the execution time. – Sarah Jan 06 '16 at 15:35
  • Your solution worked like a charm in my script! Thank you so much @DeltaKappa. You're a genius. – Sarah Jan 07 '16 at 09:00
  • I tried the solution from your update, hoping for a shorter execution time. But it seems that my function can't be vectorized: I replaced `max` with `pmax` and ran `distfunction_vec<-Vectorize(distfunction)`, but `is.vector(distfunction_vec)` gave me `FALSE`. Any idea? – Sarah Jan 08 '16 at 11:30
  • `Vectorize` just creates a thin wrapper around the function, so `is.vector()` will not return `TRUE` for a vectorized function. It's just nicer to read with outer. I didn't expect a performance gain. – DeltaKappa Jan 18 '16 at 14:01
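
A short sketch of the point made in the last comment, using a made-up two-argument function f rather than distfunction. Vectorize returns a function (internally it calls mapply), so is.vector() is the wrong check; comparing the outputs shows what changed:

```r
# Hypothetical function that collapses its inputs to a single value
f <- function(a, b) max(a, b)
f_vec <- Vectorize(f)

is_fun <- is.function(f_vec)   # the wrapper is still a function
is_vec <- is.vector(f_vec)     # functions are never vectors

single  <- f(1:3, c(3, 1, 2))      # one overall maximum
by_pair <- f_vec(1:3, c(3, 1, 2))  # one maximum per pair of elements
```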