
I have two CSV files. The first column of each file contains character values, up to 50,000 entries. I need to calculate the cosine similarity between these two columns. I tried the lsa package in R, but there is a problem with my result. Can anyone help me? My code is below.

library(lsa)
Gyan=tempfile() 
dir.create(Gyan) 
single_tags=read.csv(file.choose(), sep = ',')
as.character(single_tags$CULTAGS) 
options(max.print = 1000000) 
write(as.character(single_tags$CULTAGS),file = paste(Gyan, 'D1',sep = '1')) 
Single_ASFA=read.csv(file.choose(),sep = ',')
options(max.print = 1000000) 
as.character(Single_ASFA$ASFACV)
write(as.character(Single_ASFA$ASFCV),file = paste(Gyan, '/')) 
Mycomparison = textmatrix(Gyan, minWordLength = 1)
Mycomparison
res = lsa::cosine(myMatrix[,1],myMatrix[,2]) 
res 
Artem
  • Please, don’t post images of code. Copy and paste it in your answer with the “code” option. – Norhther Aug 05 '18 at 10:03
  • Please read [Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) – Rabbid76 Aug 05 '18 at 10:05
  • It would be nice if you added a sample of the data. Without it, your example is rather difficult to reproduce. – Artem Sep 01 '18 at 14:31

1 Answer


It seems that myMatrix is never defined in your code: the text matrix is stored in Mycomparison, so the two names are unrelated. If you replace myMatrix with Mycomparison in the call to cosine(), everything works. See below:

# Data Simulation
single_tags_df <- data.frame( CULTAGS =  c("dog", "cat", "sushi", "mouse", "leech"))
Single_ASFA_df <- data.frame(ASFCV =  c("hamster", "mouse", "sushi", "man"))
write.csv(single_tags_df, file = "single.csv")
write.csv(Single_ASFA_df, file = "ASFA.csv")

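library(lsa)

# Temporary directory that will hold one text file per document
Gyan <- tempfile()
dir.create(Gyan)

# Write each column to its own file inside Gyan (sep = "/" builds the file path)
single_tags <- read.csv("single.csv", sep = ",")
as.character(single_tags$CULTAGS)
options(max.print = 1000000)
write(as.character(single_tags$CULTAGS), file = paste(Gyan, "D1", sep = "/"))
Single_ASFA <- read.csv("ASFA.csv", sep = ",")
options(max.print = 1000000)
as.character(Single_ASFA$ASFCV)
write(as.character(Single_ASFA$ASFCV), file = paste(Gyan, "D2", sep = "/"))

# Build the term-document matrix: one column per file (D1, D2)
Mycomparison <- textmatrix(Gyan)
Mycomparison
# recursive = TRUE is required for unlink() to delete a directory
unlink(Gyan, recursive = TRUE)

# Compare the two document columns of Mycomparison (not the undefined myMatrix)
res <- lsa::cosine(Mycomparison[, 1], Mycomparison[, 2])
res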
#           [,1]
# [1,] 0.4472136
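
As a sanity check on that number, here is a minimal base-R sketch that recomputes the cosine by hand for the simulated data above, without the lsa package. The names d1, d2, vocab, v1, v2 and cosine_manual are only illustrative and are not part of lsa. It assumes each term counts once per occurrence, which is what the term-document matrix holds for this toy data.

```r
# Simulated documents from the answer above
d1 <- c("dog", "cat", "sushi", "mouse", "leech")
d2 <- c("hamster", "mouse", "sushi", "man")

# Shared vocabulary, then one term-count vector per document
vocab <- sort(unique(c(d1, d2)))
v1 <- as.numeric(table(factor(d1, levels = vocab)))
v2 <- as.numeric(table(factor(d2, levels = vocab)))

# Cosine similarity: dot product divided by the product of the Euclidean norms
cosine_manual <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

res_manual <- cosine_manual(v1, v2)
res_manual
# [1] 0.4472136
```

The two documents share two terms (mouse and sushi) and contain five and four terms respectively, so the value is 2 / (sqrt(5) * sqrt(4)), which matches the result from lsa::cosine().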
Artem