I want to view term frequencies in a set of documents that contain Persian text. I used R (the tm package) as follows:

keycorpus <- Corpus(DirSource("E:\\Sample\\farsi texts"))
tm.matrix <- TermDocumentMatrix(keycorpus)
View(as.matrix(tm.matrix))

Although this code works for English texts, it does not work for Persian texts. How can I do this?

amonk
M.Rabiei

1 Answer

Suppose you have a text file named 1.txt. Then:

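library(tm)

# "Persian" is a Windows locale name (code page 1256)
Sys.setlocale(locale = "Persian", category = "LC_ALL")
setwd("E:\\Sample\\farsi_texts")
# Each line of the file becomes one element of the character vector
text <- readLines("1.txt", encoding = "windows-1256")
# VectorSource() turns each line into a separate document
keycorpus <- Corpus(VectorSource(text))
tm.matrix <- TermDocumentMatrix(keycorpus)
View(as.matrix(tm.matrix))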

This shows how often each word occurs in each line, because each line of the file becomes one document. To aggregate the counts over the whole file, you can use:

tm.iteration <- as.data.frame(apply(tm.matrix, 1, sum))
View(as.matrix(tm.iteration))
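
For reference, the same aggregation can be sketched end to end on a couple of in-memory lines instead of a file. This is only a sketch under a few assumptions: the tm package is installed, the two sample strings and the names `term_totals` and `freq_table` are made up for illustration, and `wordLengths = c(1, Inf)` is added because tm drops terms shorter than three characters by default, which can hide short Persian words.

```r
library(tm)

# Hypothetical sample words, written as Unicode escapes so the script stays ASCII
ketab <- "\u06a9\u062a\u0627\u0628"   # "ketab" (book)
khub  <- "\u062e\u0648\u0628"         # "khub" (good)

# Two lines standing in for the result of readLines(); ketab is in both lines,
# khub only in the first
lines <- c(paste(ketab, khub), ketab)

# One document per line, as in the answer above
corpus <- VCorpus(VectorSource(lines))

# Keep terms of any length (the default lower bound is 3 characters)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Rows are terms, columns are lines; summing each row gives the total per term
term_totals <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# A plain two-column table that is easy to pass to View() or print()
freq_table <- data.frame(term  = names(term_totals),
                         count = as.integer(term_totals),
                         row.names = NULL)
print(freq_table)
```

Sorting the totals first puts the most frequent terms at the top, which is usually what you want to look at in View().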

saeed_ans