I am trying to create a term-document matrix in R using the following dataset:

  EmailSubject
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer

I have used the qdap package and its freq_terms function. The following is the expected output:

  freq_terms(DF)


     Term          Frequency
     Buy           4
     Get           5
     a             7
     thank         12
     Stunning      6
     The           7
     New           10
     Valentines    4
     phone         7
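
For reference, a table of this shape can also be produced in base R without qdap. This is only a minimal sketch and not how freq_terms works internally; the subjects vector is a small sample of the subjects listed above, chosen for illustration, and the names clean, words and freq are my own.

```r
# Small sample of the email subjects listed above (chosen for illustration)
subjects <- c("Buy the stunning new phone",
              "The game changer is here.",
              "Thank You Chennai")

# Lower-case the text, drop punctuation, then split on whitespace
clean <- gsub("[[:punct:]]", "", tolower(subjects))
words <- unlist(strsplit(clean, "\\s+"))

# Count each term and sort by descending frequency
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

Here "the" is counted twice because the text is lower-cased before counting; every other term in this sample occurs once.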

However, the following garbled characters keep appearing in the output and make the data unusable:

           valentinea€™s, a€™s instead of valentines, as

I get the same result with the tm package.

I have tried using gsub to replace these characters, but it has not been very effective. Can someone suggest a better way?
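
A minimal sketch of the kind of gsub cleanup described above, assuming the garbled sequence is exactly the three characters shown in the question: the letter a, the euro sign and the trademark sign. The sequence is written with \u escapes so the example does not depend on the encoding of the script file. The sample strings in x are hypothetical.

```r
# Hypothetical sample strings; the first contains the garbled sequence
x <- c("valentinea\u20ac\u2122s day special", "Thank You Chennai")

# The garbled sequence: "a", euro sign (U+20AC), trademark sign (U+2122)
garbled <- "a\u20ac\u2122"

# fixed = TRUE matches the sequence literally instead of as a regular expression
cleaned <- gsub(garbled, "", x, fixed = TRUE)
print(cleaned)
```

This only removes the one sequence it is told about, so any other garbled sequence in the data would need its own pattern.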

ANJYR
Vishnu Raghavan