I am trying to create a term-document matrix in R using the following dataset:
EmailSubject
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Thank You Chennai
Limited Period offer
Valentines day special
Buy a phone at 10000 and get a new sim free
Buy a phone at 10000 and get a new sim free
Buy the stunning new phone
The game changer is here.
Experience a phone ahead of its time.
Thank You Chennai
Limited Period offer
I used the freq_terms function from the qdap package. The following is the expected output:
freq_terms(DF)
Expected Output Frequency
Buy 4
Get 5
a 7
thank 12
Stunning 6
The 7
New 10
Valentines 4
phone 7
However, special characters keep appearing in the terms and make the output unusable.
For example, I get valentinea€™s and a€™s instead of valentines and as. I get the same result with the tm package.
I tried using gsub to replace these characters, but it is not very effective. Can someone suggest a better way?
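For anyone trying to reproduce this, here is a minimal base R sketch of the kind of cleanup in question. It is not the exact code that was run: the sample vector `subjects` and the helper `clean_subject` are hypothetical names, and it assumes the stray sequence is exactly the one shown above (`a`, then the euro sign, then the trademark sign). It first removes that literal sequence with gsub, then drops any other non-ASCII character with iconv.

```r
# Hypothetical sample data: the stray sequence is written with Unicode
# escapes (\u20ac = euro sign, \u2122 = trademark sign) so that the script
# itself stays plain ASCII.
subjects <- c("valentinea\u20ac\u2122s day special",
              "Buy the stunning new phone")

# Hypothetical helper, not part of qdap or tm.
clean_subject <- function(x) {
  x <- enc2utf8(x)
  # Remove the literal stray sequence (assumed to be exactly these characters).
  x <- gsub("a\u20ac\u2122", "", x, fixed = TRUE)
  # Drop any remaining character that cannot be represented in ASCII.
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

cleaned <- clean_subject(subjects)
print(cleaned)
```

The cleaned vector could then be passed to freq_terms or to tm in place of the raw column. If the characters come from reading a UTF-8 file under a different encoding, which is only a guess here, setting `fileEncoding = "UTF-8"` in `read.csv` may avoid them at the source.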