How to get count of words in a document which is already present in another file?

Question

I am working in the text mining area and am new to it. I have a file containing a list of words with their corresponding weights, as given below:

                  Malfunction         Weight
                  malformed             1
                  unformed             0.9
                  intensive            0.8
                  malfunctioned        0.7
                  front                0.6
                  icu                  0.5
                  injury               0.4
                  care                 0.3
                  disease              0.2
                  diagnosis            0.1

Now I want to check each of the words in this list against a document and retrieve the number of occurrences of each term in the document. Can anyone tell me how to do this in R?

I have used the tm package, but I don't want to build a term-document matrix. First I need to find the words that match the list above, and then I need to find the number of occurrences of these words in each document.

See [tm](https://cran.r-project.org/web/packages/tm/) package, show some effort, then let us know when/if you get stuck. — zx8754, Sep 22 '15 at 08:32
I have used the tm package. That is not what I wanted. If I had got the answer from the tm package, I would not have posted a question here. — Athira, Sep 22 '15 at 08:43
If you are thinking of using tdm or dtm, I don't need to find tdm. If you have any other answer you can post it. — Athira, Sep 22 '15 at 08:46
Hint to the OP: You should have included in your question that you have used the tm package and what else you have tried. Especially state why this package doesn't work for you, otherwise no one can help you. — Verena Haunschmid, Sep 22 '15 at 09:17
We are more than happy to help, as long as we see you have tried and got stuck. Show some code, and narrow down your problem. Quick search gives me [this webpage](https://deltadna.com/blog/text-mining-in-r-for-term-frequency/) which illustrates how to count word frequency using `tm` package. — zx8754, Sep 22 '15 at 09:17

score 1 · Answer 1 · answered Sep 22 '15 at 09:04

If you need a more basic introduction, I recommend this book.

If you only want to count these ten words, you could use the following for each word:

length(document.words.v[which(document.words.v == "malformed")])
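Rather than repeating that line once per word, the same comparison can be applied to the whole list in one pass. This is only a minimal sketch in base R: the `terms` vector is copied from the question, while the `doc` string and the simple letters-only tokenizer are assumptions made for illustration, not something the original post specifies.

```r
# Word list from the question (weights omitted; only the terms are needed here).
terms <- c("malformed", "unformed", "intensive", "malfunctioned", "front",
           "icu", "injury", "care", "disease", "diagnosis")

# Hypothetical document text, made up for this example.
doc <- "Intensive care in the ICU: the injury needed intensive care."

# Lower-case the text and split on runs of non-letters.
# This is a deliberately simple tokenizer (an assumption, not tm's behaviour).
document.words.v <- strsplit(tolower(doc), "[^a-z]+")[[1]]
document.words.v <- document.words.v[document.words.v != ""]

# Count exact matches for every term; terms that never occur get 0.
counts <- sapply(terms, function(w) sum(document.words.v == w))
print(counts)
```

For several documents, the same `sapply` call could be wrapped in a function and applied to each document's word vector in turn.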

answered Sep 22 '15 at 09:04

JBGruber

A lot shorter: `sum(document.words.v == "malformed")` – Verena Haunschmid Sep 22 '15 at 09:17
Thank you, I got an idea of how to proceed. I have also looked at the stringr package. – Athira Sep 22 '15 at 12:48
