Questions tagged [term-document-matrix]

A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms; a term-document matrix is its transpose. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. These matrices are useful in the field of natural language processing.

When creating a database of terms that appear in a set of documents, the document-term matrix contains rows corresponding to the documents and columns corresponding to the terms. For instance, if one has the following two (short) documents:

D1 = "I like databases"

D2 = "I hate hate databases",

then the document-term matrix would be:

        I      like   hate   databases
D1      1      1      0      1
D2      1      0      2      1

which shows which documents contain which terms and how many times they appear. Note that more sophisticated weights can be used; one typical example, among others, would be tf-idf.
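
The example above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library: the names `build_document_term_matrix` and `tf_idf` are hypothetical, tokenization is a plain whitespace split, and the tf-idf variant shown (raw count times log(N / document frequency)) is one common choice among several, assumed here for illustration.

```python
import math
from collections import Counter

# The two short documents from the example above.
documents = {
    "D1": "I like databases",
    "D2": "I hate hate databases",
}

# Column order is fixed by hand so that it matches the table above.
vocabulary = ["I", "like", "hate", "databases"]


def build_document_term_matrix(docs, terms):
    """Return {doc_id: [count of each term]} using whitespace tokenization."""
    matrix = {}
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        matrix[doc_id] = [counts[term] for term in terms]
    return matrix


def tf_idf(matrix):
    """Weight raw counts by log(N / df); an assumed, simple tf-idf variant."""
    n_docs = len(matrix)
    n_terms = len(next(iter(matrix.values())))
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in matrix.values() if row[j] > 0) for j in range(n_terms)]
    return {
        doc_id: [
            row[j] * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ]
        for doc_id, row in matrix.items()
    }


dtm = build_document_term_matrix(documents, vocabulary)
weighted = tf_idf(dtm)

for doc_id in dtm:
    print(doc_id, dtm[doc_id], weighted[doc_id])
```

With this weighting, terms that occur in every document (here "I" and "databases") get weight zero, while "hate" in D2 keeps a positive weight because it is both frequent in D2 and absent from D1.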

Source: http://en.wikipedia.org/wiki/Document-term_matrix

152 questions
-2
votes
1 answer

How to combine rows into one row in TermDocumentMatrix?

I am trying to combine rows into one row in a TermDocumentMatrix (I know each row represents a word), e.g. cabin, staff -> crew. Because 'cabin', 'staff' and 'crew' mean the same thing, I am trying to combine the rows that represent 'cabin' and 'staff' into one row…
-3
votes
2 answers

Term frequency matrix

I have a string like this: m <- "abcdabcdbcadacbddabcc..." I would like to generate a matrix like this: How can I do that in R?
Kaja