Questions tagged [term-document-matrix]

A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. Such matrices are useful in the field of natural language processing.

When creating a database of terms that appear in a set of documents, the document-term matrix contains rows corresponding to the documents and columns corresponding to the terms. For instance, given the following two (short) documents:

D1 = "I like databases"

D2 = "I hate hate databases",

then the document-term matrix would be:

     I   like   hate   databases
D1   1   1      0      1
D2   1   0      2      1

which shows which documents contain which terms and how many times they appear. Note that more sophisticated weights can be used; one typical example, among others, would be tf-idf.

Source: http://en.wikipedia.org/wiki/Document-term_matrix

152 questions

votes

0 answers

Non-English Term-Document Matrix

I want to construct a term-document matrix in Python for Arabic text. I used CountVectorizer(), but it gives the documents in the columns and the terms in the rows. I want the terms in columns and the documents in rows. I tried to transpose the…

python term-document-matrix

asked Feb 26 '19 at 16:46

Mohammed ALDossari

votes

1 answer

Normalizing bag of words data in Gensim

I am using gensim to create a bag of words model and I want to perform normalization. I found the documentation (https://radimrehurek.com/gensim/models/normmodel.html), but I am confused as to how to implement that given the code I have.…

python normalization gensim corpus term-document-matrix

asked Jun 25 '18 at 19:33

Jane Sully

votes

3 answers

Create Frequency table using R and Term document Matrix

I have created the following dataframe consisting of a few e-mail subject lines. df <- data.frame(subject=c('Free ! Free! Free ! Clear Cover with New Phone', 'Offer ! Buy New phone and get earphone at 1000. Limited…

r frequency text-mining grepl term-document-matrix

asked Feb 16 '18 at 06:42

Raghavan vmvs

votes

1 answer

TermDocumentMatrix in R - only 1-grams created

I just started with tm package in R and cannot seem to overcome an issue. Even though my tokenizer functions seem to work right: uniTokenizer <- function(x) NGramTokenizer(x, Weka_control(min=1, max=1)) biTokenizer <- function(x) NGramTokenizer(x,…

r tm n-gram term-document-matrix

asked Oct 19 '17 at 00:13

Serg Sergeev

votes

1 answer

Text mining Error- Getting this error while creating the DocumentTermMatrix & Word Cloud

I am getting the error message 'Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), : 'i, j' invalid' while creating the DocumentTermMatrix or a word cloud. This happens with all data sets. Here is the…

word-cloud term-document-matrix

asked Jul 08 '17 at 12:13

Nidhin Chandrasekhar

votes

1 answer

Converting a Spark dataframe to Term document matrix in R using sparklyr

I have code in R which needs to be scaled to handle big data. I am using Spark for this, and the package that seemed most convenient was sparklyr. However, I am unable to create a TermDocument matrix from a Spark dataframe. Any help would be…

r apache-spark tm sparklyr term-document-matrix

asked Feb 17 '17 at 14:09

NinjaR

votes

1 answer

tm_map(gsub...) fails to replace words

# Loading required libraries # Set up logistics such as reading in data and setting up corpus ```{r} # Relative path points to the local folder folder.path="../data/InauguralSpeeches/" # get the list of file names speeches=list.files(path =…

r text-mining term-document-matrix

asked Jan 29 '17 at 23:55

user101998

votes

1 answer

R: DocumentTermMatrix Wrong Frequencies after mgsub

I have a DocumentTermMatrix and I'd like to replace specific terms in this document and to create a frequency table. The starting point is the original document as follows: library(tm) library(qdap) df1 <- data.frame(word =c("test", "test",…

r tm term-document-matrix

asked Jun 24 '16 at 11:38

OAM

votes

2 answers

R: Natural Language Processing on Support Vector Machine-TermDocumentMatrix

I have started working on a project which requires Natural Language Processing and building a model on Support Vector Machine (SVM) in R. I’d like to generate a Term Document Matrix with all the tokens. Example: testset <- c("From month 2 the AST…

r nlp svm tm term-document-matrix

asked Jun 15 '16 at 14:51

Chih-Ching Yeh

votes

2 answers

Importing a TermDocumentMatrix into R

I am working on a qualitative analysis project in the tm package of R. I have built a corpus and created a term-document matrix and, long story short, I need to edit my term-document matrix and conflate some of its rows. To do this I have exported it…

text-mining tm term-document-matrix

asked May 18 '16 at 18:47

lrampe

votes

1 answer

R construct document term matrix how to match dictionaries whose values consist of white-space separated phrases

When doing text mining in R, after preprocessing the text data, we need to create a document-term matrix for further exploration. But, similar to Chinese, English also has certain phrases, such as "semantic distance", "machine learning", if you…

r dictionary text-mining term-document-matrix quanteda

asked Apr 20 '16 at 02:18

Fiona_Wang

votes

1 answer

"Difference" among Document Term Matrices

Suppose I have a set of 100 documents, 70 about politics and 30 about math (a weird combination, I know). My goal is to represent them on an xy plane through methods like multidimensional scaling analysis, network analysis, SOM, etc.…

r tm corpus term-document-matrix

asked Apr 01 '16 at 06:52

Andrea Ianni

votes

1 answer

How to get the Term-Doc frequency from many fields combined?

I have written an index with lucene, from a collection of documents. My documents have 2 fields and were added to the index like so: Document doc = new Document(); doc.add(new TextField("Title", "I am a title", Field.Store.NO)); doc.add(new…

java lucene term-document-matrix

asked Dec 23 '15 at 19:10

dimitris93

votes

2 answers

R How do i keep punctuation with TermDocumentMatrix()

I have a large dataframe where I am identifying patterns in strings and then extracting them. I have provided a small subset to illustrate my task. I am generating my patterns by creating a TermDocumentMatrix with multiple words. I use these…

r tm punctuation term-document-matrix

asked Nov 27 '15 at 10:01

CallumH

votes

1 answer

In R plotting keyword / word associations (findAssocs) with igraph on tdm or dtm in R?

I'd like to create a term network analysis plot based on certain word associations in R, but I don't know how to go beyond plotting an entire Term Document Matrix: # Network analysis library(igraph) # load tdm data # create matrix Neg.m <-…

r plot igraph term-document-matrix

asked Nov 01 '15 at 23:11

Robert
