Questions tagged [tm]

The `tm` package (shorthand for Text Mining Infrastructure in R) provides a framework for text mining applications within R.

source: http://tm.r-forge.r-project.org/

tm - Text Mining Package

The tm package offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. The package has integrated database back-end support to minimize memory demands. Advanced metadata management is implemented for collections of text documents, which makes it easier to work with large document sets that are enriched with metadata.
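
As a rough illustration of the metadata handling described above, the following sketch attaches metadata to a single document and to a whole corpus. The documents, the "author" tag and the "label" tag are made-up examples, not part of tm itself.

```r
library(tm)

# Two toy documents (illustrative data only)
corpus <- VCorpus(VectorSource(c("First document.", "Second document.")))

# Metadata stored with an individual document
meta(corpus[[1]], "author") <- "Alice"

# "Indexed" metadata kept at the corpus level, one value per document
meta(corpus, "label") <- c("news", "blog")

meta(corpus[[1]], "author")  # metadata of the first document
meta(corpus)                 # data frame of indexed metadata
```

Indexed metadata is convenient when the values come from an external table, since it can be assigned for all documents in one step.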

The package provides native support for reading in several classic file formats (e.g. plain text, PDFs, or XML files). There is also a plug-in mechanism to handle additional file formats.
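
A minimal sketch of reading plain text files, assuming a directory with one file per document. The directory name and file contents below are invented for the example; `getSources()` and `getReaders()` list what the installed tm version supports.

```r
library(tm)

# List the sources and readers shipped with the installed tm version
getSources()
getReaders()

# Write two small plain-text files to a temporary directory (toy data)
doc_dir <- file.path(tempdir(), "tm_demo_docs")
dir.create(doc_dir, showWarnings = FALSE)
writeLines("Text mining with R.", file.path(doc_dir, "a.txt"))
writeLines("A second plain text document.", file.path(doc_dir, "b.txt"))

# Read every .txt file in the directory, one document per file
corpus <- VCorpus(
  DirSource(doc_dir, pattern = "\\.txt$", encoding = "UTF-8"),
  readerControl = list(reader = readPlain, language = "en")
)

length(corpus)        # number of documents read
content(corpus[[1]])  # text of the first document
```

Other formats follow the same pattern with a different reader in `readerControl`; some readers depend on additional packages or external tools.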

The data structures and algorithms can be extended to fit custom demands, since the package is designed in a modular way to enable easy integration of new file formats, readers, transformations and filter operations.
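
One common way to add a custom transformation is to wrap an ordinary string function with `content_transformer()`. The sketch below assumes a hypothetical helper named `toSpace` that replaces a given pattern with a space; the name and the sample text are illustrative only.

```r
library(tm)

corpus <- VCorpus(VectorSource(c("text/mining and/or R", "no slashes here")))

# Hypothetical helper: replace a literal pattern with a single space
toSpace <- content_transformer(function(x, pattern) {
  gsub(pattern, " ", x, fixed = TRUE)
})

# Extra arguments to tm_map() are passed on to the transformation
corpus <- tm_map(corpus, toSpace, "/")

content(corpus[[1]])
```

Wrapping the function this way keeps the result a tm document, so further `tm_map()` steps can be chained after it.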

tm provides easy access to preprocessing and manipulation mechanisms such as whitespace removal, stemming, and stopword deletion. A generic filter architecture is also available to filter documents by certain criteria or to perform full-text search. The package supports exporting document collections to term-document matrices.
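
The steps above can be sketched as a small pipeline on toy data. The sample sentences and the "fox" search pattern are made up for illustration; stemming via `stemDocument()` is left out here because it additionally needs the SnowballC package.

```r
library(tm)

docs <- c("The quick  brown fox.",
          "A lazy dog sleeps!",
          "Foxes and dogs are animals.")
corpus <- VCorpus(VectorSource(docs))

# Preprocessing: lower-case, drop punctuation and stopwords, squeeze whitespace
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Filtering: keep only the documents whose text matches a pattern
fox_docs <- tm_filter(corpus, FUN = function(doc) {
  any(grepl("fox", content(doc)))
})

# Export: documents as rows, terms as columns
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

`TermDocumentMatrix()` gives the transposed layout, with terms as rows and documents as columns.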

tm is freely available under the GNU General Public License (GPL).

Resources:

CRAN summary page
R-Forge project page
FAQ
Ingo Feinerer, Kurt Hornik, and David Meyer. Text mining infrastructure in R. Journal of Statistical Software, 25(5):1-54, March 2008.

1083 questions

4 answers

R stemming a string/document/corpus

I'm trying to do some stemming in R but it only seems to work on individual documents. My end goal is a term document matrix that shows the frequency of each term in the document. Here's an…

asked Aug 09 '12 at 04:17

screechOwl

8 answers

How to show corpus text in R tm package?

I'm completely new in R and tm package, so please excuse my stupid question ;-) How can I show the text of a plain text corpus in R tm package? I've loaded a corpus with 323 plain text files in a corpus: src <-…

r tm corpus

asked May 25 '15 at 09:25

Azrael

5 answers

tm: read in data frame, keep text id's, construct DTM and join to other dataset

I'm using package tm. Say I have a data frame of 2 columns, 500 rows. The first column is ID which is randomly generated and has both character and number in it: "txF87uyK" The second column is actual text : "Today's weather is good. John went…

r text-mining tm

asked Nov 08 '13 at 02:38

GorillaInR

2 answers

Text-mining with the tm-package - word stemming

I am doing some text mining in R with the tm-package. Everything works very smoothly. However, one problem occurs after stemming (http://en.wikipedia.org/wiki/Stemming). Obviously, there are some words which have the same stem, but it is important…

r text-mining tm

asked Apr 17 '13 at 20:15

majom

2 answers

Keep document ID with R corpus

I have searched stackoverflow and the web and can only find partial solutions OR some that don't work due to changes in TM or qdap. Problem below: I have a dataframe: ID and Text (Simple document id/name and then some text) I have two issues: Part…

r text text-mining tm corpus

asked Jul 01 '14 at 02:07

RUser

2 answers

R text mining documents from CSV file (one row per doc)

I am trying to work with the tm package in R, and have a CSV file of customer feedback with each line being a different instance of feedback. I want to import all the content of this feedback into a corpus but I want each line to be a different…

r text-mining documents corpus tm

asked Aug 01 '13 at 14:50

user2407054

1 answer

transformation drops documents error in R

Whenever I run this code, the tm_map line gives me this warning: Warning message: In tm_map.SimpleCorpus(docs, toSpace, "/") : transformation drops documents. The code: texts <- read.csv("./Data/fast food/Domino's/Domino's veg pizza.csv",stringsAsFactors =…

r tm

asked Jun 28 '18 at 11:10

NRR

3 answers

Efficient jaccard similarity DocumentTermMatrix

I want a way to efficiently calculate Jaccard similarity between documents of a tm::DocumentTermMatrix. I can do something similar for cosine similarity via the slam package as shown in this answer. I came across another question and response on…

r text-mining tm slam

asked Mar 25 '16 at 13:10

Tyler Rinker

4 answers

Unable to convert a Corpus to Data Frame in R

I've looked at the other similar questions that have been posted here (like this), but the problem persists. I have a dataframe of textual data, which I need to stem. So I'm converting it into a corpus, stemming it, then completing the words from…

r text-mining tm corpus

asked Oct 18 '15 at 00:42

wrahool

4 answers

Treat words separated by space in the same manner

I am trying to find the words occurring in multiple documents at the same time. Let us take an example. doc1: "this is a document about milkyway" doc2: "milky way is huge" As you can see in above 2 documents, word "milkyway" is occurring in both…

r text-mining tm corpus

asked Oct 13 '15 at 09:56

user3664020

2 answers

R tm removeWords function not removing words

I am trying to remove some words from a corpus I have built but it doesn't seem to be working. I first run through everything and create a dataframe that lists my words in order of their frequency. I use this list to identify words I am not…

r text text-mining tm corpus

asked Aug 26 '15 at 11:44

Adam

1 answer

How to convert vector of characters to corpus input for the DocumentTermMatrix function from tm package in R?

I am new to the tm package. I'd like to use the DocumentTermMatrix function to create a DT-Matrix for further text-mining analysis, but I am unable to create proper input for that function. I have my data input so far in a format of a character vector like…

r tm

asked Mar 23 '15 at 12:10

Marcin

1 answer

Text Categorization in R

My objective is to automatically route feedback emails to the respective division. My fields are FNUMBER, CATEGORY, SUBCATEGORY, Description. I have the last 6 months of data in the above format, where the entire email is stored in Description along with…

r text text-mining tm

asked Mar 10 '14 at 04:35

Prasanna Nandakumar

1 answer

Search for misspellings of a word in a character vector with R - "inverse" spell checker

I am text mining a large database to create indicator variables which indicate the occurrence of certain phrases in a comments field of an observation. The comments were entered by technicians, so the terms used are always consistent. However,…

r spell-checking text-mining tm

asked Feb 01 '13 at 21:06

Nick Evans

2 answers

Error faced while using TM package's VCorpus in R

I am facing the below error while working on the TM package with R. library("tm") Loading required package: NLP Warning messages: 1: package ‘tm’ was built under R version 3.4.2 2: package ‘NLP’ was built under R version 3.4.1 corpus <-…

r text-mining tm text-analysis

asked Nov 21 '17 at 06:27

Saharsh Gandhi
