Questions tagged [tm]

The `tm` package (shorthand for Text Mining Infrastructure in R) provides a framework for text mining applications within R.

source: http://tm.r-forge.r-project.org/

tm - Text Mining Package


The tm package offers functionality for managing text documents, abstracts the process of document manipulation, and eases the use of heterogeneous text formats in R. The package has integrated database back-end support to minimize memory demands. Advanced metadata management is implemented for collections of text documents, which makes it easier to work with large, metadata-enriched document sets.
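As a minimal sketch of the metadata handling described above, the following attaches metadata at the document level and at the collection level, then uses it to subset the corpus. The texts, the author name, and the `rating` tag are made up for illustration. The database back end (`PCorpus`, which relies on the separate filehash package) is not shown.

```r
library(tm)

# Two toy documents held in memory (a volatile corpus).
corpus <- VCorpus(VectorSource(c("First text.", "Second text.")))

# Document-level metadata: set and read a tag on a single document.
meta(corpus[[1]], "author") <- "A. Reviewer"   # made-up value
meta(corpus[[1]], "author")

# Indexed metadata: one value per document, stored with the corpus.
# "rating" is a hypothetical tag chosen for this example.
meta(corpus, "rating") <- c(5, 2)
meta(corpus)

# Use the indexed metadata to keep only the highly rated documents.
top_rated <- corpus[meta(corpus)$rating >= 4]
length(top_rated)
```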

The package provides native support for reading in several classic file formats (e.g. plain text, PDFs, or XML files). There is also a plug-in mechanism to handle additional file formats.
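A short sketch of how readers and sources fit together, assuming plain-text input: it writes two throwaway files to a temporary directory and reads them back with the `readPlain` reader. The directory and file names are invented for the example. Other readers (for instance for PDF or XML) may need extra packages or external tools.

```r
library(tm)

# List the reader functions and source types that tm knows about.
getReaders()
getSources()

# Create a temporary directory with two small plain-text files.
demo_dir <- file.path(tempdir(), "tm_demo")   # hypothetical location
dir.create(demo_dir, showWarnings = FALSE)
writeLines("First document.",  file.path(demo_dir, "a.txt"))
writeLines("Second document.", file.path(demo_dir, "b.txt"))

# Read every .txt file in the directory as one document each.
corpus <- VCorpus(
  DirSource(demo_dir, encoding = "UTF-8", pattern = "\\.txt$"),
  readerControl = list(reader = readPlain, language = "en")
)
length(corpus)
```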

The data structures and algorithms can be extended to fit custom demands, since the package is designed in a modular way to enable easy integration of new file formats, readers, transformations and filter operations.
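One way to extend the set of transformations is to wrap an ordinary string function with `content_transformer()` so that it can be passed to `tm_map()`. The sketch below assumes a helper named `toSpace` (a name chosen here, not part of the package) that replaces every match of a pattern with a space.

```r
library(tm)

# Custom transformation: replace each match of `pattern` with a space.
# `toSpace` is a name chosen for this example.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

corpus <- VCorpus(VectorSource("price/quality ratio"))

# Extra arguments to tm_map() are passed on to the transformation.
corpus <- tm_map(corpus, toSpace, "/")
content(corpus[[1]])
```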

tm provides easy access to preprocessing and manipulation mechanisms such as whitespace removal, stemming, or stopword deletion. In addition, a generic filter architecture is available to filter documents by given criteria or to perform full-text search. The package supports exporting document collections to term-document matrices.
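The steps above can be sketched roughly as follows, using three made-up sentences. Stemming via `stemDocument()` is left out because it depends on the SnowballC package being installed. The filter criterion (documents mentioning "wonderful") is only an example.

```r
library(tm)

# Three toy documents; the text is invented for illustration.
texts <- c("The  cruise was   wonderful.",
           "Wonderful food, and the staff were friendly!",
           "The cabin was small.")
corpus <- VCorpus(VectorSource(texts))

# Preprocessing: lower-case, strip punctuation, drop English stopwords,
# then collapse runs of whitespace.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Filter: keep only the documents whose text contains "wonderful".
hits <- tm_filter(corpus, FUN = function(doc) {
  any(grepl("wonderful", content(doc), fixed = TRUE))
})

# Export: terms in rows, documents in columns.
tdm <- TermDocumentMatrix(corpus)
inspect(tdm)
```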

tm is freely available under the GNU General Public License (GPL).


1083 questions
-3 votes, 1 answer

Create other forms[noun,adjective,plural,verb..everything] of a word

Actually I am doing review analytics for a cruise company. I cannot describe the whole procedure as it is very lengthy, but here is at least a snapshot of it. I took all the reviews, divided them into sentences, then extracted some phrases out of that…
Dharam
-4 votes, 1 answer

Is there standard function for binary search on ordered word list

I am doing some text mining using the tm package. I get ordered lists of words containing over 50,000 words. My corpus contains about 2 million words and I put all of them in a single document. In order to save some memory and be able to get ngrams…
Etienne Moerman
-5 votes, 1 answer

tm package in R hangs with small dataset

I have a data.frame of 30k records (company name and other attributes). dba_nm is the company name field with longest element < 60 characters. The R session's memory usage goes up from 100MB to 3GB and hangs when I try the code in…
dasman