
I have a data.frame of 30k records (company name and other attributes). dba_nm is the company name field; its longest element is under 60 characters.

The R session's memory usage climbs from 100 MB to 3 GB and the session hangs when I try the code from ?tm::VectorSource:

library(tm)
ds <- VectorSource(dat$dba_nm)
inspect(Corpus(ds))
Joshua Ulrich
dasman

1 Answer


Well, I was pulling a data frame (dat) from a database and trying to read one of its columns (dba_nm) into a VectorSource. It turns out you have to convert the column to a character vector first. The following code works:

> cs <- as.character(dat$dba_nm)
> ds <- VectorSource(cs)
> Corpus(ds)
A corpus with 30453 text documents
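
For anyone who wants to check their own data before building the corpus, here is a minimal sketch. It assumes the column came back from the database as a factor, which is a guess and not something confirmed above. The small data frame is a made-up stand-in for the real dat, and the tm step runs only if the package is installed.

```r
# Hypothetical stand-in for the data frame pulled from the database.
# stringsAsFactors = TRUE mimics a column that arrived as a factor (an assumption).
dat <- data.frame(
  dba_nm = c("Acme Corp", "Globex LLC", "Acme Corp"),
  stringsAsFactors = TRUE
)

# Inspect the column type before handing it to VectorSource.
print(class(dat$dba_nm))

# Convert to a plain character vector, one element per record.
cs <- as.character(dat$dba_nm)
print(class(cs))

# Build the corpus from the character vector, only if tm is available.
if (requireNamespace("tm", quietly = TRUE)) {
  corp <- tm::Corpus(tm::VectorSource(cs))
  print(corp)
}
```

If class(dat$dba_nm) already reports "character", the conversion is a no-op and the slowdown probably has another cause.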
dasman