
I created a tf-idf DTM and an n-gram-based DTM in text2vec from the same dataset. I can run glmnet on each of them separately, but when I combine the two DTMs via cBind, glmnet gives me an error:

Error in validObject(.Object) : invalid class “dgCMatrix” object: length(Dimnames[1]) differs from Dim[1] which is 43895

dtm_train_tfidf is a 19579 × 27511 matrix and dtm_train_ngram is a 19579 × 16384 matrix.

Both have exactly the same number of rows, so I can combine them with cBind (the Matrix package's cbind for sparse matrices) into one larger matrix, on which I should be able to run glmnet. Instead, I get the error above. How do I fix this?
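A minimal sketch of the combination step, assuming only the Matrix package is loaded. dtm_a and dtm_b are hypothetical toy stand-ins for dtm_train_tfidf and dtm_train_ngram, not the real data. Note that 43895 in the error equals 27511 + 16384, the column count of the combined matrix, so the mismatch most likely concerns column names. Newer Matrix releases recommend base cbind() over cBind(), since it dispatches to the sparse methods.

```r
library(Matrix)

set.seed(42)
# Hypothetical toy stand-ins for dtm_train_tfidf and dtm_train_ngram:
# same number of rows (documents), different numbers of columns (features).
dtm_a <- rsparsematrix(5, 4, density = 0.5)
dtm_b <- rsparsematrix(5, 3, density = 0.5)
colnames(dtm_a) <- paste0("word_", 1:4)
colnames(dtm_b) <- paste0("ngram_", 1:3)

# Column-wise binding requires equal row counts; the column counts add up.
stopifnot(nrow(dtm_a) == nrow(dtm_b))
dtm_combined <- cbind(dtm_a, dtm_b)   # result is still a sparse matrix

dim(dtm_combined)          # 5 rows, 4 + 3 = 7 columns
validObject(dtm_combined)  # signals an error if Dim and Dimnames disagree
```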

Akhil

1 Answer

This is caused by a bug in text2vec (https://github.com/dselivanov/text2vec/issues/205). You can either use the development version from GitHub or simply drop the column names of the DTM produced by the hash vectorizer.
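A minimal sketch of the rule behind the error and of the suggested workaround, assuming only the Matrix package. dimnames_ok() is a hypothetical helper written for illustration (it is not part of text2vec or Matrix), and dtm_hash is a toy stand-in for the hash-vectorizer DTM. On the real data the workaround would presumably be colnames(dtm_train_ngram) <- NULL before binding, assuming dtm_train_ngram is the hashed DTM (its 16384 = 2^14 columns are consistent with that).

```r
library(Matrix)

# Hypothetical helper (not part of text2vec or Matrix) that mirrors the
# rule behind the error: each non-NULL Dimnames entry must have exactly
# as many names as the matching dimension has rows or columns.
dimnames_ok <- function(dims, dn) {
  all(vapply(seq_along(dims), function(i) {
    is.null(dn[[i]]) || length(dn[[i]]) == dims[i]
  }, logical(1)))
}

# A made-up inconsistent case: 5 columns but only 3 column names.
dimnames_ok(c(4L, 5L), list(NULL, c("a", "b", "c")))   # FALSE

# The workaround from the answer: drop the column names of the DTM that
# came from the hash vectorizer before combining (toy matrix here).
set.seed(1)
dtm_hash <- rsparsematrix(4, 2, density = 0.5)
colnames(dtm_hash) <- NULL
dimnames_ok(dim(dtm_hash), dimnames(dtm_hash))          # TRUE
```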

Dmitriy Selivanov
  • Thanks for the prompt response, Dmitriy! You have made an excellent package for us R users. How do I go about using the development version from GitHub, or dropping the column names of the DTM from the hash vectorizer? – Akhil Dec 15 '17 at 02:15
  • I solved it by running devtools::install_github("dselivanov/text2vec"). Now the code runs perfectly. Thanks, Dmitriy! – Akhil Dec 15 '17 at 02:26