I am trying to do some text analytics on tweets and want to use lsa() for dimensionality reduction (DR). However, calculating the LSA space seems to be extremely memory intensive: I can only process up to 2.3k tweets before my computer dies.

While researching parallel processing online, I learned that even though my computer has 4 cores, R will only use 1 of them by default. I've also read this post, which is extremely helpful, but it seems that parallel processing can only be done:

  1. on functions that can be used in apply() families
  2. to replace for loops
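For reference, the apply-style pattern described above can be sketched with base R's parallel package. This is only a minimal illustration with a toy squaring function; the worker count `n_cores` and the variable names are assumptions for the example, not anything from my real code.

```r
library(parallel)

# Number of worker processes, chosen by the user.
# detectCores() reports how many cores the machine has.
n_cores <- 2
cl <- makeCluster(n_cores)

# Sequential version: lapply() runs every call in the current R process.
squares_seq <- lapply(1:8, function(i) i^2)

# Parallel version: parLapply() splits the same calls across the workers.
squares_par <- parLapply(cl, 1:8, function(i) i^2)

# Always release the workers when done.
stopCluster(cl)

# Both versions should produce the same list.
identical(squares_seq, squares_par)
```

This works because each call to the function is independent of the others, which is exactly the property an lapply() call or a simple for loop has.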

I am trying to use parallel processing for lsa(). Here's my one line of code:

lsa.train = lsa(tdm.train, dimcalc_share())

where tdm.train is a TermDocumentMatrix with terms as rows and documents as columns.
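To make the shape of the problem concrete, here is a rough base-R sketch of a truncated SVD on a toy term-by-document matrix. It is not the lsa package's actual code: the toy matrix, the variable names, and the 50% share rule for picking the number of dimensions are all assumptions, loosely modeled on what I understand lsa() with dimcalc_share() to be doing.

```r
# Toy term-by-document matrix: terms as rows, documents as columns
# (hypothetical counts, standing in for tdm.train).
tdm <- matrix(c(2, 0, 1, 0,
                0, 3, 0, 1,
                1, 1, 0, 0,
                0, 0, 2, 2,
                1, 0, 0, 3),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("term", 1:5), paste0("doc", 1:4)))

# One dense decomposition of the whole matrix.
s <- svd(tdm)

# Assumed rule: keep the smallest k whose singular values
# account for at least `share` of their total.
share <- 0.5
k <- which(cumsum(s$d) / sum(s$d) >= share)[1]

# Truncate to the first k dimensions.
tk <- s$u[, 1:k, drop = FALSE]   # term vectors
dk <- s$v[, 1:k, drop = FALSE]   # document vectors
sk <- s$d[1:k]                   # singular values

# Rank-k reconstruction of the original matrix.
tdm_k <- tk %*% diag(sk, nrow = k) %*% t(dk)
dim(tdm_k)
```

If this picture is roughly right, the expensive step is a single decomposition call on the full matrix rather than a loop over independent pieces, which is why I don't see how to fit it into the apply-style pattern.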

My question is:

How can I change this line of code so that lsa() runs in parallel instead of sequentially, using n cores instead of only 1, where n is the number of cores defined by the user (me)?

alwaysaskingquestions