I am trying to do some text analytics on tweets, using lsa() for dimensionality reduction. However, computing the LSA space seems to be extremely memory intensive: I can only process up to 2.3k tweets before my computer runs out of memory.
While researching parallel processing online, I learned that even though my computer has 4 cores, R will only use one of them by default. I've also read this post, which was extremely helpful, but it seems that parallel processing can only be applied:
- on functions that can be used in apply() families
- to replace for loops
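To make sure I understand the apply-style parallelism those resources describe, here is a minimal sketch using the base `parallel` package (the object names like `n.cores` are just my own; only `detectCores`, `makeCluster`, `parLapply`, and `stopCluster` come from the package):

```r
library(parallel)

n.cores <- detectCores() - 1   # leave one core free for the OS
cl <- makeCluster(n.cores)     # start a local cluster of worker processes

# parLapply() is the parallel analogue of lapply():
# the input list is split across the workers in cl.
squares <- parLapply(cl, 1:8, function(x) x^2)

stopCluster(cl)                # always release the workers when done
unlist(squares)                # c(1, 4, 9, 16, 25, 36, 49, 64)
```

This pattern works because each element of the input can be processed independently, which is exactly what I don't see how to do with a single call like lsa().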
I am trying to use parallel processing for lsa(). Here's my one-line call:
lsa.train <- lsa(tdm.train, dims = dimcalc_share())
where tdm.train is a TermDocumentMatrix with terms as rows and documents as columns.
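For context, this is roughly how that object gets built in my setup. A hedged sketch, assuming the standard tm and lsa packages and a toy corpus (the tweet texts here are placeholders):

```r
library(tm)
library(lsa)

# Toy stand-in for the real tweet corpus
corpus <- VCorpus(VectorSource(c("first tweet text",
                                 "second tweet text",
                                 "third tweet text")))

# Terms as rows, documents as columns
tdm.train <- TermDocumentMatrix(corpus)

# lsa() works on an ordinary (dense) matrix, which is
# presumably where the memory blow-up happens at scale:
lsa.train <- lsa(as.matrix(tdm.train), dims = dimcalc_share())
```

Note that as.matrix() converts the sparse TermDocumentMatrix to a dense matrix before the SVD inside lsa() runs, which I suspect is part of why memory usage grows so fast with the number of tweets.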
My question is:
How can I change this call to lsa() so that it runs in parallel instead of sequentially, using n cores instead of just 1, where n is the number of cores defined by the user (me)?