
I am using a Samsung NP350V5C-S06IN laptop for machine-learning-related data processing. Specifications: 3rd Gen Intel Core i7 (2.3 GHz), 8 GB RAM, Windows 7 Home Premium, 2 GB AMD Radeon HD 7670M graphics card.

Running a computation-intensive algorithm like random forest (RF) or gradient boosting (GBM) takes a long time, around 4 to 6 hours. However, when I monitor the system in Task Manager while the process is running, I see that the utilization of each of the 8 cores is very low, only around 15%-20% at any given moment. Is there any way I can increase the utilization of each core to make my processing faster?

Specific questions: Will installing Hadoop help me enhance utilization and processing speed? Is there any way to utilize the graphics card and its 2 GB of memory?
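
For reference, the per-core numbers that Task Manager shows can also be logged from Python itself. A minimal sketch using the third-party psutil package (an illustrative assumption, not part of the original setup):

    import psutil  # third-party package: pip install psutil

    # Sample per-core utilization once per second for ten seconds,
    # mirroring the per-core view in Task Manager.
    for _ in range(10):
        per_core = psutil.cpu_percent(interval=1, percpu=True)
        print(" | ".join("core %d: %5.1f%%" % (i, p) for i, p in enumerate(per_core)))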

  • Did you implement the algorithms yourself? If so, what language? – bogatron Aug 02 '13 at 12:36
  • I use standard implementations from R, Weka, or Python; this observation was specifically while using Weka. – Abhishek Nalin Aug 02 '13 at 12:42
  • I'm removing the Hadoop tag, as it is not designed to run on a single machine. You want to multithread your random forest; scikit-learn has such an implementation, if I'm not mistaken. – Thomas Jungblut Aug 02 '13 at 13:01
  • So, you are sure that Hadoop cannot take advantage of multiple cores on a single machine? – Abhishek Nalin Aug 02 '13 at 13:35
  • I agree with @ThomasJungblut. Hadoop is a distributed platform and shows its true power in a distributed environment. You need to implement your algo in such a way that you almost kill your machine :) – Tariq Aug 02 '13 at 14:06
  • Hadoop reads off data from disk, so you will see the advantage when you have a sizable computing cluster and > 10 terabytes of data. – Thomas Jungblut Aug 02 '13 at 14:17
  • Ok. Got it. Hadoop ruled out. Using multi-threaded algorithms is one solution. Any other solutions? – Abhishek Nalin Aug 02 '13 at 15:20
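
Following Thomas Jungblut's suggestion in the comments, here is a minimal sketch of a multithreaded random forest in scikit-learn, plus one way to parallelize GBM-style workloads even though boosting itself is sequential. The dataset and parameters are illustrative placeholders, not taken from the original setup:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic stand-in for the real dataset.
    X, y = make_classification(n_samples=10000, n_features=50, random_state=0)

    # n_jobs=-1 trains the forest's trees in parallel on all available cores,
    # which is what drives per-core utilization up.
    rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
    rf.fit(X, y)

    # Boosting builds trees sequentially, so the model itself cannot be
    # parallelized the same way; the surrounding model selection can be.
    # GridSearchCV with n_jobs=-1 evaluates folds and candidates in parallel.
    param_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [100, 300]}
    search = GridSearchCV(GradientBoostingClassifier(), param_grid, n_jobs=-1, cv=3)
    search.fit(X, y)
    print(search.best_params_)

Because the individual trees (and the grid-search candidates) are independent tasks, the speedup typically scales with the number of physical cores.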

0 Answers