I have a model I'm trying to build using LogisticRegression in sklearn that has a couple thousand features and approximately 60,000 samples. I'm trying to fit the model, and it's been running for about 10 minutes now. The machine I'm running it on has tens of gigabytes of RAM and several cores at its disposal, and I was wondering if there is any way to speed the process up.
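For reference, these are the constructor parameters in the LogisticRegression docs that look relevant to speed. This is only a sketch of what I'd try, and from what I can tell n_jobs only parallelizes over classes in one-vs-rest, so it may do nothing for a binary fit:

    from sklearn.linear_model import LogisticRegression

    # Illustrative only: solver and n_jobs are real parameters, but I'm not
    # sure either actually speeds up a two-class problem.
    classifier = LogisticRegression(
        C=1.0,
        class_weight='balanced',
        solver='lbfgs',  # alternatives include 'liblinear' and 'saga'
        n_jobs=-1,       # parallelizes over classes only, not within one fit
    )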
EDIT: The machine has 24 cores, and here is the output of top to give an idea of memory:
Processes: 94 total, 8 running, 3 stuck, 83 sleeping, 583 threads 20:10:19
Load Avg: 1.49, 1.25, 1.19 CPU usage: 4.34% user, 0.68% sys, 94.96% idle
SharedLibs: 1552K resident, 0B data, 0B linkedit.
MemRegions: 51959 total, 53G resident, 46M private, 676M shared.
PhysMem: 3804M wired, 57G active, 1042M inactive, 62G used, 34G free.
VM: 350G vsize, 1092M framework vsize, 52556024(0) pageins, 85585722(0) pageouts
Networks: packets: 172806918/25G in, 27748484/7668M out.
Disks: 14763149/306G read, 26390627/1017G written.
I'm trying to train the model with the following:

    from sklearn.linear_model import LogisticRegression

    # 'balanced' replaces the class_weight='auto' option removed in newer sklearn
    classifier = LogisticRegression(C=1.0, class_weight='balanced')
    classifier.fit(train, response)
train has rows that are approximately 3,000 values long (all floating point), and each entry in response is either 0 or 1. I have approximately 50,000 observations.
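In case it helps, here is a self-contained version with random data standing in for my actual dataset (same shapes as described above, so timings should be roughly indicative):

    import time

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in: ~60,000 samples x 3,000 float features, binary labels.
    rng = np.random.default_rng(0)
    train = rng.standard_normal((60_000, 3_000))
    response = rng.integers(0, 2, size=60_000)

    classifier = LogisticRegression(C=1.0, class_weight='balanced')

    start = time.time()
    classifier.fit(train, response)
    print(f"fit took {time.time() - start:.1f} seconds")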