How can I get faster processing speeds for a single thread by combining multiple CPU cores, e.g. training a custom neural network (not TensorFlow) on a Google Compute Engine n1-highmem-64 machine type that has 64 CPU cores? Cluster computing, or what? Not sure where to start... thanks!
1 Answer
Well, you are asking for faster speeds on a single thread, but with multiple cores.
The only feasible way to get faster processing on a single thread, which runs on a single core, is overclocking. You could also move to a newer chip with faster cores.
There is no straightforward way to do what you describe: you would likely have to patch the firmware of several of your components so they could run one thread across CPUs, sharing state through the L3 cache or something... very infeasible.
The opposite of that would be the way to go.
Multi-threading is used for processing different pieces of data concurrently on multiple cores.
General-purpose GPU (GPGPU) computing performs the same operation on a large set of data by farming the computation out to the GPU. It has increased overhead (moving data to and from the device), but will give good results when the input is big enough.
It's funny that you mention not TensorFlow, because it actually implements both of these.
Even if you were able to implement something like this, it would probably just thrash on atomic locks unless you threaded it anyway.
Edit
If you are looking for software as a service, Amazon (https://aws.amazon.com/tensorflow/) and other companies offer a range of services that are compatible with various deep-learning / machine-learning frameworks out of the box.

- Ok, so the code I'm running has a prep stage where data is sorted and cleaned, but then there's the much more intensive training stage: very repetitive across the epoch, though the calculations are just summing up scores from activations on the output. Is there a way to start off on a single thread, but once it reaches this training stage, calculate the activations by splitting the epoch into 64 sections (or however many CPUs are available) and return the sums to a single thread? The epoch sections are not dependent on each other apart from what was prepped and stored in RAM prior to training... – Jacob Edward Nov 26 '17 at 22:23
- If you can implement your data cleaning in Python, then you can implement the training in TensorFlow, and it will automatically use all cores / hyper-threads or GPUs. If not, you will have to implement multi-threading on the forward- and back-propagation steps of your model, then perform a summation against the loss function and use that to adjust your weight matrices. – Free Url Nov 26 '17 at 22:32
- Definitely not using Python; Node JS – Jacob Edward Nov 26 '17 at 22:44
- How do you use hyper-threading? I know where I would do it, I just don't know how – Jacob Edward Nov 26 '17 at 22:45
- If you are using an Intel CPU it is utilized automatically. Hyper-threading means that each Intel core can support up to two threads at a time. There will likely be some thrashing in a computation-intensive application like this. OpenCV is another, better idea than threading, as you can farm a whole training set to the GPU, and it is much faster. OpenCV is supported on many GPU brands. https://opencv.org/ – Free Url Nov 26 '17 at 22:52
- I am telling you from experience that you are better off using a framework than trying to reinvent the wheel. Even if you have to use the framework as a back end for some web architecture, it will likely run faster, as it has been optimized for these sorts of computations. Or just buy a Cray computer: https://www.cray.com/ – Free Url Nov 26 '17 at 22:53
- Not interested in software as a service or frameworks... Please stop trying to talk me out of what I'm asking... Your suggestion is not the first time I have encountered these ideas... – Jacob Edward Nov 27 '17 at 16:20
- Ok, so apparently Skylake is an Intel chip, which is what Compute Engine is using for their largest-memory machine type... – Jacob Edward Nov 27 '17 at 16:24