how to make clisp or sbcl use all cpu cores availble?

Question

Over a remote ssh connection, I'm trying to cross-compile sbcl using clisp. The steps I've followed thus far are such:

I downloaded most recent sbcl source, (at this time sbcl-1.3.7), uncompressed it, and entered its source directory.

Then to build it:

root@remotehost:/sbcl-1.3.7# screen 
root@remotehost:/sbcl-1.3.7# sh make.sh --prefix=/usr --dynamic-space-size=2Gb --xc-host='clisp -q'
root@remotehost:/sbcl-1.3.7# Ctrl-A Ctrl-D
[detached from 4486.pts-1.remotehost]r/fun-info-funs.fas
root@remotehost:/sbcl-1.3.7#

Over a second remote ssh connection to the same box, top reports cpu usage at 6%

nproc says I have 16 cores (this is google compute engine--no way could I afford something with 16 cores :)

MAKEFLAGS are set in my environment to -j16, but I guess clisp isn't aware of that. How can I get this build to make use of all 16 cores?

score 1 · Answer 1 · answered Oct 23 '16 at 08:24

I reccomend you to use a parallism library, I really like lparallel library

It has pretty utilities to parallelize your code ammong all processors in your machine. This is an example for my macbook pro (4 cores) using SBCL. There are a great series of common lisp concurrence and parallelism here

But let's create a sample example using lparallel cognates, note that this example is not a well exercise of parallelism is only to show the power of leparallel and how easy is to use.

Let's consider a fibonacci tail recursive function from cliki:

(defun fib (n) "Tail-recursive computation of the nth element of the Fibonacci sequence" (check-type n (integer 0 *)) (labels ((fib-aux (n f1 f2) (if (zerop n) f1 (fib-aux (1- n) f2 (+ f1 f2))))) (fib-aux n 0 1)))

This will be the sample high computation cost algorithm. let's use it:

CL-USER> (time (progn (fib 1000000) nil))
Evaluation took:
  17.833 seconds of real time
  18.261164 seconds of total run time (16.154088 user, 2.107076 system)
  [ Run times consist of 3.827 seconds GC time, and 14.435 seconds non-GC time. ]
  102.40% CPU
  53,379,077,025 processor cycles
  43,367,543,984 bytes consed

NIL

this is the calculation for the 1000000th term of the fibonacci series on my computer.

Let's for example calculate a list of fibonnaci numbers using mapcar:

CL-USER> (time (progn (mapcar #'fib '(1000000 1000001 1000002 1000003)) nil))
Evaluation took:
  71.455 seconds of real time
  73.196391 seconds of total run time (64.662685 user, 8.533706 system)
  [ Run times consist of 15.573 seconds GC time, and 57.624 seconds non-GC time. ]
  102.44% CPU
  213,883,959,679 processor cycles
  173,470,577,888 bytes consed

NIL

Lparallell has cognates:

They return the same results as their CL counterparts except in cases where parallelism must play a role. For instance premove behaves essentially like its CL version, but por is slightly different. or returns the result of the first form that evaluates to something non-nil, while por may return the result of any such non-nil-evaluating form.

first load lparallel:

CL-USER> (ql:quickload :lparallel)
To load "lparallel":
  Load 1 ASDF system:
    lparallel
; Loading "lparallel"

(:LPARALLEL)

So in our case, the only thing that you have to do is initially a kernel with the number of cores you have available:

CL-USER> (setf lparallel:*kernel* (lparallel:make-kernel 4 :name "fibonacci-kernel"))
#<LPARALLEL.KERNEL:KERNEL :NAME "fibonacci-kernel" :WORKER-COUNT 4 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {1004E1E693}>

and then launch the cognates from the pmap family:

CL-USER> (time (progn (lparallel:pmapcar #'fib '(1000000 1000001 1000002 1000003)) nil))
Evaluation took:
  58.016 seconds of real time
  141.968723 seconds of total run time (107.336060 user, 34.632663 system)
  [ Run times consist of 14.880 seconds GC time, and 127.089 seconds non-GC time. ]
  244.71% CPU
  173,655,268,162 processor cycles
  172,916,698,640 bytes consed

NIL

You can see how easy is to parallelize this task, lparallel has a lot of resources that you can explore:

I also add a capture of the cpu usage from the first mapcar and pmapcar in my Mac:

score 0 · Answer 2 · answered Jul 26 '16 at 13:35

0

SBCL cross compiling (aka bootstrapping) is done in 100% vanilla CL and there are nothing in the standard about threading or multiple processes.

Using a supplied SBCL binary for your machine might not use threading either even though I know SBCL does have thread support in the language.

You only need to do this if you have altered the source of SBCL as the latest supported version is usally precompiled already. I haven't compiled SBCL myself but i doubt it takes longer than my compilations of linux in the 90s that used less time than my nightly sleep.

answered Jul 26 '16 at 13:35

Sylwester

47,942
4
47
79

Vague memory from 2002-or-so, when I was working on what eventually was turned into SBCL's current compile-time diagnostic-muffling is that a from-scratch SBCL build took ~30-40 minutes on the laptop I had at the time (which was a 3-4 year old Dell laptop with ~1 GB RAM, if memory serves me right). – Vatine Jul 26 '16 at 14:26
Thanks for letting me know it can't be done! True, sbcl doesn't take that long to compile on my thinkpad (about an hour), but, google compute engine turns hours into minutes, on standard gcc compiles that recognize makeflags: I was hoping to learn of something similar for lisp. – slac.in.the.box Jul 26 '16 at 16:43

how to make clisp or sbcl use all cpu cores availble?

2 Answers2