I am trying to find a way to execute CPU-intensive parallel jobs over a cluster. My objective is to schedule one job per core, so that every job hopefully gets 100% CPU utilization once it is scheduled. This is what I have come up with so far:
FILE build_sshlogin.sh
#!/bin/bash
serverprefix="compute-0-"
lastserver=15

# Print "free_cores/server" for one node, where free_cores is an estimate
# of how many cores are currently idle on that node.
function worker {
    server="$serverprefix$1"
    free=$(ssh "$server" /bin/bash << 'EOF'
# Count the cores, then sample the first line of /proc/stat twice,
# 2 seconds apart, to estimate overall CPU utilization
# (user+nice+system jiffies over total jiffies).
cores=$(grep "cpu MHz" /proc/cpuinfo | wc -l)
stat=$(head -n 1 /proc/stat)
work1=$(echo $stat | awk '{print $2+$3+$4;}')
total1=$(echo $stat | awk '{print $2+$3+$4+$5+$6+$7+$8;}')
sleep 2
stat=$(head -n 1 /proc/stat)
work2=$(echo $stat | awk '{print $2+$3+$4;}')
total2=$(echo $stat | awk '{print $2+$3+$4+$5+$6+$7+$8;}')
util=$(echo " ( $work2 - $work1 ) / ( $total2 - $total1 ) " | bc -l)
# Estimated number of idle cores, rounded to the nearest integer.
echo " $cores * (1 - $util) " | bc -l | xargs printf "%1.0f"
EOF
    )
    # Only list servers that have at least one free core
    # (treat an empty result, e.g. from a failed ssh, as 0).
    if [ "${free:-0}" -gt 0 ]
    then
        echo "$free/$server"
    fi
}
export serverprefix
export -f worker

seq 0 $lastserver | parallel -k worker {}
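For reference, the script prints one free_cores/server line per node that has spare capacity, which is the format --sshloginfile expects. On my cluster the output looks something like this (the counts here are purely illustrative):

8/compute-0-0
6/compute-0-3
2/compute-0-7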
This script is used by GNU parallel as follows:
parallel --sshloginfile <(./build_sshlogin.sh) --workdir $PWD command args {1} ::: $(seq $runs)
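As I understand the man page, an entry of the form N/host in an sshloginfile tells GNU parallel to run up to N simultaneous jobs on that host, which is why the script prints the free-core count in front of each server name. To make the invocation concrete, here is the shape of what I actually run (./sim and input.cfg are just placeholders for my real command and arguments):

runs=100     # number of independent runs
# ./sim and input.cfg below stand in for the real command:
parallel --sshloginfile <(./build_sshlogin.sh) --workdir $PWD ./sim input.cfg {1} ::: $(seq $runs)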
The problem with this technique is that if someone starts another CPU-intensive job on a server in the cluster without checking the CPU usage, the script will end up scheduling jobs onto cores that are already busy. In addition, if the CPU usage has changed by the time the first jobs finish, the newly freed cores will not be considered by GNU parallel when it schedules the remaining jobs, because the free-core counts are measured only once, when the sshloginfile is generated.
So my question is the following: Is there a way to make GNU parallel re-calculate the free cores/server before it schedules each job? Any other suggestions for solving the problem are welcome.
NOTE: In my cluster all cores have the same frequency. If someone can generalize to account for different frequencies, that's also welcome.
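In case it helps anyone answering: a rough, untested sketch of what that generalization might look like, replacing the core count inside the heredoc with an MHz-weighted capacity. Here ref_mhz is an arbitrary reference frequency I made up, and the utilization estimate is still purely time-based, so this is only an approximation:

# Untested sketch: weight capacity by clock speed instead of counting cores.
mhz_total=$(awk -F: '/cpu MHz/ {sum += $2} END {print sum}' /proc/cpuinfo)
ref_mhz=2000   # hypothetical reference frequency (MHz), e.g. the slowest core
echo " ( $mhz_total / $ref_mhz ) * (1 - $util) " | bc -l | xargs printf "%1.0f"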