Will we see an expected speedup in Chapel if running "inside" VMs?

Question

I'm teaching with Chapel next semester and we are considering using a VM for students to program on instead of a physical machine. As part of the class, I want students to be able to see speedup when using multiple threads. I fear that they won't be able to see this because the VM will act like implicit hyperthreading: one thread will run just as fast as many threads.

Does anyone have any experience with this? Is there any chance I can use a VM instead of a physical device?

No experience with Chapel, but why would you think that VMs don't support multiple threads? A large part of current infrastructure runs in VMs in the cloud, and they are certainly not limited to a single thread; the same applies to VirtualBox and VMware. It shouldn't be a problem as long as the application supports it. — Voo, Dec 11 '17 at 21:39
My reaction was similar: I'd expect you to be able to see multi-threaded speedup in a VM. Have you tried and are you not? Which VM are you using? — Brad, Dec 11 '17 at 22:46
I haven't tried yet; our IT staff would be setting up the VM for my use. I did not assume that VMs are limited to one thread, but that single-threaded tasks might be simulated using multiple threads at the lower level anyways. I didn't see speedup when using a hyperthreaded machine a few years back and wasn't sure this would be any different. It sounds like we should go ahead and try it out. — Kyle, Dec 11 '17 at 23:00
Hyperthreading may or may not speed up a specific load. It depends on how many calculation/CPU units go idle during a task (for example, stalls from memory fetches); one can force this by writing code that tends toward cache misses. That said, hyperthreading can usually be turned off at the BIOS level or via a kernel setting. — Chris K, Dec 12 '17 at 09:40
Speedup is a bonus; understanding the world of parallel code execution is always worth a great deal. The problem starts with a poor service definition of the hosted (virtual) infrastructure, which may well serve a "shared-CPU-quota" plus lost caches (with incredibly high work-stealing rates). Unless your IT department does careful work to avoid performance degradation at each and every level, your VM systems will exhibit an uncontrolled amount of "sharing"-induced performance fluctuation, which is exactly what spoils any rigorous performance demonstration. Try to get exclusive, affinity-pinned VMs. — user3666197, Dec 12 '17 at 09:40
If your IT department's engineers can switch off HT and TurboBoost, your physical threads will stop "camping" between cores (that is, jumping from one physical CPU core to another due to thermal management during peak computing episodes). As mentioned before, "fragmented" cache re-use is painful precisely during high computing workloads, where the code was engineered to stay put and re-use cache-line-optimised code and local data layouts. This benefit is lost if the CPU's thermal control decides to push the whole execution onto a momentarily "colder" CPU core. The same applies to VMs. — user3666197, Dec 12 '17 at 09:49
@bencray Yes, it did work! The VM has been successful. I'm not sure what went into creating it, but I'll see if I can get an answer for this up. — Kyle, Mar 08 '19 at 19:02

score 1 · Accepted Answer · answered Mar 08 '19 at 19:30

We had success with a Virtual Machine! The VM we used for the whole class has:

16 CPUs
a 60 GB hard disk
4 GB RAM
3 ESXi hosts

The system also has unlimited IOPS (input/output operations per second).

I recommend this solution to other teachers.
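To check that such a VM really delivers parallel speedup, one can time the same compiled Chapel program with different task counts and compute speedup S(N) = T(1)/T(N) and efficiency S(N)/N. The sketch below is a hedged illustration, not what was used in the class: it assumes a hypothetical compiled Chapel binary `./speedup_demo` whose `forall` loops are limited by Chapel's built-in `dataParTasksPerLocale` config constant (set on the command line as `--name=value`). The binary name and the task counts are illustrative assumptions.

```python
import os
import subprocess
import time

# Hypothetical compiled Chapel program (an assumption, not from the original post).
BINARY = "./speedup_demo"


def time_run(binary: str, tasks: int) -> float:
    """Return wall-clock seconds for one run limited to `tasks` data-parallel tasks."""
    start = time.perf_counter()
    # Chapel config constants are set as --name=value; dataParTasksPerLocale
    # caps how many tasks a forall loop uses on each locale.
    subprocess.run([binary, f"--dataParTasksPerLocale={tasks}"], check=True)
    return time.perf_counter() - start


def speedup_table(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map task count N to (speedup T(1)/T(N), efficiency speedup/N)."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


if __name__ == "__main__":
    if not os.path.exists(BINARY):
        print(f"{BINARY} not found; compile a Chapel program first")
    else:
        counts = [1, 2, 4, 8, 16]  # e.g. up to the 16 vCPUs of the class VM
        timings = {n: time_run(BINARY, n) for n in counts}
        for n, (s, e) in speedup_table(timings).items():
            print(f"{n:3d} tasks: speedup {s:5.2f}, efficiency {e:4.0%}")
```

If the VM's vCPUs are shared or oversubscribed on the host, efficiency would be expected to drop well below 100% as N grows, which is itself a useful result for students to see.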

Kyle


score -4 · Answer 2 · edited Dec 13 '17 at 15:19

Yes, but any speedup is far less a matter of a syntax construct than of how the problem can be re-formulated into its achievable ( [SEQ], [PAR] ) parts:

With all due respect, professor, Amdahl's Law works against most naive, merely syntax-decorated efforts.

Contemporary criticism and re-formulation of Dr. Gene Amdahl's original argument have brought two major extensions into account:

overhead-strict formulation ( do not forget that going from [SEQ] to [PAR] code execution always comes at a cost: add-on overheads that work heavily against any speedup expected from an overhead-agnostic formulation )
a principal limit on the granularity of any [PAR] execution, at a finite, atomic-transaction level, beyond which any further resources, even in infinite capacity, will not improve the overall speed, precisely because the remaining scheduled work is indivisible ("atomic")
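Both extensions can be made concrete with a small model. Classic Amdahl speedup for a parallel fraction p on N workers is S(N) = 1 / ((1 - p) + p/N). The sketch below adds (a) an add-on overhead term and (b) a cap of K indivisible work units, so the [PAR] part takes ceil(K/N) rounds. The linear overhead model (setup + per_worker * N) and all parameter values are illustrative assumptions, not measurements.

```python
import math


def amdahl(p: float, n: int) -> float:
    """Classic Amdahl speedup: a fraction p of T(1) is perfectly parallel over n workers."""
    return 1.0 / ((1.0 - p) + p / n)


def overhead_strict(p: float, n: int, setup: float, per_worker: float, units: int) -> float:
    """Speedup including add-on overheads and an atomic-granularity limit.

    Assumed, illustrative model; all times are fractions of the sequential run T(1):
    - setup + per_worker * n: add-on cost of spawning, joining and syncing n workers
    - units: the [PAR] part splits into at most `units` indivisible pieces,
      so it takes ceil(units / n) rounds of p / units each
    """
    par_time = p * math.ceil(units / n) / units
    overhead = setup + per_worker * n
    return 1.0 / ((1.0 - p) + overhead + par_time)


if __name__ == "__main__":
    p = 0.95
    for n in (1, 2, 4, 8, 16, 64, 256):
        print(f"N={n:3d}  Amdahl {amdahl(p, n):6.2f}  "
              f"overhead-strict {overhead_strict(p, n, 0.01, 0.002, 16):6.2f}")
```

With these assumed parameters the classic curve keeps rising toward 1/(1 - p) = 20, while the overhead-strict one peaks at N = 16 (the assumed number of atomic units) and then falls, because extra workers add overhead but can no longer split the work any further.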

Both issues will dominate your teaching efforts far more than the VM abstraction itself, and it would indeed be great to discuss in more detail the impacts of all scheduling-"blocking" resources, not just the CPU core(s) and hardware threads (onto which the O/S schedules work), whether physical or abstracted by the VM hypervisor.

As members of the Cray Chapel team have noted many times, real-hardware NUMA issues have a great impact on the final add-on overheads that a high-level syntax formulation actually injects into processing on the real platform, so the landscape is even wilder.

Virtual Machines:

Better inspect the VM-NUMA topology generated by the VM hypervisor ( hwloc / lstopo ) to decode the vCPU and cache architecture that your VM sandboxes will present to any hardware-directed low-level { C | assembly } code. One may imagine many "fooling" effects, for example if the VM claims to have 8 independent vCPU sockets, each with 4 independent vCPU cores, each of which has a fully separate and autonomous hierarchy of non-shared vCPU caches, with no level shared at all (in spite of the fact that the host's physical CPU(s) principally operate shared L3 caches).

All this misdirects the decisions of any hardware-focused resource optimiser (and performance will not go up if the virtualisation misrepresents the physical properties of the host).
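Besides hwloc's lstopo, a Linux guest exposes the topology it was given through sysfs. The sketch below assumes a Linux guest with the mainline sysfs layout (`/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` and `/sys/devices/system/cpu/cpuN/cache/index*/shared_cpu_list`); some hypervisors expose little or no cache information there. It prints, for vCPU 0, its SMT siblings and how many vCPUs share each cache level. An L3 that the guest reports as private to one vCPU, while the host's physical L3 is shared, is exactly the kind of "fooling" effect described above.

```python
import glob
import os

ROOT = "/sys/devices/system/cpu"  # assumed mainline Linux sysfs location


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def describe_cpu(cpu: int, root: str = ROOT) -> None:
    """Print SMT siblings and per-level cache sharing for one (v)CPU, as the guest sees it."""
    base = f"{root}/cpu{cpu}"
    siblings = parse_cpulist(read(f"{base}/topology/thread_siblings_list"))
    print(f"cpu{cpu}: SMT siblings {sorted(siblings)}")
    for index in sorted(glob.glob(f"{base}/cache/index*")):
        level = read(f"{index}/level")
        kind = read(f"{index}/type")
        shared = parse_cpulist(read(f"{index}/shared_cpu_list"))
        print(f"  L{level} {kind:<11} shared by {len(shared)} CPU(s): {sorted(shared)}")


if __name__ == "__main__":
    cpu0 = f"{ROOT}/cpu0"
    if os.path.isdir(f"{cpu0}/cache") and os.path.exists(f"{cpu0}/topology/thread_siblings_list"):
        try:
            describe_cpu(0)
        except OSError as err:
            print(f"could not read CPU topology: {err}")
    else:
        print("Linux sysfs CPU topology not available here")
```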

( One may also use a live Chapel platform at https://tio.run for tweaking and prototyping. )

You have been warned and suspended for editing meta content into your posts. The contents of your answers should address the question, they should not address the poster personally or contain random anecdotes or tangentially related quotes. If I can remove content from your answer without making it a less useful answer, that content should not have been included in the first place. — user229044, Dec 13 '17 at 15:28
