So you're telling me that the guest OS uses a significantly different amount of memory based on what hypervisor it's running under, given an identical workload? I don't think I buy that...
One obvious problem I see is that Internet Explorer is clearly a heavily used application on your terminal server, yet you're running a mix of 32-bit and 64-bit instances of it. Copy-on-write and the other shared-memory techniques that normally benefit a terminal server when multiple sessions launch the same application only work when the sessions load the same image, so pages cannot be shared between the 32-bit and 64-bit versions. If you standardized all your users on either 32-bit or 64-bit Internet Explorer, your overall memory usage would be lower.
Running an Application
After user logon, the desktop (or application
if in single-application mode) is displayed for the user. When the
user selects a 32-bit application to run, the mouse commands are
passed to the Terminal Server, which launches the selected application
into a new virtual memory space (2-GB application, 2-GB kernel). All
processes on the Terminal Server will share code in kernel and user
modes wherever possible. To achieve the sharing of code between
processes, the Windows NT Virtual Memory (VM) manager uses
copy-on-write page protection. When multiple processes want to read
and write the same memory contents, the VM manager will assign
copy-on-write page protection to the memory region. The processes
(Sessions) will use the same memory contents until a write operation
is performed, at which time the VM manager will copy the physical page
frame to another location, update the process's virtual address to
point to the new page location and now mark the page as read/write.
Copy-on-write is extremely useful and efficient for applications
running on a Terminal Server.
When a Win32-based application such as Microsoft Word is loaded into
physical memory by one process (Session) it is marked as
copy-on-write. When new processes (Sessions) also invoke Word, the
image loader will just point the new processes (Sessions) to the
existing copy because the application is already loaded in memory.
When buffers and user-specific data are required (for example, saving
to a file), the necessary pages will be copied into a new physical
memory location and marked as read/write for the individual process
(Session). The VM manager will protect this memory space from other
processes. Most of an application, however, is shareable code and will
only have a single instance of code in physical memory no matter how
many times it is run.
> It is preferable (although not necessary) to run 32-bit applications
in a Terminal Server environment. The 32-bit applications (Win32) will
allow sharing of code and run more efficiently in multi-user sessions.
Windows NT allows 16-bit applications (Win16) to run in a Win32
environment by creating a virtual MS-DOS-based computer (VDM) for each
Win16 application to execute. All 16-bit output is translated into
Win32 calls, which perform the necessary actions. Because Win16 apps
are executing within their own VDM, code cannot be shared between
applications in multiple sessions. Translation between Win16 and Win32
calls also consumes system resources. Running Win16 applications in a
Terminal Server environment can potentially consume twice the
resources that a comparable Win32-based application will.
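
To make the point about mixing 32-bit and 64-bit Internet Explorer concrete, here is a rough toy model of the copy-on-write sharing described in the excerpt. It is only a sketch under stated assumptions, not a measurement of a real server: the image names (`iexplore-x86`, `iexplore-x64`), the 100-page image size, and the 5 privately written pages per session are all made-up numbers, and the two images are assumed to be the same size, which they are not in practice.

```python
class PhysicalMemory:
    """Toy stand-in for physical RAM: a list of page frames plus an image cache."""

    def __init__(self):
        self.frames = []   # every allocated physical page frame
        self.images = {}   # image name -> frames already loaded for that image

    def alloc(self, data):
        """Allocate one new physical frame holding `data`."""
        frame = {"data": data}
        self.frames.append(frame)
        return frame

    def load_image(self, name, page_count):
        """Return the frames backing an image, loading them only on first use."""
        if name not in self.images:
            self.images[name] = [
                self.alloc(f"{name}:page{i}") for i in range(page_count)
            ]
        return self.images[name]

    def frames_in_use(self):
        return len(self.frames)


class Session:
    """One user session running one application image."""

    def __init__(self, memory, image, page_count):
        self.memory = memory
        # Every page initially points at the shared frames (copy-on-write).
        self.pages = list(memory.load_image(image, page_count))
        self.private = set()  # page numbers this session has already copied

    def write(self, page_no, data):
        if page_no not in self.private:
            # First write to a shared page: give this session its own frame.
            self.pages[page_no] = self.memory.alloc(data)
            self.private.add(page_no)
        else:
            # Page is already private, so it can be modified in place.
            self.pages[page_no]["data"] = data


def simulate(images, page_count=100, dirty_pages=5):
    """Start one session per entry in `images`; return total frames used."""
    memory = PhysicalMemory()
    for image in images:
        session = Session(memory, image, page_count)
        for page_no in range(dirty_pages):
            session.write(page_no, "private data")
    return memory.frames_in_use()


uniform = simulate(["iexplore-x86"] * 10)
mixed = simulate(["iexplore-x86"] * 5 + ["iexplore-x64"] * 5)
print(uniform, mixed)  # 150 250
```

With ten sessions on one image, the model needs 100 shared frames plus 50 private ones. Splitting the same ten sessions across two images loads a second full copy of the shareable pages, which is the overhead you would avoid by standardizing on one bitness.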