This shows up in Process Explorer when I run Python scripts related to the Collatz conjecture. To keep the complexity of those scripts out of the picture, I have written this minimal script that demonstrates the problem:

import time
import psutil
import random

p = psutil.Process()
p.cpu_affinity([4])  # Lock to a CPU to eliminate this as a cause of variation.
print('cpu affinity', p.cpu_affinity())
random.seed(123456789)
bits = 7091330
n = random.getrandbits(bits)  # Generate a very large number.
print(bits, 'bits')
t0 = time.perf_counter()
for i in range(100000):
    m = 3 * n  # Do something with the very large number.
t1 = time.perf_counter()
print('ran in %f seconds' % (t1 - t0))
print(p.cpu_times())
mi = p.memory_info()  # Get page fault statistics.
for field in mi._fields:
    print(field, getattr(mi, field))

Some runs incur a large number of page faults and take longer, while others do not. Here is the output of two runs:

cpu affinity [4]
7091330 bits
ran in 43.947864 seconds
pcputimes(user=41.5, system=0.21875, children_user=0.0, children_system=0.0)
rss 27992064
vms 20475904
num_page_faults 7421
peak_wset 27996160
wset 27992064
peak_paged_pool 200808
paged_pool 200632
peak_nonpaged_pool 19296
nonpaged_pool 18944
pagefile 20475904
peak_pagefile 20475904
private 20475904

cpu affinity [4]
7091330 bits
ran in 48.507077 seconds
pcputimes(user=42.03125, system=4.625, children_user=0.0, children_system=0.0)
rss 27144192
vms 20529152
num_page_faults 3034148
peak_wset 28934144
wset 27144192
peak_paged_pool 200808
paged_pool 200632
peak_nonpaged_pool 19296
nonpaged_pool 18944
pagefile 20529152
peak_pagefile 21413888
private 20529152

I am running this with Python 3.4.2 under Windows 10 on an i7-4790 CPU with 16 GB of RAM. Changing the process priority has no effect. Changing the constant 3 in the for loop to another small number has no effect. Doubling the size of n approximately doubles the number of page faults. In each iteration of the for loop, the page fault count increases either by zero or by about 1/30800 of the number of bits in n.
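One possible reading of the 1/30800 figure, offered as an assumption rather than something the measurements confirm: CPython stores an int as an array of fixed-size digits, so the result of 3 * n occupies a predictable number of bytes, and 30800 bits is close to the number of value bits that fit in one memory page. The sketch below compares the two quantities. The 4096-byte page size is an assumption (typical for x86-64 Windows), the helper name pages_for_int is hypothetical, and the digit layout is read from sys.int_info on whichever interpreter runs it.

```python
import sys

PAGE_SIZE = 4096  # Assumed page size in bytes; not taken from the measurements.


def pages_for_int(bits,
                  bits_per_digit=sys.int_info.bits_per_digit,
                  bytes_per_digit=sys.int_info.sizeof_digit,
                  page_size=PAGE_SIZE):
    """Estimate how many pages the digit array of a `bits`-bit int occupies."""
    digits = -(-bits // bits_per_digit)  # Ceiling division.
    return digits * bytes_per_digit / page_size


bits = 7091330
observed = bits / 30800          # Per-iteration increment reported above.
estimated = pages_for_int(bits)  # Pages needed to hold one result of 3 * n.

print('observed faults per faulting iteration: %.1f' % observed)
print('estimated pages per result: %.1f' % estimated)
```

With 30-bit digits stored in 4 bytes each, the estimate is about 231 pages against an observed increment of about 230. That would be consistent with a faulting iteration touching every page of a freshly allocated result once. It does not explain why only some iterations fault.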
I have discovered that page faults in the loop can be eliminated if the "Do something with the very large number" step is confined to in-place operations such as *=, //=, etc. It may also be necessary to use the gmpy2 xmpz multiple-precision integer type for the very large number.
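A minimal sketch of the in-place variant, assuming gmpy2 is installed and that its mutable xmpz type updates its value in place for augmented assignments. The helper name multiply_and_divide_in_place is hypothetical. If gmpy2 is missing, the sketch falls back to the built-in int, which is immutable and still creates a new object on every operation, so the fallback only keeps the code runnable and does not reproduce the effect.

```python
import random

try:
    from gmpy2 import xmpz  # Mutable multiple-precision integer (third-party).
    HAVE_XMPZ = True
except ImportError:
    xmpz = int  # Fallback: immutable, so every operation allocates a new int.
    HAVE_XMPZ = False


def multiply_and_divide_in_place(n, iterations):
    """Repeatedly scale x up and back down using only augmented assignments."""
    x = xmpz(n)
    for _ in range(iterations):
        x *= 3   # With xmpz this is intended to reuse the same object.
        x //= 3  # Exact division, so the value returns to n.
    return int(x)


random.seed(123456789)
n = random.getrandbits(7091330)  # Same size as in the script above.
result = multiply_and_divide_in_place(n, 100)
print('using xmpz:', HAVE_XMPZ)
print('value preserved:', result == n)
```

Multiplying and then dividing by the same constant keeps the number at a roughly constant size, so the loop can run for many iterations without the value growing.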

user258279
  • Well, these "longs" that you've specified are roughly a megabyte each. They aren't small at all. Swapping and paging depend on other things going on in the system. Plus, Windows runs all kinds of stuff on its own without asking, so my first question is: how quiet is the system? The thing that strikes me about your numbers is that the execution time really isn't that much longer despite the large number of page faults. So clearly, the OS was picking your data back up out of buffers still in memory that had not actually been released yet. You didn't go to the disk to get the page. – Frank Merrow Feb 27 '20 at 06:26
  • Swapping and paging are particularly nasty beasts when you dig under the hood. Understanding how they work can be a challenge. My gut feeling is that something else is going on in the system. Particularly if you are in a corporate environment with all the IT "protections" in place, it is hard to know what is really happening in the system. Perhaps you could get some insight by firing up Task Manager and looking at "Performance", specifically the "Memory" tab. I am guessing something is running on your system, probably without you knowing. – Frank Merrow Feb 27 '20 at 06:30
  • @Frank Merrow. It's on a home PC with a lot of other stuff running. But I ran it on a different Windows 7 PC with most apps shut down and got similar results. Although the large numbers are about 1 MB each, this is small compared to the 27 MB Resident Set Size of the process and the 16 GB of physical memory on the PC. – user258279 Feb 27 '20 at 23:39
  • It may be relevant that the RSS is consistently 1 or 2 MB greater when there are not many page faults. – user258279 Feb 27 '20 at 23:56
