
When I run Python code that loops over an array of size 10^8, the PC becomes unresponsive and takes around 10 minutes to execute the code. After the script finishes, the machine stays laggy for a while.

So is this caused by a weak processor or by a RAM problem, and is upgrading the RAM the only way to fix it?

The script could be as simple as:

arr = [x for x in range(pow(10, 8))]
for i in range(len(arr)):
    arr[i] += 1

My specs: 8 GB RAM, Ubuntu, Python 3.6, Intel Core i7-3632QM @ 2.20 GHz.

More details on what exactly happens: when I run the script, I can see the memory usage of the Python process keep climbing. Then the PC becomes unresponsive. It doesn't respond to any action I take: if anything was playing in the background it stops, and if I move the mouse the cursor won't move, until the script is actually done. Then it becomes responsive again, but very laggy for a while. If I try to switch to another minimized application, it takes quite some time, as if the PC had just booted up. It takes a bit of time for everything to get back to normal.

omargamal8

2 Answers


What's happening here is very likely paging/swapping [1]. Thanks to virtual address space, each process on your system can address a huge amount of memory - way more than you have physically in your computer. If all processes together use more memory than is physically available, the operating system is in trouble. One approach it takes is paging: moving data of some processes from memory to disk.

Since your disk, even if it's an SSD, is several orders of magnitude slower than RAM, the system becomes unresponsive. Say, for example, the OS decides to move the block of memory that contains your mouse cursor position onto the disk. Every time it updates the cursor, this introduces a huge delay. Even after the process that consumed all the memory finishes, it takes some time to load all the data back from disk into RAM.
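
If you want to confirm this, watch swap usage while the script runs. Here is a minimal monitoring sketch using the third-party psutil package (an assumption - any system monitor, or simply `vmstat 1`, works just as well):

import time

import psutil  # third-party: pip install psutil

# Print RAM and swap usage once per second; run this in a second
# terminal while the big loop executes, and stop it with Ctrl-C.
# Swap usage steadily climbing is the telltale sign of paging.
while True:
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print("RAM used: %5.1f%%   swap used: %5.1f%%" % (mem.percent, swap.percent))
    time.sleep(1)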

To illustrate: on my system with a comparable processor (i5-3320M), your example code finishes in a mere 20 seconds with no impact on overall system responsiveness - that is because I have 16 GiB RAM. So clearly it is not about the "CPU [being saturated] with billions of operations". Given that you have a quad-core processor and that this code uses only one thread, you have plenty of spare compute cycles. Even if you were to use up all the CPU cycles, the system would usually remain quite responsive, because the OS scheduler does a good job of balancing CPU cycles between your compute task and the process moving your mouse cursor.

Python is particularly prone to this issue because it uses far more memory than necessary. Python 3.6.1 on my system uses ~4 GiB for the data in arr - even though 10^8 64-bit integers would need only 800 MB. That is simply because everything in Python is an object. You can be more memory-efficient if you don't permanently store anything in memory in the first place, or if you use numpy. But discussing that would require a more problem-oriented code example.
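
To at least illustrate the footprint difference, here is a minimal sketch (assuming numpy is installed) of what the same data costs as a typed array:

import numpy as np

# 10^8 int64 values stored contiguously: 10**8 * 8 bytes = 800 MB,
# with no per-element Python object overhead.
arr = np.arange(10**8, dtype=np.int64)
print(arr.nbytes / 10**6, "MB")  # 800.0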

[1]: There are differences between paging and swapping, but nowadays the terms are used interchangeably.

Zulan
  • Yes, this makes a lot of sense. I understand that this code is far from being efficient in any sense. It was just addressing the problem. – omargamal8 Jun 06 '17 at 20:34

The short answer is: your application becomes unresponsive because you've totally saturated your CPU with billions of operations being performed on a large dataset. While your program is stuck in those loops, it can't do anything else and appears to lock up.

First, you're creating 100 million items using range(). In Python 2 this materializes the entire list up front (see the note about Python 3 below), and that alone isn't going to be fast because it's a lot of items.

Next, you're looping over those 100 million items with a list comprehension and building an entirely new list. The comprehension seems pointless, since you're just passing the value from range straight through, but perhaps you've simplified it for the example.

Finally, you're using a for loop to once again loop over all those items in the newly generated list from the comprehension.

That's 3 loops and (in Python 2) 2 full lists: one created by range(), another by the list comprehension, and then the for loop walks all those items yet again. You're doing a lot of work creating and traversing huge lists.

The process of appending an item to a list takes several operations, and at 100 million items that's 10^8 times the number of operations per item. For example, if each append took 10 operations, you'd be looking at about a billion operations in total. These aren't real benchmarks, but they illustrate how lots of little operations like this can quickly add up to a lot of CPU time. Not only that, but copying hundreds of megabytes of data around in memory several times takes additional time as well. All of this leads to a lack of responsiveness because your CPU is totally saturated.
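
Rather than guessing at operation counts, you can measure. A quick sketch using the standard timeit module, timing one pass over 10^6 items and extrapolating (numbers will vary by machine):

import timeit

n = 10**6  # scaled down from 10**8 so the measurement itself stays quick
setup = "arr = [x for x in range(%d)]" % n
stmt = """
for i in range(len(arr)):
    arr[i] += 1
"""
# Take the best of 5 runs to reduce noise from other processes.
per_pass = min(timeit.repeat(stmt, setup=setup, number=1, repeat=5))
print("one pass over 10^6 items: %.3f s" % per_pass)
print("extrapolated to 10^8 items: ~%.0f s" % (per_pass * 100))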

If you absolutely need to pre-build such a huge list, make sure you only loop over it once and do all the work you need on each item at that time, as sketched below. That cuts down on the number of times you recreate the list and saves memory, since fewer lists need to be held in memory at the same time.
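
For example, building the list and doing the per-item work in the same pass (a sketch - `process` is a hypothetical stand-in for whatever you actually do with each value):

def process(value):
    # hypothetical placeholder for the real per-item work
    return value + 1

# One pass: each item is created and processed immediately, so the
# list is never traversed a second time.
arr = [process(x) for x in range(10**8)]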

If all you really need is an incrementing number, you can use a generator to count up. Generators are far more efficient because they are "lazy": they yield a single value at a time rather than returning a whole list at once. In Python 2, xrange() is the lazy counterpart to range(); it works exactly like range() except that it yields one value at a time instead of building and returning a whole list.

for i in xrange(pow(10,8)):
    # do some work with the current value of i

In Python 3, there is no xrange(), because range() itself is lazy by default (technically it returns a range object rather than a generator, but it likewise produces values on demand instead of materializing a list).
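
So in Python 3 the original loop can consume range() directly, one value at a time, in constant memory (a sketch - the running total just stands in for real work):

total = 0
for i in range(pow(10, 8)):
    total += i  # only one value of i exists at any moment
print(total)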

Here's an explanation of the difference between range() and xrange(): http://pythoncentral.io/how-to-use-pythons-xrange-and-range/

Lastly, if you really do need to work with huge arrays of numbers like this, the NumPy library stores them as compact, typed arrays rather than lists of Python objects, which lets it hold millions of items in a fraction of the memory and process them far more efficiently.
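
As a sketch of that (assuming numpy is installed), the question's entire script collapses into a single vectorized statement:

import numpy as np

# The values live in one compact typed array (~800 MB), and the
# increment runs as one vectorized operation in C, with no
# Python-level loop.
arr = np.arange(10**8, dtype=np.int64)
arr += 1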

Soviut
  • `First, you're creating 100 million items using range()` No, at least in Python 3, this is just not the case – cat Jun 04 '17 at 23:44
  • @cat mind explaining what you mean? "that's not the case" doesn't clarify anything. – Soviut Jun 04 '17 at 23:51
  • So does `range()` in Python 3 return a generator? – Soviut Jun 04 '17 at 23:52
  • @Soviut - yes! That's why I preface pretty much all my code expected to run on Python 2 and 3 with `try: range = xrange except NameError: pass` and use `range` everywhere. – zwer Jun 04 '17 at 23:54
  • In python 2 it does create a list. I'll update my answer. – Soviut Jun 04 '17 at 23:54
  • Also, while you might be simplifying it, processing is never measured in clock speed - there are MHz-clocked FPGA arrays that will, for certain operations, wipe the floor with general CPUs running at GHz speeds. – zwer Jun 04 '17 at 23:58
  • @zwer It's just an example to illustrate how seemingly simple lines of code can quickly saturate a CPU when acting on large data sets. – Soviut Jun 05 '17 at 00:00
  • Python 3 range() doesn't return a generator, but an instance of the custom `range` type. It's lazy, though, in that it doesn't materialize the whole list, so people often get confused. – DSM Jun 05 '17 at 00:02
  • @Soviut First, thank you for the answer. I think all the answers I've seen here got distracted by the sample code I wrote, which wasn't the purpose of the code. To clarify: this isn't the actual code that was causing the problem (although if I ran that code on my PC it would still freeze). My concern isn't the time or the optimizations that could be done; it's the PC becoming totally unresponsive/freezing when I loop over arrays of large sizes (10^8+). Is that a RAM problem or a CPU one? – omargamal8 Jun 05 '17 at 00:30
  • @omargamal8 I was explaining that it's unresponsive because you've saturated your CPU with billions of operations. Responsiveness is always tied to the CPU since anything to do with the RAM has to be carried out by the CPU at some point anyways. – Soviut Jun 05 '17 at 00:37
  • Any contemporary system can stay responsive under full CPU load due to preemptive multitasking. Hence this answer is not productive. – Zulan Jun 05 '17 at 06:49
  • @Zulan Yeah I was speaking more from the application itself becoming unresponsive. We don't know the OP's original specs. – Soviut Jun 05 '17 at 16:53