
I have a list of 300 values, each with an associated average.

I have a for loop that picks ten of these values at random and writes them to Excel if certain conditions on their averages are met.

The code runs fine if I loop 10 million times or fewer, but that is orders of magnitude too few. Even if I just double the loop counter to 20 million, my computer becomes unusable while it is running.

I want to iterate the loop 100 million or even 1 billion times. I'm happy for it to run slowly in the background; I don't care if it takes 24 hours to get the results. I just want to be able to use my computer while it's working. Currently, if the loop goes past 10 million iterations, the memory and disk usage of my laptop go to 99%.

Using PyScripter and Python 3.3.

Comp specs: Intel Core i7-4700HQ (2.40 GHz), 8 GB memory, 1 TB HDD, NVIDIA GeForce GTX 850M 2 GB GDDR3

Code snippet:

for i in range(0, cycles):
    # random team generation
    genRandLineups(Red)
    genRandLineups(Blue)
    genRandLineups(Purple)
    genRandLineups(Green)

    if (sum(teamAve[i]) <= 600
            and (sum(teamValues[i]) > currentHighScore
                 or sum(teamValues[i]) > 1024)):
        teamValuesF.append(teamValues[i])

        sheetw.write(q, 0, str(teamValues[i]))
        ts = time.time()
        workbookw.save("Data_Log.xls")
        st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        sheetw.write(q, 3, st)
        q = q + 1

        if sum(teamValues[i]) > currentHighScore:
            currentHighScore = sum(teamValues[i])
ACD
  • Buy a laptop with 2 or more cores? And what OS are you using? – deets Nov 07 '14 at 19:55
  • What does your code look like? What platform are you on? – Gareth Latty Nov 07 '14 at 19:56
  • Are you computing all of the 10-20 million items at once, up front? Or are you saving the results of all of the 10-20 million averages all the way to the end? If you don't need to do that, don't! Use generators or overwrite variables so that your memory usage remains steady regardless of the length of your loop. Without seeing any code, that's about as close to an answer as you're likely to get. – Blckknght Nov 07 '14 at 20:02
  • First, I suspect your _real_ problem is that you're just retaining too much memory, causing your computer to run into VM swap, which makes your entire computer slow to a crawl. You should really look into fixing that instead of just trying to make it happen periodically throughout the day instead of constantly. For example, it sounds like you're keeping a list of 10N values around forever. Do you really need to do that? If not, start freeing them. If so, look into some efficient disk-based storage instead of memory, or something more compact like a NumPy array instead of a list. – abarnert Nov 07 '14 at 20:08
    But meanwhile, if you want to reduce the priority of a program, the easiest way to do that may be externally. For example, on most platforms besides Windows, you can just launch your script with `nice 20 python myscript.py` and the OS will give everything else more CPU time than your program. – abarnert Nov 07 '14 at 20:10
  • @Blckknght I only need to save values that meet a certain criteria, which is about 30 values after 10 million loops. I've never optimized code before. Any quick tips? – ACD Nov 07 '14 at 20:12
  • @Lattyware Added code to main body – ACD Nov 07 '14 at 20:17
  • How are you reading those values (`teamAve`, `teamValues`, etc.) in? Are you sure that your real problem isn't there, and if `cycles` is huge, it's burning up 99% CPU for a long time before even starting this loop? – abarnert Nov 07 '14 at 20:20
  • Also, are functions like `GenRandLineups` constant-time, or do they depend on the size of `cycles` (or `teamValues` or whatever)? – abarnert Nov 07 '14 at 20:21
  • After the second update, what is `teamValuesF` used for? It looks like a huge list that'll be some percentage of your original `teamValues` list that you never use anywhere. So, why store it? – abarnert Nov 07 '14 at 20:24
  • @abarnert No, GenRandLineups just picks random items from a list. Does not depend on the size of cycles. I don't understand your first question – ACD Nov 07 '14 at 20:25
  • @abarnert teamValuesF is a list containing only the entries that meet my criteria in the for loop – ACD Nov 07 '14 at 20:26
  • @ACD: OK, the first question is the most important, so let's try again. How do you generate `teamValues`? Have you done any profiling to see if your code is actually taking forever inside this loop, rather than before you get to the loop? (Even the most trivial profiling: launch the script with a huge number, hit ^C a minute later, and look at the traceback.) – abarnert Nov 07 '14 at 20:39
  • @ACD: For the last one, why are you building a list that contains some of the entries if you don't need it for anything? – abarnert Nov 07 '14 at 20:39
  • @abarnert How do I profile in pyscripter? But they are made outside of the loop. Just reads an excel column and puts it into a list. – ACD Nov 07 '14 at 21:07
  • @ACD: PyScripter is a smart IDE front-end (ref. its internal configuration / selection of internal/external Python code-execution engines), not the tool to use in this case; you may **clock** / **profile** the processing time spent and the memory requirements in each relevant sub-section of your code. Just add a few SLOCs to the code and you may get timing precise down to about **30 nsec** – user3666197 Nov 07 '14 at 21:20
  • @ACD: Again, you can even do the most trivial thing possible: interrupt the program after a minute and see how far it's gotten by looking at the traceback. I don't know PyScripter, so I don't know if you interrupt it with ^C or a menu item or whatever, but there must be a way. Or just run the script on the command line instead. – abarnert Nov 07 '14 at 21:30
  • @ACD: Meanwhile, I realize they're made outside the loop; the question is what the code looks like, how long it's taking, and how big they are, none of which you've answered. The fact that `i` ranges up to `cycles` and you're using `teamValues[i]` implies that if you iterate 100 million times, you're going to have a `teamValues` with 100 million elements in it. That's gigantic. That's 800MB just for the list, and if all the elements are distinct it's another 8GB minimum for those elements, which is already all of your RAM, just for that one list. – abarnert Nov 07 '14 at 21:33
  • @ACD: And finally, if these millions of values are being read from an Excel file just so you can loop over them, that's exactly the kind of thing you should fix by using an iterator instead of a list: Just read each row one by one, use it, and move on to the next one (or maybe batches of, say, 1000 at a time), instead of reading 100M rows and then looping over them. – abarnert Nov 07 '14 at 21:34

2 Answers


First, I suspect your real problem is that you're just retaining too much memory, causing your computer to run into VM swap, which makes your entire computer slow to a crawl. You should really look into fixing that instead of just trying to make it happen periodically throughout the day instead of constantly.

In particular, it sounds like you're keeping a list of 10N values around forever. Do you really need to do that?

If not, start freeing them. (Or don't store them in the first place. One common problem a lot of people have is that they need 1 billion values, but only one at a time, once through a loop, and they're storing them in a list when they could be using an iterator. This is basically the generic version of the familiar readlines() problem.)
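As a rough illustration of that list-versus-iterator difference (random.random and the process function here are placeholders, not the poster's actual code):

import random

def process(v):
    pass  # placeholder for whatever per-value work is needed

# List version: materializes all 10**8 values in memory before the loop even starts.
for v in [random.random() for _ in range(10**8)]:
    process(v)

# Generator version: yields one value at a time, so memory use stays flat.
for v in (random.random() for _ in range(10**8)):
    process(v)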

If so, look into some efficient disk-based storage instead of memory, or something more compact like a NumPy array instead of a list.
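For a rough sense of the size difference (assuming float values; figures are approximate):

import numpy as np

# 100 million floats as a plain Python list: ~0.8 GB for the pointer array,
# plus the individual float objects on top of that.
# The same values as a NumPy float64 array: ~0.8 GB total, stored contiguously.
values = np.zeros(10**8, dtype=np.float64)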


But meanwhile, if you want to reduce the priority of a program, the easiest way to do that may be externally. For example, on most platforms besides Windows, you can just launch your script with nice 20 python myscript.py and the OS will give everything else more CPU time than your program.
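If you would rather do it from inside the script, here is a rough sketch (assumes a Unix-like OS; os.nice does not exist on Windows):

import os

# Ask the OS to deprioritize this process so everything else gets CPU time first.
# os.nice() is Unix-only; on Windows you would need a third-party module such as
# psutil to change the process priority class instead.
if hasattr(os, "nice"):
    os.nice(19)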


But to answer your direct question: if you want to slow down your script from the inside, that's pretty easy to do. Just call sleep every so often. This asks the OS to suspend your program and not give it any resources until the specified number of seconds has expired. That may be only approximate, rather than absolutely nothing for exactly N seconds, but it's close enough (and as good as you can do).

For example:

import time  # needed for time.sleep

for i in range(reps):
    do_expensive_work()
    if i % 100 == 99:
        time.sleep(10)

If do_expensive_work takes 18ms, you'll burn CPU for 1.8 seconds then sleep for 10 and repeat. I doubt that's exactly the behavior you want (or that it takes 18ms), but you can tweak the numbers. Or, if the timing is variable, and you want the sleep percentage to be consistent, you can measure times and sleep every N seconds since the last sleep, instead of every N reps.
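A sketch of that time-based variant (reps and do_expensive_work are the same placeholders as above; the work/sleep ratio is just an illustrative number):

import time

work_since_sleep = 0.0
for i in range(reps):
    start = time.time()
    do_expensive_work()
    work_since_sleep += time.time() - start
    if work_since_sleep >= 2.0:
        # Sleep roughly three times as long as we worked, so the script
        # averages about 25% of one core while it runs.
        time.sleep(3 * work_since_sleep)
        work_since_sleep = 0.0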

abarnert
  • Okay, I think I want to do your first suggestion. I generate 10mil items but only need to keep them if certain conditions are met. Generally at the end of 10 mil cycles I only have 20 values I keep. How do I tell python not to keep the ones that don't meet my criteria? – ACD Nov 07 '14 at 20:20
  • @ACD: Without seeing all of the relevant code, it's hard to answer that. But it may be a matter of breaking your loop up into an inner loop over 10000000 and an outer loop over N/10000000, and just reassign an empty list to the list you were building (in other words, `spam = []` at the top of the outer loop, instead of outside both loops). – abarnert Nov 07 '14 at 20:23
  • So I generate these random combinations, and if they don't meet the criteria I want to throw it away, and try again with a new random list. How do I do that efficiently? – ACD Nov 07 '14 at 20:31
  • @ACD: From what I can tell from the code you posted, the only reason you're not throwing it away is that you're appending it to a list that you never use. If that's true, the answer is pretty simple: Just stop appending to that list. – abarnert Nov 07 '14 at 20:40

Do not slow down. Rather, re-design and step up to HPC.

For high-performance processing on just "a few" items (a list of 300 values), the best approach would consist of:

  1. avoid file access (even if sparse, as noted in the OP) -- cache the true positives in an output string that is written to the file at the end, or once a string-length limit is reached, or marshalled to another, remote, logging machine (see the sketch after this list).

  2. move all the highly iterative processing, said to span 10E+006 -- 10E+009 cycles, onto massively parallel GPU/CUDA kernel processing on the GPU you already have in the laptop, both to free your CPU resources and to gain the benefit of 640 threads delivering about 1.15 TFLOPS of parallel computing horsepower, as opposed to just a few GUI-shared MFLOPs from the CPU cores.
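A minimal sketch of point 1, reusing the names from the question's snippet (the exact condition is illustrative, and the output is simplified to a plain text file): buffer the rare hits in memory and write one file at the end, instead of saving the workbook inside the loop.

# cycles, genRandLineups, teamAve, teamValues etc. are from the question's code.
hits = []                                     # cache of "true positives"
for i in range(cycles):
    genRandLineups(Red)                       # same random team generation
    genRandLineups(Blue)
    genRandLineups(Purple)
    genRandLineups(Green)
    if sum(teamAve[i]) <= 600 and sum(teamValues[i]) > 1024:
        hits.append(str(teamValues[i]))

with open("Data_Log.txt", "w") as f:          # one file write, at the very end
    f.write("\n".join(hits))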

user3666197