
How can I clear memory in this Python loop?

import concurrent.futures as futures
with futures.ThreadPoolExecutor(max_workers=100) as executor:
    fs = [executor.submit(get_data, url) for url in link]
    for i, f in enumerate(futures.as_completed(fs)):
        x = f.result()
        results.append(x)
        del x
        del f

get_data is a simple function that uses requests.
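
For reference, get_data isn't shown in the question; a minimal sketch of what such a function might look like (an assumption, not the asker's actual code) is:

    import requests

    def get_data(url):
        # Hypothetical implementation: fetch the URL and return the response body.
        response = requests.get(url, timeout=10)
        return response.text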

  • Why do you want to delete them? What's the problem you're facing? – dinos66 Jul 30 '15 at 10:34
  • I am trying to send >100k requests; every five thousand requests use about 1 GB of memory. – Kostiantyn Palianychka Jul 30 '15 at 10:36
  • What I've done is split the original grand list into smaller batches and then run the above loop for each of these batches. I think that five thousand requests, especially if a few are pinging the same site/server at the same time, could get you in trouble (i.e. blocked). I faced that when trying to unshorten urls. – dinos66 Jul 30 '15 at 10:45
  • So, I'm guessing you didn't like any of our answers? – dinos66 Jul 31 '15 at 10:41

2 Answers


My solution would be as follows:
import concurrent.futures as futures

# split the original grand list into smaller batches

batchurlList = [grandUrlList[x:x+batchSize] for x in range(0, len(grandUrlList), batchSize)]
for tmpurlList in batchurlList:
    with futures.ThreadPoolExecutor(max_workers=100) as executor:
        myfuture = {executor.submit(myFunction, url): url for url in tmpurlList}
        for future in futures.as_completed(myfuture, timeout=60):
            originalUrl = myfuture[future]
            results.append(future.result())
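
The names grandUrlList, batchSize, myFunction, and results aren't defined in the answer; an illustrative setup (with made-up values) might look like this:

    import requests

    batchSize = 1000                                                    # urls per batch
    grandUrlList = ["http://example.com/%d" % i for i in range(100000)]  # placeholder urls
    results = []

    def myFunction(url):
        # Stand-in for the asker's get_data: fetch the url and return the body.
        return requests.get(url, timeout=10).text

Processing the grand list in batches means only batchSize futures are alive at any time, so the memory held by pending futures stays bounded (the results list itself still grows, of course).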
dinos66

I think I had the same problem recently. The answer is not del but introducing a sleep function. Try:

import time
import concurrent.futures as futures
with futures.ThreadPoolExecutor(max_workers=100) as executor:
    fs = [executor.submit(get_data, url) for url in link]
    for i, f in enumerate(futures.as_completed(fs)):
        x = f.result()
        results.append(x)
        time.sleep(n_seconds)

Or something like this (I used a while loop over a list of urls)
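
That while-loop variant isn't spelled out in the answer; one guess at what it might have looked like (link, get_data, n_seconds, and chunk_size as assumed names) is to pull urls off the list in chunks, fetch each chunk, and sleep between chunks:

    import time

    n_seconds = 1        # pause between chunks; tune to the target server's limits
    chunk_size = 100     # hypothetical chunk size
    urls = list(link)    # working copy of the question's url list
    results = []

    while urls:
        # Take the next chunk off the front of the list.
        chunk, urls = urls[:chunk_size], urls[chunk_size:]
        for url in chunk:
            results.append(get_data(url))  # get_data is the question's requests-based fetcher
        time.sleep(n_seconds)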

Nikki101
  • If you have 100 urls, I guess that's ok. What happens when you have millions? Is there an optimal n_seconds you can use? – dinos66 Jul 30 '15 at 10:35
  • That's true. I may have made some assumptions without enough information. However, introducing a time delay works when the error has to do with the number of requests one can make in a given time period. – Nikki101 Jul 30 '15 at 10:48