
So here is my code. It creates the files, which means it has write permission, but it just doesn't write anything inside them. The function works and it prints to the screen; it just doesn't write anything to the files. Any idea why that is?

Expected behavior: it should create two files, filter the websites into working and non-working, and state the error code for the non-working ones.

Current behavior: it creates two empty files.

import requests
from concurrent.futures import ThreadPoolExecutor
websites = []
f = open("websites.txt", "r")
for i in f:
    if not i.startswith("http"):
        i = "http://"+i
    websites.append(i.strip())
    print("appended" + i)
f.close()
with open("working.txt", "w") as fa, open("not_working.txt", "w") as fe:
    def checker(website):
        response = requests.get(website)
        available = response.status_code == 200
        print(response.status_code)
        if available:
            fa.write(website + "\n")
        else:
            fe.write(website + " error " + response.status_code + "\n")
    with ThreadPoolExecutor() as executor:
        executor.map(checker, websites)
bido whab
  • Maybe this can help you https://stackoverflow.com/questions/11983938/python-appending-to-same-file-from-multiple-threads – Gonzalo Odiard May 01 '22 at 00:37
  • Writing in threads is not a good idea because it can create a conflict over which data should be written first. You should rather send the data to the main thread and write it all (one by one) there. You can use a Queue to send data to the main thread (see the sketch after these comments). Or maybe it can run as `results = executor.map(...).get()` and later write all values from `results` - at least you can do something like this with [multiprocessing.ThreadPool](https://docs.python.org/3/library/multiprocessing.html) – furas May 01 '22 at 01:02
  • When I run your code I get only `<generator object Executor.map.<locals>.result_iterator at 0x7f8c59e82200>` - so it doesn't execute the functions, and it needs `for result in executor.map(checker, websites): ... do something with result ...` or at least `list( executor.map(checker, websites) )`, and then it writes to the file – furas May 01 '22 at 01:11
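A minimal sketch of the Queue approach furas describes above (the URL list and file names are illustrative, not from the question): the workers only put results on a thread-safe queue.Queue, and the main thread drains it and does all the writing, so no two threads ever touch the files at the same time.

import queue
import requests
from concurrent.futures import ThreadPoolExecutor

results = queue.Queue()  # thread-safe: workers put, main thread gets

def checker(website):
    # worker: fetch and report, but never touch the files
    response = requests.get(website)
    results.put((website, response.status_code))

websites = ['https://stackoverflow.com', 'https://example.com']

with ThreadPoolExecutor() as executor:
    list(executor.map(checker, websites))  # wait for all workers to finish

# main thread is the single writer
with open("working.txt", "w") as fa, open("not_working.txt", "w") as fe:
    while not results.empty():
        website, status_code = results.get()
        if status_code == 200:
            fa.write(f'{website}\n')
        else:
            fe.write(f'{website} error {status_code}\n')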

1 Answer


The code

executor.map(checker, websites)

creates a generator, but it doesn't execute the threads.

It needs at least list()

list( executor.map(checker, websites) )

to execute the generator.
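A related detail worth knowing (a standalone sketch, not the question's code): any exception raised inside a worker is stored in its future and only re-raised when the generator returned by map() is consumed. This is why the question's TypeError - from concatenating the integer response.status_code to a string in the not_working branch - never shows up. list() forces that consumption:

from concurrent.futures import ThreadPoolExecutor

def boom(n):
    # same bug shape as the question: str + int raises TypeError
    return "status " + n

with ThreadPoolExecutor() as executor:
    results = executor.map(boom, [200, 404])  # no error visible here
    # list(results)  # uncommenting this line re-raises the TypeError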


Full code:

import requests
from concurrent.futures import ThreadPoolExecutor

# --- functions ---

def read_urls(filename):
    websites = []

    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:  # skip empty lines
                if not line.startswith("http"):
                    line = "http://" + line
                websites.append(line)
                print("appended:", line)

    return websites
    
def checker(website):
    response = requests.get(website)

    print('[checker]:', response.status_code, website)

    if response.status_code == 200:
        fa.write(f'{website}\n')
    else:
        fe.write(f'{website} error {response.status_code}\n')

# --- main --- 

#websites = read_urls("websites.txt")
websites = ['https://stackoverflow.com', 'https://fake.com']

with open("working.txt", "w") as fa, open("not_working.txt", "w") as fe:
    with ThreadPoolExecutor() as executor:

        list( executor.map(checker, websites) )
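One caveat with this version (my note, not part of the original answer): requests.get() raises an exception such as requests.exceptions.ConnectionError for unreachable hosts instead of returning a status code, and by default it has no timeout, so a single dead host can stall a worker. A hedged variant of checker that accounts for both:

def checker(website):
    try:
        # the 10-second timeout is illustrative; requests has no default timeout
        response = requests.get(website, timeout=10)
    except requests.exceptions.RequestException as e:
        fe.write(f'{website} error {e}\n')
        return

    print('[checker]:', response.status_code, website)

    if response.status_code == 200:
        fa.write(f'{website}\n')
    else:
        fe.write(f'{website} error {response.status_code}\n')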

But it may be safer to return the result from the function and write in the main thread. This also writes the results in the same order as the data in the original file; writing in the threads may give a different order, because threads may finish in a different order.

import requests
from concurrent.futures import ThreadPoolExecutor

# --- functions ---

def read_urls(filename):
    websites = []

    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:  # skip empty lines
                if not line.startswith("http"):
                    line = "http://" + line
                websites.append(line)
                print("appended:", line)

    return websites

def checker(website):
    response = requests.get(website)

    print('[checker]:', response.status_code, website)

    return website, response.status_code

# --- main --- 

#websites = read_urls("websites.txt")
websites = ['https://stackoverflow.com', 'https://fake.com']

with open("working.txt", "w") as fa, open("not_working.txt", "w") as fe:
    with ThreadPoolExecutor() as executor:

        for website, status_code in executor.map(checker, websites):

            print('[main]:', status_code, website)

            if status_code == 200:
                 fa.write(f'{website}\n')
            else:
                 fe.write(f'{website} error {status_code}\n')
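If the order of the output doesn't matter, the standard library also offers concurrent.futures.as_completed (my addition, not from the original answer), which yields each future as soon as its thread finishes - handy for long URL lists where you don't want one slow site to delay the others' results:

from concurrent.futures import ThreadPoolExecutor, as_completed

with open("working.txt", "w") as fa, open("not_working.txt", "w") as fe:
    with ThreadPoolExecutor() as executor:
        # uses the same checker() as above, which returns (website, status_code)
        futures = [executor.submit(checker, site) for site in websites]

        for future in as_completed(futures):
            website, status_code = future.result()
            if status_code == 200:
                fa.write(f'{website}\n')
            else:
                fe.write(f'{website} error {status_code}\n')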
furas