I can't wrap my head around how I could possibly rewrite my code to be multi-threaded.

The code I'm writing automatically archives every article in every newsgroup the server lists, but I want to make full use of my Usenet plan and run up to 20 threads. I've never written threaded code before, and my attempts were in vain.

Here's my code, excluding the username and password (you can get a free account with a maximum of 5 threads at https://my.xsusenet.com if you really want to):

Please don't judge me too hard :(


import nntplib
import sys
import datetime
import os
basetime = datetime.datetime.today()
#daysback = int(sys.argv[1])
#date_list = [basetime - datetime.timedelta(days=x) for x in range(daysback)]
s = nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD') # I am only allowed 5 connections at a time, so try for 4.
groups = []
resp, groups_list_tuple = s.list()


def remove_non_ascii_2(string):
    return string.encode('ascii', errors='ignore').decode()


for g_tuple in groups_list_tuple:
    #print(g_tuple) # DEBUG_LINE
    # Parse group_list info
    group = g_tuple[0]
    last = g_tuple[1]
    first = g_tuple[2]
    flag = g_tuple[3]

    # Parse newsgroup info
    resp, count, first, last, name = s.group(group)
    for message_id in range(first, last):
        resp, number, mes_id = s.next()
        resp, info = s.article(mes_id)
        if os.path.exists('.\\' + group):
            pass
        else:
            os.mkdir('.\\' + group)
        print(f"Downloading: {message_id}")
        outfile = open('.\\' + group + '\\' + str(message_id), 'a', encoding="utf-8")
        for line in info.lines:
            outfile.write(remove_non_ascii_2(str(line)) + '\n')
        outfile.close()

I tried threading with a ThreadPoolExecutor to make it use 20 threads, but it failed: it repeated the same process on the same message ID. The expected result was to download 20 different messages at a time.

Here's the code I tried with threading. Mind you, I went through 6-8 variations trying to get it to work; this was the last one before I gave up and asked here.

import nntplib
import sys
import datetime
import os
import concurrent.futures
basetime = datetime.datetime.today()
#daysback = int(sys.argv[1])
#date_list = [basetime - datetime.timedelta(days=x) for x in range(daysback)]
s = nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD') # I am only allowed 5 connections at a time, so try for 4.
groups = []
resp, groups_list_tuple = s.list()


def remove_non_ascii_2(string):
    return string.encode('ascii', errors='ignore').decode()

def download_nntp_file(mess_id):
    resp, count, first, last, name = s.group(group)
    message_id = range(first, last)

    resp, number, mes_id = s.next()
    resp, info = s.article(mes_id)
    if os.path.exists('.\\' + group):
        pass
    else:
        os.mkdir('.\\' + group)
    print(f"Downloading: {mess_id}")
    outfile = open('.\\' + group + '\\' + str(mess_id), 'a', encoding="utf-8")
    for line in info.lines:
        outfile.write(remove_non_ascii_2(str(line)) + '\n')
    outfile.close()


for g_tuple in groups_list_tuple:
    #print(g_tuple) # DEBUG_LINE
    # Parse group_list info
    group = g_tuple[0]
    last = g_tuple[1]
    first = g_tuple[2]
    flag = g_tuple[3]

    # Parse newsgroup info
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        futures = executor.submit(download_nntp_file)
ANK
  • If you tried `ThreadPoolExecutor`, then show that code and we will show you where the mistake is. – furas Oct 26 '22 at 00:38
  • Added the code for ya. – ANK Oct 26 '22 at 01:59
  • You should send values as parameters: `.submit(download_nntp_file, g_tuple)`. And maybe you should run the `for`-loop inside `with ... as executor:`, OR maybe you should use `map(download_nntp_file, groups_list_tuple)` instead of `for g_tuple in groups_list_tuple: submit(download_nntp_file, g_tuple)` – furas Oct 26 '22 at 09:48
  • Alright, I'll give it a shot when I get back. I appreciate it man, and I'll let ya know if that works out. – ANK Oct 26 '22 at 21:41

1 Answer

I can't test it with XSUseNet.

I wouldn't use global variables, because when threads run at the same time they may all read the same values from those variables.

Instead, pass the values to the function as parameters.

Something like this:

def download_nntp_file(g_tuple):
    # ... code which uses `g_tuple` instead of global variables ...

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    for g_tuple in groups_list_tuple:
        executor.submit(download_nntp_file, g_tuple)
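
A slightly fuller sketch of the submit() version, in case it helps. The worker here is a hypothetical stand-in that only builds a string instead of talking to a server, and the names `download_article` and `download_range` are made up for illustration. The point is that every task gets its own group and article number as arguments, and that calling `result()` on each future surfaces errors that would otherwise stay hidden.

```python
import concurrent.futures


def download_article(group, article_number):
    # Hypothetical stand-in for the real NNTP download. It uses only its own
    # arguments, never module-level state, so concurrent calls cannot
    # accidentally work on the same article.
    return f"{group}/{article_number}"


def download_range(group, first, last, max_workers=5):
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # One task per article number; the arguments follow the function.
        futures = [
            executor.submit(download_article, group, n)
            for n in range(first, last + 1)
        ]
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception that happened in the worker.
            results.append(future.result())
    return sorted(results)


if __name__ == "__main__":
    print(download_range("alt.test", 1, 3))
```

Note that calling `submit(download_nntp_file)` with no arguments, as in the question, would not crash the main program: the TypeError is stored in the future and only shows up when `result()` is called.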

But it would be simpler to use map() instead of submit(), because it takes a list of arguments and doesn't need a for-loop:

def download_nntp_file(g_tuple):
    # ... code which uses `g_tuple` instead of global variables ...

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(download_nntp_file, groups_list_tuple)
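
If the worker needs more than one argument, map() accepts several iterables and takes one item from each per call. A minimal sketch, assuming a hypothetical worker `describe_article` that just returns what it was given; `itertools.repeat` supplies the same group name for every article number.

```python
import concurrent.futures
from itertools import repeat


def describe_article(group, article_number):
    # Hypothetical worker: returns its arguments so the pairing is visible.
    return (group, article_number)


def map_articles(group, first, last, max_workers=5):
    numbers = range(first, last + 1)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() zips the iterables, so it stops when `numbers` runs out even
        # though repeat(group) is endless. Results come back in input order.
        return list(executor.map(describe_article, repeat(group), numbers))


if __name__ == "__main__":
    print(map_articles("alt.test", 5, 7))
```

Wrapping the call in `list()` matters: map() returns a lazy iterator, and an exception raised inside a worker is only re-raised when its result is pulled from that iterator.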
furas
  • Thanks for the suggestion. Unfortunately, in this case the NNTP library actually calls for a message ID rather than the g_tuple. Lemme see if I can find the proper documentation for ya so you can understand it a bit better. The idea is to iterate over the first through last possible messages in the newsgroup, and every single one of them has its own article identifier. https://docs.python.org/3/library/nntplib.html#nntplib.NNTP.article It's a bit confusing to even read that, sorry about that. I will take what you said on board though, and I'll try some stuff real quick. – ANK Oct 27 '22 at 00:58
  • If you need the `message ID`, then you should use `submit(..., message_id)` – furas Oct 27 '22 at 01:01
  • And if you need to use other values, then you should also send them in `submit(..., message_id, other_value, and_another_value)` – furas Oct 27 '22 at 01:03
  • I did manage to get it to work with your advice man, I appreciate it! – ANK Oct 27 '22 at 01:47
  • If you have working code then you could put it (with detailed description) as answer - it can be useful for other users. – furas Oct 27 '22 at 12:40
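
The working code was never posted, so the following is only a guess at one possible structure, not the asker's solution. It assumes each worker thread should own its connection, because an NNTP session is stateful (GROUP selects the current group, NEXT moves a per-connection pointer) and sharing one across threads invites exactly the "same message twice" symptom. All names here (`get_connection`, `download_article`, `archive_group`, `FakeConnection`, `demo`) are hypothetical, and `FakeConnection` only imitates the `group()`/`article()` return shapes of `nntplib.NNTP` so the sketch runs without a server.

```python
import concurrent.futures
import os
import tempfile
import threading
from types import SimpleNamespace

_local = threading.local()  # holds one connection per worker thread


def get_connection(connect):
    """Return this thread's own connection, creating it on first use."""
    if not hasattr(_local, "conn"):
        _local.conn = connect()
    return _local.conn


def download_article(connect, group, number, out_dir):
    conn = get_connection(connect)
    conn.group(group)                    # select the group on this connection
    resp, info = conn.article(number)    # fetch by number, not via next()
    group_dir = os.path.join(out_dir, group)
    os.makedirs(group_dir, exist_ok=True)  # safe if another thread made it
    path = os.path.join(group_dir, str(number))
    with open(path, "wb") as outfile:
        for line in info.lines:          # lines are bytes, so write bytes
            outfile.write(line + b"\n")
    return path


def archive_group(connect, group, first, last, out_dir, max_workers=4):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(download_article, connect, group, n, out_dir)
            for n in range(first, last + 1)   # +1 so `last` is included
        ]
        return sorted(f.result() for f in futures)


class FakeConnection:
    """Hypothetical stand-in mimicking nntplib.NNTP's group()/article()."""

    def group(self, name):
        return ("211", 3, 1, 3, name)

    def article(self, message_spec):
        lines = [b"Subject: test", b"", b"body %d" % message_spec]
        info = SimpleNamespace(number=message_spec, message_id=None, lines=lines)
        return ("220", info)


def demo():
    # Runs against the fake connection, so nothing touches the network.
    with tempfile.TemporaryDirectory() as out_dir:
        paths = archive_group(FakeConnection, "alt.test", 1, 3, out_dir)
        return [os.path.basename(p) for p in paths]


if __name__ == "__main__":
    print(demo())
```

With the real library, `connect` could be something like `lambda: nntplib.NNTP('free.xsusenet.com', user='USERNAME', password='PASSWORD')`. Since every worker opens its own connection, `max_workers` has to stay at or below the provider's connection limit. Article numbers missing from the range would raise an error from the server that a real version needs to catch, and the connections should be closed with `quit()` when the work is done. Also note that `nntplib` was removed from the standard library in Python 3.13.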