
I am trying to use tqdm to report the progress of each file download from three links. I want to use multithreading to download from each link simultaneously, updating each progress bar as its download proceeds. But when I execute my script, multiple lines of progress bar appear, as if the threads are all updating the bars at the same time. How should I run the downloads in multiple threads while maintaining a progress bar for each download, without duplicated bars filling the entire screen? Here is my code.

import os
import sys
import requests
from pathlib import Path
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor as PE


def get_filename(url):
    filename = os.path.basename(url)
    fname, extension = os.path.splitext(filename)
    if extension:
        return filename
    header = requests.head(url).headers
    if "Location" in header:
        return os.path.basename(header["Location"])
    return fname


def get_file_size(url):
    header = requests.head(url).headers
    if "Content-Length" in header and header["Content-Length"] != 0:
        return int(header["Content-Length"])
    elif "Location" in header and "status" not in header:
        redirect_link = header["Location"]
        r = requests.head(redirect_link).headers
        return int(r["Content-Length"])


def download_file(url, filename=None):
    # Download to the Downloads folder in user's home folder.
    download_dir = os.path.join(Path.home(), "Downloads")
    if not os.path.exists(download_dir):
        os.makedirs(download_dir, exist_ok=True)
    if not filename:
        filename = get_filename(url)
    file_size = get_file_size(url)
    abs_path = os.path.join(download_dir, filename)
    chunk_size = 1024
    with open(abs_path, "wb") as f, requests.get(url, stream=True) as r, tqdm(
            unit="B",
            unit_scale=True,
            unit_divisor=chunk_size,
            desc=filename,
            total=file_size,
            file=sys.stdout
    ) as progress:
        for chunk in r.iter_content(chunk_size=chunk_size):
            data = f.write(chunk)
            progress.update(data)


if __name__ == "__main__":
    urls = ["http://mirrors.evowise.com/linuxmint/stable/20/linuxmint-20-xfce-64bit.iso",
            "https://www.vmware.com/go/getworkstation-win",
            "https://download.geany.org/geany-1.36_setup.exe"]
    with PE(max_workers=len(urls)) as ex:
        ex.map(download_file, urls)

I modified my code a bit, based on Use tqdm with concurrent.futures?:

def download_file(url, filename=None):
    # Download to the Downloads folder in user's home folder.
    download_dir = os.path.join(Path.home(), "Downloads")
    if not os.path.exists(download_dir):
        os.makedirs(download_dir, exist_ok=True)
    if not filename:
        filename = get_filename(url)
    # file_size = get_file_size(url)
    abs_path = os.path.join(download_dir, filename)
    chunk_size = 1024
    with open(abs_path, "wb") as f, requests.get(url, stream=True) as r:
        for chunk in r.iter_content(chunk_size=chunk_size):
            f.write(chunk)


if __name__ == "__main__":
    urls = ["http://mirrors.evowise.com/linuxmint/stable/20/linuxmint-20-xfce-64bit.iso",
            "https://www.vmware.com/go/getworkstation-win",
            "https://download.geany.org/geany-1.36_setup.exe"]
    with PE() as ex:
        for url in urls:
            tqdm(ex.submit(download_file, url),
                 total=get_file_size(url),
                 unit="B",
                 unit_scale=True,
                 unit_divisor=1024,
                 desc=get_filename(url),
                 file=sys.stdout)

But the bars do not update after this modification...

My problem:
tqdm shows duplicated progress bars.

I have no problem with the concurrent downloads themselves, but I do have a problem getting tqdm to update an individual progress bar for each link. Ideally there should be one progress bar per download.

I used one of the suggested solutions:

if __name__ == "__main__":
    urls = ["http://mirrors.evowise.com/linuxmint/stable/20/linuxmint-20-xfce-64bit.iso",
            "https://www.vmware.com/go/getworkstation-win",
            "https://download.geany.org/geany-1.36_setup.exe"]

    with tqdm(total=len(urls)) as pbar:
        with ThreadPoolExecutor() as ex:
            futures = [ex.submit(download_file, url) for url in urls]
            for future in as_completed(futures):
                result = future.result()
                pbar.update(1)

But this is the result: (screenshot of the output omitted)

asked by skullknight
    Does this answer your question? [Use tqdm with concurrent.futures?](https://stackoverflow.com/questions/51601756/use-tqdm-with-concurrent-futures) – Lescurel Sep 10 '20 at 09:15
  • I saw this but it does not apply to my situation. – skullknight Sep 10 '20 at 09:56
  • I would go with aiohttp instead of requests, make the download function async, wrap up everthing down to async and you should be done – geckos Sep 10 '20 at 12:20
  • Hi, I have never used aiohttp to download, do you have a real life sample on downloading files from multiple links with progress bar? – skullknight Sep 10 '20 at 15:21

1 Answer


This would be the general idea (format it as you wish):

from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import requests


def download_file(url):
    with requests.get(url, stream=True) as r:
        # consume the stream without writing it to disk
        for chunk in r.iter_content(chunk_size=50000):
            pass
    return url


if __name__ == "__main__":
    urls = ["http://mirrors.evowise.com/linuxmint/stable/20/linuxmint-20-xfce-64bit.iso",
            "https://www.vmware.com/go/getworkstation-win",
            "https://download.geany.org/geany-1.36_setup.exe"]

    with tqdm(total=len(urls)) as pbar:
        with ThreadPoolExecutor(max_workers=len(urls)) as ex:
            futures = [ex.submit(download_file, url) for url in urls]
            for future in as_completed(futures):
                result = future.result()
                pbar.update(1)
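Not part of the original answer, but for the per-download bars the question asks for, tqdm's `position` argument is the usual approach: each thread gets its own bar pinned to its own terminal row, so the bars do not overwrite one another. A minimal sketch, with the download simulated by `time.sleep` and placeholder filenames:

```python
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
import time


def fake_download(name, position, chunks=20):
    # `position` pins this bar to its own terminal row, so the three
    # bars render on separate lines instead of overwriting each other
    with tqdm(total=chunks, desc=name, position=position) as bar:
        for _ in range(chunks):
            time.sleep(0.01)  # stand-in for reading one chunk
            bar.update(1)
    return name


names = ["linuxmint.iso", "workstation.exe", "geany_setup.exe"]
with ThreadPoolExecutor(max_workers=len(names)) as ex:
    # map over (name, position) pairs; results come back in input order
    results = list(ex.map(fake_download, names, range(len(names))))
```

In a real `download_file` you would pass the thread's index as `position` and update the bar by the number of bytes written per chunk.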

Simulation If You Knew Lengths of Each Download

from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import requests
import time
import random


def download_file(url, pbar):
    # simulate a download that reads 30 chunks, updating the shared bar per chunk
    for _ in range(30):
        time.sleep(.50 * random.random())
        pbar.update(1)
    return url


if __name__ == "__main__":
    urls = ["http://mirrors.evowise.com/linuxmint/stable/20/linuxmint-20-xfce-64bit.iso",
            "https://www.vmware.com/go/getworkstation-win",
            "https://download.geany.org/geany-1.36_setup.exe"]

    with tqdm(total=90) as pbar:
        with ThreadPoolExecutor(max_workers=3) as ex:
            futures = [ex.submit(download_file, url, pbar) for url in urls]
            for future in as_completed(futures):
                result = future.result()
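Extending the simulation above: if you knew each file's size in bytes (in practice, from a `Content-Length` header as in the asker's `get_file_size`), you could size the single shared bar in bytes and advance it per chunk. A sketch with hypothetical hard-coded sizes standing in for real HEAD requests:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

# hypothetical byte sizes, standing in for what a HEAD request's
# Content-Length header would report for each file
sizes = {"linuxmint.iso": 4096, "workstation.exe": 2048, "geany_setup.exe": 1024}
CHUNK = 512


def fake_download(name, pbar):
    # advance the shared bar by the number of bytes "read" in each chunk
    remaining = sizes[name]
    while remaining > 0:
        step = min(CHUNK, remaining)
        pbar.update(step)
        remaining -= step
    return name


completed = []
with tqdm(total=sum(sizes.values()), unit="B", unit_scale=True) as pbar:
    with ThreadPoolExecutor(max_workers=len(sizes)) as ex:
        futures = [ex.submit(fake_download, name, pbar) for name in sizes]
        for future in as_completed(futures):
            completed.append(future.result())
```

tqdm guards its output with an internal lock, so calling `pbar.update` from several threads, as here, is safe.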
answered by Booboo
  • Hi, thanks for the answer, but the result is the same whether I use ThreadPoolExecutor or threading.Thread... It seems a new progress bar appears on each thread, and I am not sure how to solve it... – skullknight Sep 10 '20 at 15:20
  • The above code produced one bar for me and advanced as each Future instance produced a result. Did you actually run my code? I mean, it clearly produces only one instance of a progress bar, `pbar`. – Booboo Sep 10 '20 at 15:25
  • Hi, yes, I copied and pasted your code under my if __name__ == "__main__": and I got multiple progress bars. – skullknight Sep 10 '20 at 15:30
  • You are probably getting multiple progress bars because your `download_file` probably still has a progress bar in it. I have updated my answer with complete code (it doesn't actually write out the downloaded files). Note that you also need to set `max_workers` when you construct your `ThreadPoolExecutor` instance so that you have enough threads. – Booboo Sep 10 '20 at 16:01
  • If you knew the total size of the three downloads in terms of 50,000 byte chunks, you could set the length of the progress bar to be that number of chunks and then pass the progress bar as a second argument to `download_file`. Then as each chunk is read, `download_file` could call `pbar.update(1)`. In that way, you would have a better, smoother progress bar. But, of course, you probably do not know what the total size is in advance. – Booboo Sep 10 '20 at 16:07
  • In the end, the solution referred to by @Lescurel was not all that different from mine other than using `map` rather than `submit` with `as_completed` the difference being that `map` is forced to return results in the order in which the requests (i.e. URLs) were submitted, which is not necessarily a reflection on how the work is actually proceeding. If the first URL is the last one completed, the progress bar won't advance at all until all the downloads complete. – Booboo Sep 10 '20 at 16:52
  • And what about having multiple bars for different kinds of tasks? You can't use `as_completed`. – David Davó Jun 10 '23 at 11:41
  • I am having the same problem. Multiple progress bars kept popping up when tqdm is used with concurrent.futures. Note that I am using spyder IDE. – user1769197 Jul 08 '23 at 15:14
  • @user1769197 Post your code as a question on this site and then let me know its link. – Booboo Jul 08 '23 at 15:31
  • @Booboo, posted here. https://stackoverflow.com/questions/76643837/python-tqdm-duplicated-progress-bars-with-nested-loops-in-spyder – user1769197 Jul 08 '23 at 16:08