I am trying to access the NOAA FTP server to download multiple datasets. Daily data means 365 files per year, so downloading them manually is cumbersome. I tried to use ftplib, but got:

gaierror: [Errno 11001] getaddrinfo failed

Below is my code snippet:

from ftplib import FTP
ftp = FTP("https://gml.noaa.gov/aftp/data/radiation/surfrad/Boulder_CO/2020/")
ftp.login()

# Get all files
files = ftp.nlst()

# Print out the files:
for file in files:
    print("Downloading..." + file)
    ftp.retrbinary("RETR" + file, open("..../NOAA/surfrad/Boulder_CO/2020/" + file, 'wb').write)
ftp.close()

Any help would be appreciated. I also tried to ping the server, and it only responds when I use:

ping gml.noaa.gov

When I try to ping the full FTP link:

ping https://gml.noaa.gov/aftp/data/radiation/surfrad/Boulder_CO/2020

it doesn't respond. I'm not sure why that is.

The full traceback is:

---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
<ipython-input-102-ea6ae149ac16> in <module>
      1 start = datetime.now()
----> 2 ftp = FTP("ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/Boulder_CO/2020")
      3 # ftp.login('your-username', 'your-passwor')
      4 ftp.login()
      5 

c:\users\smnge\anaconda3\envs\dlgpu\lib\ftplib.py in __init__(self, host, user, passwd, acct, timeout, source_address)
    115         self.timeout = timeout
    116         if host:
--> 117             self.connect(host)
    118             if user:
    119                 self.login(user, passwd, acct)

c:\users\smnge\anaconda3\envs\dlgpu\lib\ftplib.py in connect(self, host, port, timeout, source_address)
    150             self.source_address = source_address
    151         self.sock = socket.create_connection((self.host, self.port), self.timeout,
--> 152                                              source_address=self.source_address)
    153         self.af = self.sock.family
    154         self.file = self.sock.makefile('r', encoding=self.encoding)

c:\users\smnge\anaconda3\envs\dlgpu\lib\socket.py in create_connection(address, timeout, source_address)
    705     host, port = address
    706     err = None
--> 707     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    708         af, socktype, proto, canonname, sa = res
    709         sock = None

c:\users\smnge\anaconda3\envs\dlgpu\lib\socket.py in getaddrinfo(host, port, family, type, proto, flags)
    750     # and socket type values to enum constants.
    751     addrlist = []
--> 752     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    753         af, socktype, proto, canonname, sa = res
    754         addrlist.append((_intenum_converter(af, AddressFamily),

gaierror: [Errno 11001] getaddrinfo failed
Martin Prikryl
stackword_0
  • `FTP("https` that https doesn't look right. And it doesn't match your stack trace. – njzk2 May 27 '21 at 21:07
  • No need to close, the code reproduces the problem just fine and the information is complete. However, @SGotham shouldn't post tracebacks as images in future posts, but should post the traceback as text as well. If you would be so kind as to update the question, it will improve its quality and usefulness to others. – Grismar May 27 '21 at 21:16
  • @njzk2, thanks for the quick reply. Your suggestion did help me get access. Now I have another problem, writing to the local disk. I couldn't post the full error here. Do you have any suggestion for how I can copy all the files to the local disk? – stackword_0 May 27 '21 at 21:26
  • @Grismar, thanks for the suggestion, I have updated the traceback to text. – stackword_0 May 27 '21 at 21:27
  • Thanks. If the answer below is sufficient, please accept it with the checkmark, so the question no longer appears as unanswered. – Grismar May 27 '21 at 22:13
  • Relevant documentation: https://docs.python.org/3/library/ftplib.html The first parameter is the host, not the full URI. – njzk2 May 28 '21 at 22:11

1 Answer

The link you posted is a website (HTTPS) URL, not an FTP host name, and FTP() expects only the host name as its first argument.

However, this would work at the start of your script:

from ftplib import FTP
ftp = FTP("ftp.gml.noaa.gov")
ftp.login()
ftp.cwd('data/radiation/surfrad/Boulder_CO/2020')

# Get all files
files = ftp.nlst()

# etc ...

Note three changes: the https:// prefix is gone, ftp. has been added to the start of the domain, and the path is set with a separate cwd() command, without the aftp/ root.

The https:// was simply a mistake: it marks the URI as a website URL, to be retrieved over HTTPS.

The ftp. at the start of the domain was just a guess, but hosting an FTP server at ftp.example.com is a very common convention, just as you used to see www.example.com for websites (and still do).

Removing the aftp/ was another guess, made after the server refused to change into that folder. Since the URL was a website URL, it made sense to assume the aftp folder is simply the root for anonymous FTP, which is what you are using when you log in without credentials.
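
If you would rather start from a full URL than type the host and path separately, a minimal sketch along these lines splits it into the two pieces that ftplib wants. The helper name split_ftp_url is made up for illustration, and the URL is the one from the traceback in the question. The helper knows nothing about server-specific quirks such as the aftp/ prefix, so a path taken from the website URL would still need that part removed by hand.

```python
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Return (host, path) so each part can go to the right ftplib call.

    Hypothetical helper: the host is meant for FTP(), the path for ftp.cwd().
    """
    parts = urlsplit(url)
    if not parts.hostname:
        # Without a scheme such as ftp://, urlsplit finds no host at all.
        raise ValueError(f"no host found in {url!r}")
    # Drop leading/trailing slashes so cwd() is relative to the login directory.
    return parts.hostname, parts.path.strip("/")


host, path = split_ftp_url(
    "ftp://aftp.cmdl.noaa.gov/data/radiation/surfrad/Boulder_CO/2020"
)
print(host)  # the part to pass to FTP()
print(path)  # the part to pass to ftp.cwd()
```

The same split explains the ping result in the question: ping, like FTP(), resolves a bare host name, so anything with a scheme and a path attached fails the name lookup.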

A working solution:

from ftplib import FTP
from pathlib import Path

ftp = FTP("ftp.gml.noaa.gov")
ftp.login()
ftp.cwd('data/radiation/surfrad/Boulder_CO/2020')

# Get all files
files = ftp.nlst()

# Download all the files to C:\Temp
for file in files:
    print("Downloading..." + file)
    ftp.retrbinary(f'RETR {file}', open(str(Path(r'C:\Temp') / file), 'wb').write)
ftp.close()

Or, if you don't like the complication of pathlib:

    ftp.retrbinary(f'RETR {file}', open(rf'C:\Temp\{file}', 'wb').write)
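
One thing the loops above leave open is the local file handle: open(...).write is passed to retrbinary, but the file is never closed explicitly. A possible variation, sketched here with a hypothetical download_all helper, wraps each download in a with block so every file is flushed and closed before the next one starts. It assumes that nlst() returns only plain file names in the current directory; a subdirectory in the listing would make RETR fail.

```python
from pathlib import Path


def download_all(ftp, dest_dir):
    """Download every entry in the FTP session's current directory into dest_dir.

    Hypothetical helper: `ftp` is an already logged-in ftplib.FTP object
    whose working directory has been set with ftp.cwd().
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    saved = []
    for name in ftp.nlst():
        target = dest / name
        # The with block closes each local file as soon as its transfer ends.
        with target.open("wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        saved.append(target)
    return saved


# Example use (not run here, needs network access):
#
# from ftplib import FTP
# with FTP("ftp.gml.noaa.gov") as ftp:
#     ftp.login()
#     ftp.cwd("data/radiation/surfrad/Boulder_CO/2020")
#     download_all(ftp, r"C:\Temp")
```

Using FTP as a context manager in the commented example also takes care of closing the connection, so the separate ftp.close() call is not needed there.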
Grismar