In Google Colaboratory, is there a read_csv problem with URL redirections?

Question

Opening the following CSV file with pandas in a local Jupyter notebook on my laptop works well:

pd.read_csv('http://fonetik.fr/foo.csv')

However, when I try the same line of code in a Google Colab notebook, the notebook displays the following error:

CertificateError                          Traceback (most recent call last)
<ipython-input-27-030762f24a0e> in <module>()

----> 1 df = pd.read_csv('http://fonetik.fr/foo.csv')
 /usr/lib/python3.6/ssl.py in match_hostname(cert, hostname)
   325         raise CertificateError("hostname %r "
   326             "doesn't match either of %s"
--> 327             % (hostname, ', '.join(map(repr, dnsnames))))
   328     elif len(dnsnames) == 1:
   329         raise CertificateError("hostname %r "

CertificateError: hostname 'fonétik.fr' doesn't match either of 'fonetik.fr', 'www.fonetik.fr', 'www.xn--fontik-dva.fr', 'xn--fontik-dva.fr'

I have just checked the fonetik.fr certificate and it is valid, so I do not understand why Colab raises this error. Could it be caused by a redirection between an IDN (internationalized domain name) server and a non-IDN server? Is there a way to solve this?

You may think I should have put the foo.csv file on Google Drive first to avoid fetching it from a third-party server. But I cannot use this option, because the real foo.csv I want to use is too big to be stored on my Google Drive.

score 1 · Answer 1 · answered Oct 26 '19 at 00:13

I have found the following workaround for Colab: download the file with wget first, then read the local copy.

!wget https://fonétik.fr/foo.csv
pd.read_csv('foo.csv')
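The error message compares the Unicode hostname 'fonétik.fr' against certificate names that are all ASCII, including the punycode form 'xn--fontik-dva.fr'. As a minimal sketch, assuming the mismatch comes from the Unicode hostname reaching the certificate check unencoded, the hostname can be converted to its IDNA form before the URL is used. The helper name to_ascii_url is hypothetical, and whether this avoids the error in Colab has not been verified here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url):
    """Return url with its hostname converted to IDNA (punycode) ASCII form.

    Hypothetical helper for illustration; any user:password part of the
    URL is dropped.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Python's built-in "idna" codec turns 'fonétik.fr' into its ASCII form
    ascii_host = host.encode("idna").decode("ascii")
    # Re-attach the port if the original URL had one
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


ascii_url = to_ascii_url("https://fonétik.fr/foo.csv")
print(ascii_url)  # https://xn--fontik-dva.fr/foo.csv
```

The resulting URL uses one of the names listed in the certificate, so it may be worth trying it with pd.read_csv or wget in place of the Unicode form.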

Xavier M

score 1 · Answer 2 · answered Oct 26 '19 at 15:07

I sometimes run into the same problem, so I download the file first and read it locally. It is more code than strictly necessary (I know!), but it works. Just replace the URL and variable names with your own:

import os
import tarfile
import urllib.request

import pandas as pd

DOWNLOAD_root = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
Housing_path = os.path.join("datasets", "housing")
Housing_url = DOWNLOAD_root + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=Housing_url, housing_path=Housing_path):
    # Create the target directory if needed, then download and unpack the archive
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)

def load_housing_data(housing_path=Housing_path):
    # Read the extracted CSV from local disk
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

fetch_housing_data()
housing = load_housing_data()

score 0 · Answer 3 · answered Nov 27 '19 at 19:02

foo = pd.read_csv('https://raw.githubusercontent.com/user/repo/file.csv')

This works. GitHub's raw URL serves just the file itself.

I can also read a CSV this way from a URL on my blog site, where the file is stored in the media library.

At first, I was downloading it with wget and saving it in the root of the Colab instance. This also works, but it is an extra step that I later found to be unnecessary.

If the dataset is zipped, you will have to use wget and then !unzip in a cell to bring it to usable form.
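The unzip step can also be done from Python with the standard zipfile module. This is only a sketch: the helper name extract_csv_members and the file name data.zip are hypothetical, and it assumes the archive has already been downloaded to the Colab instance.

```python
import zipfile
from pathlib import Path


def extract_csv_members(zip_path, dest_dir="."):
    """Extract every .csv member of zip_path into dest_dir; return their paths.

    Hypothetical helper for illustration.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # Keep only the CSV files and skip anything else in the archive
        names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        archive.extractall(path=dest, members=names)
    return [dest / n for n in names]


# Example usage (assumes a local data.zip exists):
# paths = extract_csv_members("data.zip")
# df = pd.read_csv(paths[0])
```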
