I noticed that pandas' read_csv()
fails to read a public CSV file hosted on GitLab:
import pandas as pd
df = pd.read_csv("https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv")
The error I get (truncated):
HTTPError Traceback (most recent call last)
<ipython-input-3-e1c0b52ee83c> in <module>
----> 1 df = pd.read_csv("https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv")
[...]
~\Anaconda3\lib\urllib\request.py in http_error_default(self, req, fp, code, msg, hdrs)
647 class HTTPDefaultErrorHandler(BaseHandler):
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650
651 class HTTPRedirectHandler(BaseHandler):
HTTPError: HTTP Error 403: Forbidden
However, using R, the base function read.csv()
reads it happily:
df <- read.csv("https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv")
head(df)
#> country_code year spi
#> 1 AFG 2020 42.29
#> 2 AFG 2019 42.34
#> 3 AFG 2018 40.61
#> 4 AFG 2017 38.94
#> 5 AFG 2016 39.65
#> 6 AFG 2015 38.62
Created on 2020-10-29 by the reprex package (v0.3.0)
Any idea why pandas gets a 403 here, and how R manages to read the same URL?
Versions used:
- R 4.0.3
- Python 3.7.9
- pandas 1.1.3
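One guess I have not confirmed is that the server rejects the default User-Agent that Python's urllib sends, whereas R identifies itself differently. Below is a minimal sketch for testing that assumption: it downloads the file with an explicit User-Agent header and only then hands the bytes to pandas. The helper names build_request and read_csv_with_user_agent, and the "Mozilla/5.0" value, are placeholders I made up for this test, not part of pandas.

```python
import io
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying an explicit User-Agent header.

    The default header value is an arbitrary placeholder for this sketch.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_csv_with_user_agent(url, user_agent="Mozilla/5.0"):
    """Download the CSV ourselves, then let pandas parse the bytes."""
    import pandas as pd  # imported here so build_request() works without pandas

    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        payload = response.read()
    # read_csv accepts file-like objects, so wrap the raw bytes in a buffer.
    return pd.read_csv(io.BytesIO(payload))
```

Calling read_csv_with_user_agent() with the GitLab URL above would show whether the header makes a difference. If it still returns a 403, the User-Agent assumption is wrong and something else is going on.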