How to download a CSV file from the World Bank's dataset

Question

I would like to automate the download of CSV files from the World Bank's dataset.

My problem is that the URL corresponding to a specific dataset does not lead directly to the desired CSV file but is instead a query to the World Bank's API. As an example, this is the URL to get the GDP per capita data: http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv.

If you paste this URL in your browser, it will automatically start the download of the corresponding file. As a consequence, the code I usually use to collect and save CSV files in Python is not working in the present situation:

baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen("%s" %(baseUrl))
myData = csv.reader(remoteCSV)

How should I modify my code in order to download the file coming from the query to the API?

The problem is more likely that the data is zipped. You will need to decompress it before you can work with it. — MrAlexBailey, Mar 20 '15 at 13:34
You must use `zipfile` lib to extract data from zip package. — Mauro Baraldi, Mar 20 '15 at 13:56
Try following the process at http://stackoverflow.com/questions/18885175/read-a-zipped-file-as-a-pandas-dataframe — Joop, Mar 20 '15 at 14:14
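
The comments above suggest that the response body is a zip archive rather than plain CSV text. One way to confirm that before parsing is sketched below in Python 3. The helper name `looks_like_zip` is hypothetical, and a small archive built in memory stands in for the downloaded bytes so the sketch runs offline.

```python
import io
import zipfile


def looks_like_zip(payload):
    """Return True if the raw bytes form a readable zip archive."""
    # is_zipfile accepts a file-like object, so wrap the bytes in BytesIO.
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a tiny archive in memory to stand in for the downloaded bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "a,b\n1,2\n")

print(looks_like_zip(buffer.getvalue()))  # True
print(looks_like_zip(b"a,b\n1,2\n"))      # False
```

If the check returns True for the bytes read from the World Bank URL, passing them straight to `csv.reader` cannot work and the archive has to be opened first.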

MrAlexBailey · Accepted Answer · 2015-03-20T14:34:25.350

This downloads the zip, opens it, and gives you a csv reader for whichever file in it you want.

import urllib2
import StringIO
from zipfile import ZipFile
import csv

baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen(baseUrl)

# Create a StringIO object so that we can work on the result of the
# request (a string) as though it were a file.
sio = StringIO.StringIO()
sio.write(remoteCSV.read())

# Create a ZipFile object, 'z', from the in-memory file.
z = ZipFile(sio, 'r')

# z.namelist() is a list with the names of all the files in the zip you just downloaded.
# We can use z.namelist()[1] to refer to 'ny.gdp.pcap.cd_Indicator_en_csv_v2.csv'.
print z.namelist()

# Open the 2nd file in the zip and read it as CSV.
with z.open(z.namelist()[1]) as f:
    csvr = csv.reader(f)
    for row in csvr:
        print row

For more information, see the ZipFile and StringIO documentation.
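
The code in this answer targets Python 2 (`urllib2`, `StringIO`, the `print` statement). A minimal Python 3 sketch of the same approach might look like the following. The function names `fetch_bytes` and `read_csv_member`, the `member_index` parameter, and the `utf-8-sig` default encoding are assumptions made for illustration, not part of the original answer. The demonstration uses an archive built in memory so that it runs without network access.

```python
import csv
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Download a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_csv_member(zip_bytes, member_index=0, encoding="utf-8-sig"):
    """Return the rows of one CSV file inside a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        name = archive.namelist()[member_index]
        with archive.open(name) as raw:
            # ZipFile.open yields bytes, while csv.reader expects text.
            # The encoding is an assumption; adjust it if decoding fails.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


# Offline demonstration with a small two-file archive built in memory.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("metadata.csv", "code,note\nABW,example\n")
    archive.writestr("data.csv", "country,value\nAruba,1\n")

rows = read_csv_member(demo.getvalue(), member_index=1)
print(rows)
```

For the URL in the question, the call would be `read_csv_member(fetch_bytes(baseUrl), member_index=1)`, mirroring `z.namelist()[1]` above. The order of files in the archive is not guaranteed, so it is worth printing `namelist()` first and choosing the index from that.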

Thanks, it works and has solved my problem, and I have learnt something new about the StringIO library. — SirC, Mar 20 '15 at 15:55

score 2 · Answer 2 · answered Mar 20 '15 at 14:43

import os
import urllib
import zipfile
from StringIO import StringIO

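# Download the zip archive into an in-memory file-like object.
package = StringIO(urllib.urlopen("http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv").read())
archive = zipfile.ZipFile(package, 'r')
pwd = os.path.abspath(os.curdir)

# Write each file in the archive to the current directory.
for filename in archive.namelist():
    path = os.path.join(pwd, filename)
    # Binary mode keeps the bytes unchanged on every platform.
    with open(path, 'wb') as fp:
        fp.write(archive.read(filename))
    print filename, 'downloaded successfully'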

From here you can use your approach to handle CSV files.
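
The same extract-to-disk idea can be sketched in Python 3 with `ZipFile.extractall`, which writes every member in one call. The helper name `extract_zip_bytes` is hypothetical, and the archive here is built in memory and unpacked into a temporary directory so that the sketch runs offline. With real data, the bytes would come from the download and the target would be a directory of your choice.

```python
import io
import os
import tempfile
import zipfile


def extract_zip_bytes(zip_bytes, target_dir):
    """Write every member of an in-memory zip archive into target_dir."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Stand-in for the downloaded archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("data.csv", "country,value\nAruba,1\n")

with tempfile.TemporaryDirectory() as target_dir:
    names = extract_zip_bytes(demo.getvalue(), target_dir)
    extracted = sorted(os.listdir(target_dir))
    print(names, extracted)
```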

Thank you, both answers work fine. I accepted the other one only because it is a bit more "didactic". — SirC, Mar 20 '15 at 15:57

score 1 · Answer 3 · answered Jun 08 '18 at 10:02

We have a script to automate access and data extraction for World Bank World Development Indicators like: https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS

The script does the following:

Downloading the metadata and data
Extracting metadata and data
Converting to a Data Package

The script is Python-based and uses Python 3.0. It has no dependencies outside of the standard library. Try it:

python scripts/get.py

python scripts/get.py https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS

You can also read our analysis of World Bank data:

https://datahub.io/awesome/world-bank

score -1 · Answer 4 · answered May 12 '16 at 07:11

Just a suggestion rather than a solution: you can use pd.read_csv to read a CSV file directly from a URL.

import pandas as pd
data = pd.read_csv('http://url_to_the_csv_file')

Kathirmani Sukumar
