Python: how to create a choropleth map out of a shapefile of Canada?

Question

My goal here is to create a choropleth map of Canada in Python. Suppose I have a dictionary with values referring to each Canadian province/territory:

myvalues={'Alberta': 1.0,
 'British Columbia': 2.0,
 'Manitoba': 3.0,
 'New Brunswick': 4.0,
 'Newfoundland and Labrador': 5.0,
 'Northwest Territories': 6.0,
 'Nova Scotia': 7.0,
 'Nunavut': 8.0,
 'Ontario': 9.0,
 'Prince Edward Island': 10.0,
 'Quebec': 11.0,
 'Saskatchewan': 12.0,
 'Yukon': 13.0}

Now I want to color each province based on the corresponding value in myvalues, using a continuous colormap (e.g., shades of red). How can I do that?

So far I have only been able to plot the Canadian provinces/territories with matplotlib, but they all appear in a single color, and I don't know how to change that according to the numbers in myvalues (maybe I need to play with patches, but I don't know how).

This is where you can find the shapefile: http://www.filedropper.com/canadm1_1

And this is my code to date:

import shapefile
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection

#   -- input --
sf = shapefile.Reader(r"myfolder\CAN_adm1.shp")  # raw string so the backslash is kept literally
recs = sf.records()
shapes = sf.shapes()
Nshp = len(shapes)
cns = []
for nshp in range(Nshp):
    cns.append(recs[nshp][1])
cns = np.array(cns)
cm = plt.get_cmap('Dark2')
cccol = cm(1. * np.arange(Nshp) / Nshp)

#   -- plot --
fig = plt.figure()
ax = fig.add_subplot(111)
for nshp in range(Nshp):
    ptchs = []
    pts = np.array(shapes[nshp].points)
    prt = shapes[nshp].parts
    par = list(prt) + [pts.shape[0]]  # end index of the last part
    for pij in range(len(prt)):
        ptchs.append(Polygon(pts[par[pij]:par[pij + 1]]))
    ax.add_collection(PatchCollection(ptchs, facecolor=None, edgecolor='k', linewidths=.5))
ax.set_xlim(-160, -40)
ax.set_ylim(40, 90)
plt.show()

This is the image I am getting so far:

EDIT

I suspect the solution lies in the following lines:

cm = plt.get_cmap('OrRd')
cccol = cm(1. * np.arange(Nshp) / Nshp)

The above creates a cccol array with 13 rows and 4 columns:

array([[ 1.        ,  0.96862745,  0.9254902 ,  1.        ],
       [ 0.99766244,  0.93356402,  0.84133796,  1.        ],
       [ 0.99520185,  0.89227221,  0.74749713,  1.        ],
       [ 0.99274125,  0.84306037,  0.64415227,  1.        ],
       [ 0.99215686,  0.78754327,  0.5740254 ,  1.        ],
       [ 0.99186467,  0.71989237,  0.50508269,  1.        ],
       [ 0.98940408,  0.60670514,  0.39927722,  1.        ],
       [ 0.97304114,  0.50618995,  0.32915034,  1.        ],
       [ 0.94105344,  0.40776625,  0.28732027,  1.        ],
       [ 0.88521339,  0.28115341,  0.19344868,  1.        ],
       [ 0.8220992 ,  0.16018455,  0.10345252,  1.        ],
       [ 0.73351789,  0.04207613,  0.02717416,  1.        ],
       [ 0.61959248,  0.        ,  0.        ,  1.        ]])

I don't know why it has 4 columns, but I figure that if I can somehow link the rows of this array to the values in the myvalues dict, I can solve the problem. Any ideas?

EDIT 2

I have figured out that the "trick" lies in cccol = cm(). To relate this to the provinces, I tried cccol = cm(myvalues.values(i) for i in myvalues.keys())

so that (in my mind, at least) each color is assigned based on the relevant key and nothing ends up in the wrong place. The problem is that I get an error:

TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'.

How can I work around this?

The shapefile you linked is not valid because it only contains the `*.shp`, not the sidecar files (e.g. `CAN_adm1.shx`, `CAN_adm1.dbf`, etc). Can you link a ZIP containing them all? — Jeff G, Jun 21 '16 at 16:57
@JeffG here is the zip folder: http://www.filedropper.com/canadm1_1. I apologise. I will amend the link in the question itself. Thanks. — FaCoffee, Jun 21 '16 at 17:04

score 7 · Accepted Answer · answered Jun 21 '16 at 17:03

This doesn't directly answer your question but hopefully solves your problem just the same. Have you looked at GeoPandas? It provides a simple API for working with and plotting shapefiles. You can replicate your code, including plotting a choropleth, in just a few lines:

import geopandas as gpd
canada = gpd.read_file('CAN_adm1.shp')
canada.plot('myvalues', cmap='OrRd')

This example assumes your shapefile has an attribute on each province that contains the values you want to plot, and that the attribute is called "myvalues". If the values aren't stored in the shapefile, you can use canada.merge to merge your myvalues dict (converted to a DataFrame) onto the GeoDataFrame.
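A minimal sketch of that merge, assuming the province names sit in a column called `NAME_1` (the usual GADM field name, but check `canada.columns` for your file). The helper `attach_values` is hypothetical; it turns the dict into a two-column table and left-joins it on the name column, so the geometry column of the left-hand GeoDataFrame is kept.

```python
import pandas as pd


def attach_values(gdf, values, name_col="NAME_1", value_col="myvalues"):
    """Left-join a {province name: value} dict onto a (Geo)DataFrame.

    `name_col` is an assumption about the shapefile's attribute table.
    Provinces whose names do not match exactly end up with NaN in `value_col`.
    """
    table = pd.DataFrame(list(values.items()), columns=[name_col, value_col])
    return gdf.merge(table, on=name_col, how="left")
```

Usage would then be `canada = attach_values(canada, myvalues)` followed by `canada.plot('myvalues', cmap='OrRd')`. Any rows where `canada['myvalues'].isna()` is true point to names spelled differently in the shapefile (accents, for example), which would otherwise show up as missing provinces on the map.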

One caveat: At this time GeoPandas does not have an easy way to plot the legend for choropleth colors. (issue reported here)
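One possible workaround (a plain matplotlib sketch, not a GeoPandas feature) is to draw a colorbar yourself from a `ScalarMappable` that uses the same colormap and value range as the map. The function name `add_value_colorbar` is illustrative, and `ax` is assumed to be the axes the map was drawn on, e.g. the one returned by `canada.plot(...)`.

```python
import matplotlib.pyplot as plt
from matplotlib import cm, colors


def add_value_colorbar(ax, values, cmap_name="OrRd"):
    """Attach a colorbar spanning min(values)..max(values) to ax's figure."""
    norm = colors.Normalize(vmin=min(values), vmax=max(values))
    mappable = cm.ScalarMappable(norm=norm, cmap=plt.get_cmap(cmap_name))
    mappable.set_array([])  # some older matplotlib versions require an array to be set
    return ax.figure.colorbar(mappable, ax=ax)
```

For example, `add_value_colorbar(ax, list(myvalues.values()))`. GeoPandas scales the plotted column between its minimum and maximum by default, so the colorbar should line up with the map as long as you did not pass your own `vmin`/`vmax`.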

`geopandas` is a nice solution here. Here is the image I get by merging `myvalues` with `canada` as Jeff proposes: http://i.stack.imgur.com/KiAry.png — lanery, Jun 21 '16 at 20:11

score 2 · Answer 2 · answered Jun 21 '16 at 16:45

Request: please rename your values dictionary to something else. That name has made writing this answer much more difficult. :)

Haven't tested this, but try:

color_numbers = list(myvalues.values())
# assumes the provinces are listed in the same order in myvalues as
# they are in the shapefile
for nshp in range(Nshp):
    ptchs = []
    # ... code omitted ...
    # 1..13 -> 0..1, then add G=B=0
    # (change the computation if the values in myvalues are no longer 1..13)
    the_facecolor = [(color_numbers[nshp] - 1) / (Nshp - 1), 0, 0]
    ax.add_collection(PatchCollection(ptchs, facecolor=the_facecolor, edgecolor='k', linewidths=.5))

The output you're getting has all blue patches, or [0,0,1]. Since that row isn't in cccol, I don't think cccol is the problem. Also, the code you added never actually references cccol after creating it! (Please add the link to the code sample you started from! :) )

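Anyway, as far as I know, setting facecolor should help. Converting each value in myvalues to the range 0..1 and using it as the red component of an [R, G, B] color (with G = B = 0) should give you shades of red, from black for the smallest value to pure red for the largest.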
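The rescaling in the code above can be written so it depends neither on the values being exactly 1..13 nor on dictionary order. This is a sketch with a hypothetical helper, `red_shade`; it produces the same black-to-red scale as `[(v - 1)/12, 0, 0]` does for the current data, but takes the range from the data itself.

```python
def red_shade(value, vmin, vmax):
    """Linearly map value in [vmin, vmax] to an RGB tuple (t, 0, 0), t in [0, 1].

    Assumes vmax > vmin.
    """
    t = (value - vmin) / (vmax - vmin)
    return (t, 0.0, 0.0)
```

With the question's data, `lo, hi = min(myvalues.values()), max(myvalues.values())` gives 1.0 and 13.0, so `red_shade(7.0, lo, hi)` is `(0.5, 0.0, 0.0)`. Building `{name: red_shade(v, lo, hi) for name, v in myvalues.items()}` and looking colors up by the province name stored in each shape's record avoids relying on the two orders matching.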

One little flaw, I think: `(color_numbers[nshp]-1)` will always return a negative RGB value. Am I correct? I tried removing the `-1` and ended up with a completely black map. How come? — FaCoffee, Jun 21 '16 at 17:02
`color_numbers` in your current example range from 1.0 to 13.0 - they are the part after the colon in `myvalues`. `...-1` scales those numbers to 0-12 and `/(Nshp-1)` changes that to 0-1. If the specific numbers you use change, you would need to change that formula. — cxw, Jun 21 '16 at 17:22

Jeff G · Answer 3 · 2016-06-21T17:23:33.877

You mentioned confusion about cccol being a list of lists. It is an array of RGBA rows (red, green, blue, alpha transparency), which is why it has 4 columns. These represent 13 "equally spaced" colors from light orange to dark red.
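A quick way to see this, sketched with matplotlib's `OrRd` colormap: calling a colormap on N floats returns an N×4 array of RGBA rows. Float inputs are expected in the range 0..1 (integers, by contrast, are treated as direct indices into the colormap's lookup table), so raw values such as 1.0..13.0 cannot be passed in as-is, which is one reason for the `Normalize` step below. A generator like the one in EDIT 2 is not a numeric array at all; NumPy turns it into an object-dtype array, which is what the `dtype('O')` in that error message refers to.

```python
import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap("OrRd")

# Thirteen equally spaced positions in [0, 1), like cccol in the question.
cccol = cmap(np.arange(13) / 13)
print(cccol.shape)  # (13, 4): one row per input, columns are R, G, B, alpha

# Float inputs above 1.0 are treated as "over" values, which by default get
# the top color, so raw values must be rescaled into [0, 1] first.
print(np.allclose(cmap(1.0), cmap(13.0)))  # True
```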

In your case you don't want equally spaced colors but colors corresponding to myvalues. Do this:

import matplotlib.pyplot as plt
from matplotlib import cm, colors

cmap = plt.get_cmap('OrRd')
norm = colors.Normalize(min(myvalues.values()), max(myvalues.values()))
color_producer = cm.ScalarMappable(norm=norm, cmap=cmap)

Now color_producer has a method, to_rgba, that takes values from myvalues and converts them to the correct colors. Normalize maps the minimum and maximum of myvalues to the two ends of the orange-red colormap, with values in between interpolated linearly.
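To check what `Normalize` does with the question's numbers, here is a small self-contained sketch using only three provinces from `myvalues` for brevity. The lowest value should come out as the first color of the colormap, the highest as the last, and 7.0 lands exactly halfway.

```python
import matplotlib.pyplot as plt
from matplotlib import cm, colors

# Subset of the question's myvalues, for illustration only.
myvalues = {"Alberta": 1.0, "Nova Scotia": 7.0, "Yukon": 13.0}

cmap = plt.get_cmap("OrRd")
norm = colors.Normalize(min(myvalues.values()), max(myvalues.values()))
color_producer = cm.ScalarMappable(norm=norm, cmap=cmap)

print(norm(7.0))  # 0.5: halfway between the min (1.0) and the max (13.0)

# One RGBA tuple per province, keyed by name rather than by position.
rgba_by_province = {name: color_producer.to_rgba(v) for name, v in myvalues.items()}
```

Keying the colors by province name means the shapefile order no longer matters, as long as the names in the shapefile match the keys of `myvalues`.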

Now when you create each province's PatchCollection, you can set its facecolor to the RGBA tuple returned by color_producer:

# Change the province name passed as you iterate through provinces.
rgba = color_producer.to_rgba(myvalues['Manitoba'])
ax.add_collection(PatchCollection(ptchs, facecolor=rgba, edgecolor='k', linewidths=.5))
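Putting the pieces together, one possible shape for the per-province part of the question's plotting loop is sketched below. The `province_patches` helper is hypothetical; it takes the flat vertex list and the part start indices in the same layout as pyshp's `Shape.points` and `Shape.parts`, and colors every ring of the province with one face color.

```python
import numpy as np
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection


def province_patches(points, parts, facecolor):
    """Build one PatchCollection for a (possibly multi-part) shapefile shape.

    `points` is the flat list of (x, y) vertices and `parts` holds the start
    index of each ring, as in pyshp's Shape.points and Shape.parts.
    """
    pts = np.asarray(points)
    bounds = list(parts) + [len(pts)]  # end index of the last ring
    polys = [Polygon(pts[start:end]) for start, end in zip(bounds[:-1], bounds[1:])]
    return PatchCollection(polys, facecolor=facecolor, edgecolor="k", linewidths=0.5)
```

Inside the loop this would be called as `ax.add_collection(province_patches(shapes[nshp].points, shapes[nshp].parts, color_producer.to_rgba(myvalues[name])))`, where `name` is the province name read from `recs[nshp]`. Which record field holds that name depends on the shapefile, so check the field list (`sf.fields`) rather than assuming an index.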
