
Let me start by saying: other than a Stata .do file and some MCMC in R, I haven't coded since an AOL warez group application in VB, so I apologize for being here messing with y'all to begin with.

I am writing an environmental justice paper working with demographic and exposure data at the census block group level. Because block groups can be relatively small, a pollution source in one block group can easily impact the people living in at least its first-order neighbors. I was going to be a dullard and just aggregate back up through the FIPS code, but that is just bad math.

I got the shapefile for the ACS year I want and tried ArcGIS first, but I was not getting anywhere. I then read about PySAL and installed that.

I imported the shapefile and ran the queen neighbor analysis (12 hours) on all 216,000 block groups:

In [52]: w.histogram
Out[52]:

[(0, 87), (1, 709), (2, 3634), (3, 16627), (4, 48736), (5, 56952), (6, 42848), (7, 24878), (8, 12646), (9, 6294), (10, 3040), (11, 1515), (12, 759), (13, 432), (14, 233), (15, 128), (16, 85), (17, 44), (18, 34), (19, 20), (20, 21), (21, 13), (22, 8), (23, 7), (24, 6), (25, 1), (26, 3), (27, 1), (28, 2), (29, 1), (30, 2), (31, 1), (32, 0), (33, 2), (34, 0), (35, 1), (36, 1), (37, 1), (38, 0), (39, 0), (40, 0), (41, 0), (42, 0), (43, 0), (44, 0), (45, 0), (46, 1), (47, 0), (48, 0), (49, 0), (50, 0), (51, 0), (52, 0), (53, 0), (54, 0), (55, 0), (56, 0), (57, 0), (58, 0), (59, 0), (60, 0), (61, 1)]

What I need is a .csv (or honestly anything will do if I can copy/paste it somewhere) that enumerates each block group by FIPS code (which should be what the ACS shapefile uses for ID) and its list of neighbors.

If I can get the list, I can move it over to an environment where I am more comfortable. I sat there and played with it for hours last night and got a couple of cracks at numpy.savetxt to work, but the output was only a single column and the numbers were stored in scientific notation because FIPS codes are 12 digits. One time it told me the tuple index was out of range, and I think that was the closest I got.

I searched rather extensively for the data itself beforehand, or I promise I would not be here wasting your time.

Thank you, Dave

1 Answer

You can write the W to a txt file with pysal. There are a number of different formats, but a "GAL" file is the simplest.

It's a text file; the first line is the number of shapes. Each record is then 2 lines:

id n
id0, id1, ... 

where:
  id is the id of the polygon,
  n is the number of neighbors
  id0 is the id of the first neighbor
  ... and so on

For example:

 3
 0 1
 1
 1 2
 0 2
 2 1
 1

...describes the graph 0-1-2: 0 has 1 neighbor (1), 1 has 2 neighbors (0 and 2), and so on.
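To make the two-lines-per-record layout concrete, here is a minimal plain-Python sketch that parses the small example above into a neighbors dict. (This is just to illustrate the format; PySAL can read GAL files itself via `pysal.open("file.gal", "r").read()`.)

```python
# The example GAL file from above, inlined as a string.
gal_text = """3
0 1
1
1 2
0 2
2 1
1"""

lines = gal_text.strip().splitlines()
n_shapes = int(lines[0])  # first line: number of shapes

neighbors = {}
i = 1
while i < len(lines):
    poly_id, n = lines[i].split()              # header line: id and neighbor count
    neighbors[poly_id] = lines[i + 1].split()  # next line: the neighbor ids
    i += 2

print(neighbors)  # {'0': ['1'], '1': ['0', '2'], '2': ['1']}
```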

To write your W to a gal file...

>>> W = pysal.queen_from_shapefile("/path/to/shapefile.shp")
>>> out = pysal.open("output.gal", 'w')
>>> out.write(W)
>>> out.close()

Note: The ids are offsets. 0 is the first polygon, 1 is the second, and so on.

If you want to link the offsets to FIPS codes you'd need to do that on your own. But you can use pysal to extract the FIPS codes in the correct order:

>>> dbf = pysal.open("/path/to/shapefile.dbf", "r")
>>> print dbf.header
[column names, ... ]
>>> FIPS = dbf.by_col("name_of_fips_code_column")
>>> FIPS = map(str, FIPS) #make sure we're writing strings
>>> out = open('fips.txt','w')
>>> out.write('\n'.join(FIPS))
>>> out.close()
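To get from there to the .csv the question asks for, a minimal sketch of joining the two outputs (the offsets, neighbor lists, and 12-digit FIPS strings below are made up for illustration; in practice they come from the GAL file and `dbf.by_col` above):

```python
import csv

# Hypothetical inputs: 'neighbors' maps each polygon offset to its neighbor
# offsets (as stored in the GAL file), and 'FIPS' is the by_col list in the
# same order as the polygons.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
FIPS = ["482012231001", "482012231002", "482012232001"]

with open("neighbors.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["fips", "neighbor_fips"])
    for offset, nbrs in neighbors.items():
        # Translate every offset back to its FIPS code; keeping the codes as
        # strings means the 12 digits never collapse to scientific notation.
        w.writerow([FIPS[offset]] + [FIPS[i] for i in nbrs])
```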
Charles
  • Oh man, you are my freakin hero. Got the W to a .gal, pasted it into Excel, and just suppressed evens then odds to line everything up; pasting the FIPS codes in the same order right now. I really cannot thank you enough. – Dave O'Donnell Apr 20 '16 at 07:49
  • I just fixed the code example... don't forget to call out.close() on the fips.txt file, or it may be truncated. – Charles Apr 20 '16 at 15:41
  • I got it to work just as it was. I took the .gal and put it into Excel, then suppressed every other row, pasted into a new sheet, did the same with the alternate rows, then pasted both columns up against the FIPS column (the 12-digit code was actually "GEOID" in the .dbf). They were both the same nRow, so I think I have it right. I am just using the =INDEX command in Excel to match the neighbor enumeration back to the FIPS column. I will take a couple of random samples when I am done and check a map to make sure everything is lined up correctly. – Dave O'Donnell Apr 22 '16 at 01:38
  • I do have one question though. If I were to go back through and do a rook neighbor calculation, is there a way to get the neighbors and some type of weight related to the length of the border they share? Again, thank you so much. I am way out of my comfort zone trying to extract this data. However, it is pretty commonly accepted that geospatial regressions are the best way to get at this environmental justice work. This seemed like a good way to dip my toes in the kiddie pool. Much appreciated. Dave – Dave O'Donnell Apr 22 '16 at 01:39
  • 1
    Yes, PySAL shared_perimeter_weights are supported through a contrib module, meaning extra dependencies are required (shapely in this case). https://github.com/pysal/pysal/blob/master/pysal/contrib/shared_perimeter_weights.py; to run it, install shapely and in your python script use "from pysal.contrib.shared_perimeter_weights import spw_from_shapefile" and so on. You'll want to save these as a ".gwt" to preserve the weight value. – Charles Apr 22 '16 at 17:33
  • Awesome. I have been playing around with it and am in the process of tacking on some demographic and SES variables to the information contained in the .dbf (by taking the rook neighbors after adding the FIPS codes, adding the data, then resorting by polygon id), and am going to try to save the altered file as a .dbf. I have been looking at the Spatial Lag option in the pysal documentation. Hoping to get a weight for population, one for %minority, and one for %low income. Would love to be able to multiply each of those by the perimeter weight. Will tinker with it. Thanks again – Dave O'Donnell Apr 23 '16 at 18:44