
I've been tasked with digging through census data at the block level. After learning how to navigate it and find what I'm looking for, I hit a snag: tabblock polygons (block-level polygons) have an ID that is a 15-character string,

ex: '471570001022022'

but the census data labels each block like this:

'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

The block ID is formatted state-county-tract-group-block, with leading zeros padding the fields to make 15 characters: sscccttttggbbbb

Does anyone know a quick way to get this into a usable format? I thought I would ask before spending time cooking up a Python script.

Thanks, gm

gm70560
  • From the census: 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee', but I need it to read: '471570001022022'. – gm70560 Jan 31 '13 at 19:28
  • How do you get at the mapping between state and county names and their numerical representations? – Silas Ray Jan 31 '13 at 19:30

3 Answers

1

Well, I got it.

ex = 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

new_id = '47157' + ex[40:len(ex)-26].zfill(4) + '0' + ex[24] + ex[6:10]

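The state and county values are constant for my data, so they are hard-coded as '47157'; block groups only go to one digit (afaik). The character offsets assume a four-digit block number and the exact ', Shelby County, Tennessee' suffix, so this only works for labels from that county.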
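The same idea can be written without fixed character offsets. This is only a minimal sketch: the name `label_to_id` and the `state_county` parameter are made up for illustration, and it keeps the assumptions above (a constant state/county prefix, tract padded to 4 digits, block group to 2, block to 4).

```python
def label_to_id(label, state_county='47157'):
    """Build the 15-character ID from a census label (hypothetical helper).

    Assumes a constant state/county prefix, as in the slicing version.
    """
    # 'Block 2022', 'Block Group 2', 'Census Tract 1', then county and state.
    block_part, group_part, tract_part = label.split(', ')[:3]
    # The number is the last whitespace-separated token of each part.
    block = block_part.split()[-1]
    group = group_part.split()[-1]
    tract = tract_part.split()[-1]
    return state_county + tract.zfill(4) + group.zfill(2) + block.zfill(4)


ex = 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'
print(label_to_id(ex))  # 471570001022022
```

Splitting on ', ' means the tract and block numbers can be any length, at the cost of assuming the label always lists block, block group and tract in that order.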

gm70560
  • Best answer: download the right format from the options offered by the "fact finder" on the census page. The CSV gives a properly formatted ID field. – gm70560 Feb 04 '13 at 21:12
  • Plus: the format is actually ss-ccc-tttttt-bbbb (state, county, tract, block), and the block group isn't present. Working with that, I used a dict to look up the tracts and produce the proper format. Then I scrapped it when I found the download option. – gm70560 Feb 04 '13 at 21:14
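For reference, the ss-ccc-tttttt-bbbb layout from the comment above can be sketched as a simple slice of the 15-character string. The names `BlockId` and `split_block_id` are illustrative only and not part of any census tooling.

```python
from collections import namedtuple

# Field names are illustrative; widths follow the ss-ccc-tttttt-bbbb layout.
BlockId = namedtuple('BlockId', ['state', 'county', 'tract', 'block'])


def split_block_id(block_id):
    """Split a 15-digit block ID into state, county, tract and block."""
    if len(block_id) != 15 or not block_id.isdigit():
        raise ValueError('expected 15 digits, got %r' % (block_id,))
    return BlockId(block_id[:2], block_id[2:5], block_id[5:11], block_id[11:])


parts = split_block_id('471570001022022')
print(parts)
# BlockId(state='47', county='157', tract='000102', block='2022')
```

Under this layout the block group is not a separate field; as far as I know it is the first digit of the block number (2 in this example).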
1

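Using struct to split an ID into its fields might be neater (Python 3 shown, where struct works on bytes):

>>> import struct
>>> r = b'471570001022022'
>>> f = '2s3s4s2s4s'
>>> struct.unpack(f, r)
(b'47', b'157', b'0001', b'02', b'2022')
>>> s, c, t, g, b = struct.unpack(f, r)
>>> print(s.decode())
47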
sotapme
1

Assuming this data is correct, and you've parsed it into two dictionaries, state_ids and county_ids, where the keys are the names of the entities and the values are their numeric codes as strings:

import re

def get_tabblock_id(tabblock_string):
    # Capture block, block group, tract, county name and state name from the label.
    block, block_group, tract, county, state = re.match(r'Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)', tabblock_string).groups()
    return state_ids[state].zfill(2) + county_ids[county].zfill(3) + tract.zfill(4) + block_group.zfill(2) + block.zfill(4)
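As a rough usage sketch, here is the same function run against the example label. The two lookup tables hold a single hypothetical entry each (a real run would load every state and county), and the explicit no-match check is an addition for illustration.

```python
import re

# Hypothetical one-entry lookup tables, just enough for the example label.
state_ids = {'Tennessee': '47'}
county_ids = {'Shelby County': '157'}

LABEL_RE = re.compile(
    r'Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)')


def get_tabblock_id(tabblock_string):
    match = LABEL_RE.match(tabblock_string)
    if match is None:
        # Added check: fail with a clear message on an unexpected label.
        raise ValueError('unrecognised label: %r' % (tabblock_string,))
    block, block_group, tract, county, state = match.groups()
    return (state_ids[state].zfill(2) + county_ids[county].zfill(3)
            + tract.zfill(4) + block_group.zfill(2) + block.zfill(4))


label = 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'
print(get_tabblock_id(label))  # 471570001022022
```

County names repeat across states, so a table covering more than one state would probably need to be keyed on the (state, county) pair rather than the county name alone.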
Silas Ray