Reformat census title

Question

I've been tasked with digging through census data at the block level. After learning how to navigate the data and find what I'm looking for, I hit a snag. tabblock polygons (block-level polygons) have an ID consisting of a 15-character string:

ex: '471570001022022'

but the format from the census data is labelled:

'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

The block ID is formatted state-county-tract-group-block, with leading zeros padding it to 15 characters: sscccttttggbbbb
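To make that layout concrete, here is a minimal Python 3 sketch that assembles an ID from integer components with fixed-width, zero-padded fields. It follows the sscccttttggbbbb layout exactly as described in the question; the function name `build_block_id` is hypothetical.

```python
def build_block_id(state, county, tract, group, block):
    """Zero-pad each component to its field width in the layout sscccttttggbbbb."""
    # widths: state 2, county 3, tract 4, block group 2, block 4 -> 15 characters
    return f"{state:02d}{county:03d}{tract:04d}{group:02d}{block:04d}"


print(build_block_id(47, 157, 1, 2, 2022))  # 471570001022022
```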

Does anyone know a quick way to get this into a usable format? I thought I would ask before spending time cooking up a Python script.

Thanks, gm

From the census: 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee', but I need it to read: '471570001022022'. – gm70560, Jan 31 '13 at 19:28
How do you get at the mapping between state and county names and their numerical representations? — Silas Ray, Jan 31 '13 at 19:30

score 1 · Answer 1 · answered Jan 31 '13 at 19:37


Well, I got it:

ex = 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'

new_id = '47157' + ex[40:len(ex)-26].zfill(4) + '0' + ex[24] + ex[6:10]

The state and county values are constant for my data (Shelby County, Tennessee is 47157), and block groups only go to one digit (as far as I know).
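The fixed character offsets above only work while every label has the same length. A minimal sketch of a less brittle variant (Python 3; the name `label_to_id` is hypothetical) splits the label on ", " instead, still assuming a constant 47157 state/county prefix and the sscccttttggbbbb layout from the question:

```python
def label_to_id(label, state_county="47157"):
    """Turn 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'
    into a 15-character ID in the sscccttttggbbbb layout."""
    # The first three comma-separated parts end in the block, block group and tract numbers
    block_part, group_part, tract_part = label.split(", ")[:3]
    block = block_part.split()[-1]   # '2022'
    group = group_part.split()[-1]   # '2'
    tract = tract_part.split()[-1]   # '1'
    # State and county are a constant prefix here, as in the slicing version
    return state_county + tract.zfill(4) + group.zfill(2) + block.zfill(4)


print(label_to_id("Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee"))
# 471570001022022
```

Multi-digit tracts and longer county names no longer shift the offsets, but a tract label with a decimal suffix (such as 1.02) would still need extra handling.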

answered by gm70560

Best answer: download the right format from the options offered by the "Fact Finder" on the Census site. The CSV gives a properly formatted ID field. – gm70560 Feb 04 '13 at 21:12
Also: the format is actually ss-ccc-tttttt-bbbb (state, county, tract, block), and the block group isn't present. Working with that, I used a dict to look up the tracts and produce the proper format, then scrapped it when I found the download option. – gm70560 Feb 04 '13 at 21:14
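A minimal Python 3 sketch of building the corrected ss-ccc-tttttt-bbbb layout from a label. The name `label_to_geoid` is hypothetical. The sketch assumes the usual Census tract coding, a 4-digit base plus a 2-digit suffix, so tract 1 becomes 000100 and tract 1.02 becomes 000102. It also assumes the block group is the first digit of the block number, which would explain why the block group has no field of its own. Under these assumptions the example label gives a different ID than the one the question expected.

```python
def label_to_geoid(label, state_county="47157"):
    """Build a 15-character ss-ccc-tttttt-bbbb ID from a census block label."""
    # The first three comma-separated parts end in the block, block group and tract numbers
    block, group, tract = (part.split()[-1] for part in label.split(", ")[:3])
    block = block.zfill(4)
    # Assumed tract convention: 4-digit base plus 2-digit suffix ('1' -> '000100', '1.02' -> '000102')
    base, _, suffix = tract.partition(".")
    tract6 = base.zfill(4) + suffix.ljust(2, "0")
    # The block group has no field of its own; check it against the block's first digit
    if block[0] != group:
        raise ValueError(f"block {block} does not start with block group {group}")
    return state_county + tract6 + block


print(label_to_geoid("Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee"))
# 471570001002022
```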

score 1 · Answer 2 · answered Jan 31 '13 at 19:40


Using struct might be neater:

>>> import struct
>>> r = '471570001022022'
>>> f = '2s3s4s2s4s'
>>> struct.unpack(f, r)
('47', '157', '0001', '02', '2022')
>>> s, c, t, g, b = struct.unpack(f, r)
>>> print s
47
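The session above is Python 2. In Python 3, `struct.unpack` needs a bytes-like object and returns bytes fields. A minimal adaptation under that assumption encodes the ID first and decodes each field afterwards:

```python
import struct

r = "471570001022022"
fmt = "2s3s4s2s4s"  # field widths: state, county, tract, block group, block
# Python 3's struct works on bytes: encode the ID, then decode each unpacked field
s, c, t, g, b = (field.decode("ascii") for field in struct.unpack(fmt, r.encode("ascii")))
print(s, c, t, g, b)  # 47 157 0001 02 2022
```

For purely fixed-width text, plain slicing (`r[0:2]`, `r[2:5]`, and so on) does the same job without the encode/decode step.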

answered by sotapme

score 1 · Answer 3 · answered Jan 31 '13 at 19:54

Assuming this data is correct, and you've parsed it into two dictionaries, state_ids and county_ids, where the keys are the names of the entities and the values are their numeric codes as strings:

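import re
def get_tabblock_id(tabblock_string):
    # e.g. 'Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee'
    block, block_group, tract, county, state = re.match(r'Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)', tabblock_string).groups()
    # look up the numeric state/county codes, then zero-pad every field to its width
    return state_ids[state].zfill(2) + county_ids[county].zfill(3) + tract.zfill(4) + block_group.zfill(2) + block.zfill(4)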
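A self-contained sketch of the same regex approach, with tiny hypothetical lookup tables so it can be run as-is. Because county names repeat across states, this variant keys the county table by (state, county) pairs. It also raises a clear error when a label does not match, instead of failing on `None.groups()`.

```python
import re

# Hypothetical lookup tables; real ones would come from a state/county code list
state_ids = {"Tennessee": "47"}
# Keyed by (state, county) because the same county name occurs in several states
county_ids = {("Tennessee", "Shelby County"): "157"}

LABEL_RE = re.compile(r"Block (\d+), Block Group (\d+), Census Tract (\d+), (.+), (.+)")


def get_tabblock_id(tabblock_string):
    m = LABEL_RE.match(tabblock_string)
    if m is None:
        raise ValueError(f"unrecognised label: {tabblock_string!r}")
    block, block_group, tract, county, state = m.groups()
    # Same field widths as above: 2 + 3 + 4 + 2 + 4 = 15 characters
    return (state_ids[state].zfill(2) + county_ids[(state, county)].zfill(3)
            + tract.zfill(4) + block_group.zfill(2) + block.zfill(4))


print(get_tabblock_id("Block 2022, Block Group 2, Census Tract 1, Shelby County, Tennessee"))
# 471570001022022
```

A tract label with a decimal suffix such as 1.02 will not match `\d+`, so the pattern would need widening for those.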

I'll give this a whirl when I'm back at work to see how it goes. – gm70560, Jan 31 '13 at 23:41
