I have this frame:
regions = pd.read_html('http://www.mapsofworld.com/usa/usa-maps/united-states-regional-maps.html')
messy_regions = regions[8]
Which yields something like this:
|0 | 1
--- |---| ---
0| Region 1 (The Northeast)| nan
1| Division 1 (New England)| Division 2 (Middle Atlantic)
2| Maine | New York
3| New Hampshire | Pennsylvania
4| Vermont | New Jersey
5| Massachusetts |nan
6| Rhode Island |nan
7| Connecticut | nan
8| Region 2 (The Midwest) | nan
9| Division 3 (East North Central)| Division 4 (West North Central)
10| Wisconsin | North Dakota
11| Michigan | South Dakota
12| Illinois | Nebraska
The goal is to make this a tidy dataframe and I think I need to pivot in order to get the regions and Divisions as columns with the states as values under the correct regions/divisions. Once it's in that shape then I can just melt to the desired shape. I can't figure out though how to extract what would be the column headers out of this though. Any help is appreciated and at the very least a good point in the right direction.