1

I have a dataframe I created by scraping this PDF with tabula. I'm trying to create a point column using geocoder - but I keep getting a Columns must be same length as key error. My code, as well as a link to the PDF is below: PDF: https://drive.google.com/file/d/1m-KCmEIFlmyVcfYKTTwMaBpH6V5voreH/view?usp=sharing

import tabula
import pandas as pd
import re

### Scrape and clean
dsf = tabula.read_pdf('/content/drive/MyDrive/Topcondoimage 11-22-2021.pdf', pages='all',lattice=True)
df = dsf[0]
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df = df.iloc[: , 1:]
df = df.replace(np.nan, 'Not Available', regex=True)
df['geo_Address'] = df['Building / Address / City']
df['geo_Address'] = df['geo_Address'].map(lambda x: re.sub(r'\r', ' ', x))
df['loc'] = df['geo_Address'].apply(geolocator.geocode, timeout=10)
df['point'] = df['loc'].apply(lambda loc: tuple(loc.point) if loc else None)

df = df.rename(columns={'Building / Address / City': 'building_address_city','Days on\rMarket':'days_on_market','Price /\rSq. Ft.':'price_per_sqft'})
df.reset_index(drop=True, inplace=True)


df[['lat','lon','altitude']] = pd.DataFrame(df['point'].to_list(),index=df.index)

That last line is what triggers the error.

I've tried removing special characters and resetting the index.

Adam
  • 315
  • 1
  • 11
  • It might have something to do with the `.to_list()`? Because it could potentially be the wrong shape. Some more detail with the error would be appreciated, plus print statements of each step. – Larry the Llama Nov 23 '21 at 03:04

0 Answers0