I have a dataframe that I created by scraping this PDF with tabula. I'm trying to create a point
column using geocoder, but I keep getting a "Columns must be same length as key"
error. My code and a link to the PDF are below:
PDF: https://drive.google.com/file/d/1m-KCmEIFlmyVcfYKTTwMaBpH6V5voreH/view?usp=sharing
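import tabula
import pandas as pd
import numpy as np  # was missing; needed for np.nan below
import re
### Scrape and clean
dsf = tabula.read_pdf('/content/drive/MyDrive/Topcondoimage 11-22-2021.pdf', pages='all', lattice=True)
df = dsf[0]
# Promote the first row to the header, then drop that row and the first column
df.columns = df.iloc[0]
df = df.drop(df.index[0])
df = df.iloc[:, 1:]
df = df.replace(np.nan, 'Not Available', regex=True)
### Geocode
# Replace the carriage returns left over from the PDF so each address is a single line
df['geo_Address'] = df['Building / Address / City']
df['geo_Address'] = df['geo_Address'].map(lambda x: re.sub(r'\r', ' ', x))
# geolocator is not defined in this snippet; it is assumed to be a geopy geocoder instance created earlier
df['loc'] = df['geo_Address'].apply(geolocator.geocode, timeout=10)
# point is a (latitude, longitude, altitude) tuple, or None when the lookup returned nothing
df['point'] = df['loc'].apply(lambda loc: tuple(loc.point) if loc else None)
df = df.rename(columns={'Building / Address / City': 'building_address_city', 'Days on\rMarket': 'days_on_market', 'Price /\rSq. Ft.': 'price_per_sqft'})
df.reset_index(drop=True, inplace=True)
# This assignment raises the error
df[['lat', 'lon', 'altitude']] = pd.DataFrame(df['point'].to_list(), index=df.index)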
That last line, the assignment to the lat, lon and altitude columns, is what triggers the error.
I've tried removing special characters and resetting the index, but the error persists.
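One thing I have not ruled out: geocode can return None for an address it cannot resolve, so the point column may mix 3-tuples with None, and the expanded frame might then not have three columns. Below is a minimal sketch of padding every entry to three fields before expanding. The helper name expand_point and the sample values are hypothetical, not taken from my real data, and I have not confirmed this is the actual cause.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[Optional[float], Optional[float], Optional[float]]


def expand_point(point: Optional[Sequence[float]]) -> Row:
    """Return (lat, lon, altitude), with None placeholders for a failed lookup."""
    if point is None:
        # Assumption: a missing point means the geocoder found no match
        return (None, None, None)
    lat, lon, altitude = point
    return (lat, lon, altitude)


# Made-up sample values standing in for df['point'].to_list()
points = [None, (25.76, -80.19, 0.0), None]

# Every row now has exactly three fields, even where the lookup failed
rows = [expand_point(p) for p in points]
widths = {len(r) for r in rows}
```

The padded rows could then be passed to pd.DataFrame(rows, index=df.index, columns=['lat', 'lon', 'altitude']) and assigned to the three target columns.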