I am trying to merge two pandas data frames (one is a spatial dataframe -sdf- and the other a simple dataframe) on a common field, GEOID using pd.merge. In the sdf, the GEOID is a string, in the df it was an int. I used the .astype('str') to convert the df GEOID field to a string as well. Still, when calling pd.merge the output is either blank or I get an error that says 'You are trying to merge on object and int64 columns.' I have confirmed with .dtypes() that both are strings. Any idea why the merge is not functioning?
I have tried converting both to string, as well as both to int. I also tried using pd.join and pd.concat but neither work as expected.
import pandas as pd
#Read in CSV with updates (would already be df from Socrata pull in real version)
updated_csv= r"C:\Users\mad10412\Desktop\Active_Business_Data_Edited.csv"
updated_csv_df = pd.read_csv(updated_csv)
updated_csv_df.head(5)
updated_csv_df['GEOID10']=updated_csv_df['GEOID10'].astype(str)
updated_csv_df.dtypes
output_layer_name = 'Join_Features_Test5'
actbus=gis.content.search(output_layer_name)
ActiveBusinesses_item = actbus[0]
ActiveBusinesses_item
ActiveBusinesses_flayer = ActiveBusinesses_item.layers[0]
ActiveBusinesses_flayer
ActiveBusinesses_fset = ActiveBusinesses_flayer.query() #querying without any conditions returns all the features
ActiveBusinesses_fset.sdf.head()
ActiveBusinesses_fset.sdf.shape
ActiveBusinesses_fset.sdf.dtypes
##Attempt 1: Includes original data and Adds Column names but no data
overlap_rows = ActiveBusinesses_fset.sdf.join(updated_csv_df.set_index('GEOID10'),on='GEOID10', lsuffix='_left', rsuffix='_right')
overlap_rows.head(10)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF.csv")
##Attempt 2: Only includes column name. no data at all
overlap_rows = pd.merge(left = ActiveBusinesses_fset.sdf,
right = updated_csv_df,
how='inner',
on = 'GEOID10')
overlap_rows.head(5)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF2.csv")
##Attempt 3: Includes all columns and all data, but GEOIDs don't match
result = pd.concat([ActiveBusinesses_fset.sdf, updated_csv_df], axis=1, join='inner')
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF3.csv")
##Attempt 4: Only includes column name. no data at all
left=ActiveBusinesses_fset.sdf
right=updated_csv_df
result = pd.merge(left, right, how='inner',on=['GEOID10', 'GEOID10'])
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF4.csv")
The data for both dataframes looks like this:
df=pd.DataFrame({'GEOID': ['060372932023', '060372941201', '060372932022'],
'Mining': [6, 4,2 ],
'Agriculture': [10, 12, 4]})
df
The only difference between the data frames is one has a shape column containing the geometry. Essentially, I am trying to merge these data frames together to find instances where the values for the fields such as Agriculture and Mining are different.
df=pd.DataFrame({'GEOID': ['060372932023', '060372941201', '060372932022'],
'Mining': [6, 4,2 ],
'Agriculture': [10, 12, 4],
'Mining2': [8, 3 , 1],
'Agriculture2': [14, 0, 6]})
df
This should yield in one row for each GEOID, with data from both dataframes. See the last code snippet comments for what the outputs actually look like.