I am looking for an easy way to define a function that consecutively joins tables when run. I am pretty new to Python, but I have been given the task of building out a package that relies heavily on joins to work.
I have done plenty of work in R, but will be finishing this in Python (unless I just hit a wall). The goal is to automate a complete task so that a dataframe can be passed in, pushed through a function, and then presented in a couple of different views. This would require one function for each view. Because of this, there are a
This is horrible, and since I am familiar with dplyr, I'm trying to use dfply to accomplish this:
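from dfply import inner_join, left_join

def get_hcc(df, df2, df3):
    # Each tuple in `by` is a (left column, right column) pair
    df = (df >> inner_join(df2, by=[('col1', 'col2'), ('col1', 'col3')]))
    df = df.drop_duplicates()
    # Left join the de-duplicated result onto df3 by col4
    df = (df3 >> left_join(df, by='col4'))
    return df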
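For comparison, here is a minimal sketch of the same chain in plain pandas using `DataFrame.merge`. It assumes that `col1` lives in `df`, that `col2`, `col3` and `col4` come from `df2`, and that `df3` also has a `col4` column. The name `get_hcc_pandas` is made up for illustration, and the two-key inner join is written as a merge on one key pair followed by a filter on the other, which I believe is equivalent when the keys contain no missing values.

```python
import pandas as pd

def get_hcc_pandas(df, df2, df3):
    """Hypothetical plain-pandas version of the join chain."""
    # Inner join on df.col1 == df2.col2
    joined = df.merge(df2, left_on="col1", right_on="col2", how="inner")
    # Enforce the second key pair (df.col1 == df2.col3) as a filter
    joined = joined[joined["col1"] == joined["col3"]]
    # Drop duplicate rows produced by the join
    joined = joined.drop_duplicates()
    # Left join the result onto df3 by col4 (keeps every row of df3)
    return df3.merge(joined, on="col4", how="left")
```

Because each step returns a dataframe, one function per view could reuse the same pattern with different keys.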
If anyone has better ideas on how to go about this, I would greatly appreciate it!
Thanks.