Note: Correction - the code returns AttributeError: 'str' object has no attribute 'drop_duplicates'
I am trying to loop through a number of DataFrames and reduce the 'user_id' column of each one to only unique values, using the df.drop_duplicates(subset=['user_id']) method.
I need this to be a global change, so I am trying to incorporate it into my function that imports .csv files and assigns each one to a variable named after its file. The function works perfectly on its own, but when I try to add the drop_duplicates call, it doesn't seem to do anything:
def assign_vars(files = os.listdir()):
    # Make list of variable names using file name
    variables = [make_var(file) for file in files]
    # Start list to place dfs into
    dfs = []
    for var,file in zip(variables,files):
        # Use globals to assign dfs to file names
        globals()[var] = pd.read_csv(file)
        #<<1>>
        # Add each newly made df var to a list
        dfs.append(var.drop_duplicates(subset =['user_id'])) # rmv duplicates
    return print('Your variables are: ',sorted(dfs))
This raises an AttributeError. It seems that var is being treated as a str instead of a df.
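A minimal sketch of what seems to be happening, assuming a small in-memory DataFrame in place of the CSV files (the name "example" and the data are made up for illustration). In the loop, var holds only the variable's name, while the DataFrame itself is the object stored under globals()[var]:

```python
import pandas as pd

# Hypothetical stand-in for one (variable name, CSV file) pair
var = "example"
globals()[var] = pd.DataFrame({"user_id": [1, 1, 2, 3, 3]})

# var is still just the name, i.e. a plain string with no DataFrame methods
name_type = type(var).__name__
has_method = hasattr(var, "drop_duplicates")

# The DataFrame is the object stored under that name
df = globals()[var]
deduped = df.drop_duplicates(subset=["user_id"])

print(name_type, has_method)   # str False
print(len(df), len(deduped))   # 5 3
```

This would explain the error message: the method is being looked up on the string, not on the frame it names.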
When I check len() on a df, it is the same as before, even though when I call df.drop_duplicates on each one individually, its len() shrinks by about 70%.
Alternatively, I have tried creating an intermediate object at <<1>> and then calling .drop_duplicates on it. This doesn't work, and I believe it's because the change stays local.
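One possible reason the intermediate-object attempt has no visible effect: by default, drop_duplicates returns a new DataFrame and leaves the original untouched, so the result has to be assigned somewhere. A minimal sketch, assuming made-up data (the "clicks" column is hypothetical):

```python
import pandas as pd

# Made-up data standing in for one imported CSV
df = pd.DataFrame({"user_id": [1, 1, 2, 3, 3], "clicks": [5, 7, 2, 9, 4]})

# Calling the method without keeping the result leaves df unchanged
df.drop_duplicates(subset=["user_id"])
len_unchanged = len(df)

# Assigning the result back is what makes the change stick
df = df.drop_duplicates(subset=["user_id"])
len_deduped = len(df)

print(len_unchanged, len_deduped)  # 5 3

# Inside the loop, the same idea could be applied before storing the frame:
# globals()[var] = pd.read_csv(file).drop_duplicates(subset=["user_id"])
```

If that is the cause, deduplicating before the assignment to globals()[var] should make the shorter frame the one that is visible globally.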