This is my current code. For each movieId, it sums the tags that different users assigned to that movie. It works and does what I need, but it takes a long time to complete. So my question is: can this task be done more efficiently?
def taglist(movieIDX):
    grouped = one_hot_encoded_tags.groupby(one_hot_encoded_tags.movieId)
    column_list = list(one_hot_encoded_tags)
    column_list.remove("userId")
    column_list.remove("movieId")
    column_list.remove("timestamp")
    df_new = grouped.get_group(movieIDX)
    df_new.head()
    lists = list()
    lists.append(movieIDX)
    for tag in column_list:
        som = df_new[tag].sum()
        lists.append(som)
        som = 0
    df1 = pd.DataFrame([lists])
    return df1
alfa = taglist(1)
c = 1
for i in one_hot_encoded_tags.movieId.unique():
    if i != 1:
        try:
            beta = taglist(i)
            alfa = pd.concat([alfa, beta], ignore_index=True, axis=0)
            print(str(c) + " out of 3077 have been completed")
            c = c + 1
        except KeyError:
            c = c + 1
I only recently started programming, so apologies for any stupid bits in my code.
Thanks in advance for the help.
Edit: I replaced the code with this:
groupen = one_hot_encoded_tags.groupby(one_hot_encoded_tags.movieId)
bb = groupen.sum()
# Pass the columns by keyword; recent pandas versions no longer accept the axis as a positional argument.
bb = bb.drop(columns=['userId', 'timestamp'])
bb
This cut the run time for the dataset from 20 minutes to 2 seconds.
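
For anyone who wants to try the grouped sum without my dataset, here is a minimal sketch on made-up data. The contents of the frame and the tag names `funny` and `dark` are hypothetical stand-ins; only the column names `userId`, `movieId` and `timestamp` come from my real `one_hot_encoded_tags`.

```python
import pandas as pd

# Toy stand-in for one_hot_encoded_tags (hypothetical rows and tag names).
one_hot_encoded_tags = pd.DataFrame({
    "userId":    [1, 2, 3, 4],
    "movieId":   [1, 1, 2, 2],
    "timestamp": [100, 200, 300, 400],
    "funny":     [1, 0, 1, 1],
    "dark":      [0, 1, 0, 1],
})

# Drop the non-tag columns, then sum every tag column per movie in one pass.
tag_counts = (
    one_hot_encoded_tags
    .drop(columns=["userId", "timestamp"])
    .groupby("movieId")
    .sum()
)
print(tag_counts)
```

The result has one row per movieId and one column per tag, which is the same table the loop built one movie at a time with repeated `pd.concat` calls.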