
This is my current code. I need to add up the tags that different users gave to the same movieId. The code works and does what I need, but it takes a long time to finish. Can this be done more efficiently?

```python
import pandas as pd

# one_hot_encoded_tags: DataFrame with userId, movieId, timestamp
# and one 0/1 column per tag

def taglist(movieIDX):
    # Group all rows by movieId (this is rebuilt on every call)
    grouped = one_hot_encoded_tags.groupby(one_hot_encoded_tags.movieId)
    # Every column except the id/timestamp columns is a tag column
    column_list = list(one_hot_encoded_tags)
    column_list.remove("userId")
    column_list.remove("movieId")
    column_list.remove("timestamp")
    # Rows belonging to this one movie
    df_new = grouped.get_group(movieIDX)
    # One output row: the movieId followed by the sum of each tag column
    lists = [movieIDX]
    for tag in column_list:
        lists.append(df_new[tag].sum())
    return pd.DataFrame([lists])

alfa = taglist(1)
c = 1
for i in one_hot_encoded_tags.movieId.unique():
    if i != 1:
        try:
            beta = taglist(i)
            # Append this movie's row to the growing result
            alfa = pd.concat([alfa, beta], ignore_index=True, axis=0)
            print(str(c) + " out of 3077 have been completed")
            c = c + 1
        except KeyError:
            c = c + 1
```
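The slowdown most likely comes from two places in this loop: `taglist` builds a new `groupby` on every call, and `pd.concat` copies the whole growing `alfa` frame on every iteration, so the total work grows roughly quadratically with the number of movies. A minimal sketch that keeps the per-movie loop but groups once and concatenates once, assuming a made-up toy frame with hypothetical tag columns `funny` and `scary`:

```python
import pandas as pd

# Toy stand-in for one_hot_encoded_tags (made-up data; tag names are hypothetical)
one_hot_encoded_tags = pd.DataFrame({
    "userId":    [10, 11, 10, 12],
    "movieId":   [1, 1, 2, 3],
    "timestamp": [100, 200, 300, 400],
    "funny":     [1, 0, 1, 0],
    "scary":     [0, 1, 1, 1],
})

# Tag columns = everything except the id/timestamp columns
tag_cols = [c for c in one_hot_encoded_tags.columns
            if c not in ("userId", "movieId", "timestamp")]

# Build the groupby once, outside the loop
grouped = one_hot_encoded_tags.groupby("movieId")

rows = []
for movie_id, group in grouped:
    # One row per movie: movieId followed by the per-tag sums
    rows.append([movie_id] + group[tag_cols].sum().tolist())

# Build the DataFrame once at the end instead of concatenating inside the loop
result = pd.DataFrame(rows, columns=["movieId"] + tag_cols)
print(result)
```

Collecting plain Python lists and creating the DataFrame a single time is the usual way to avoid repeated `concat` calls when a loop cannot be avoided.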

I only recently started programming, so apologies for any stupid bits in my code.

Thanks in advance for the help.

Edit: I replaced the code with this:

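```python
# Sum every tag column per movie in a single vectorized pass
groupen = one_hot_encoded_tags.groupby("movieId")
bb = groupen.sum()
# Drop the non-tag columns (keyword form; positional axis was removed in pandas 2.0)
bb = bb.drop(columns=["userId", "timestamp"])
bb
```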

That brought the run time on my dataset down from 20 minutes to 2 seconds.
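For reference, a self-contained run of the same vectorized idea on a made-up toy frame. This variant selects the tag columns before summing, instead of dropping `userId` and `timestamp` afterwards, so no work is spent summing columns that are thrown away; the tag names `funny` and `scary` are hypothetical.

```python
import pandas as pd

# Toy stand-in for one_hot_encoded_tags (made-up data; tag names are hypothetical)
one_hot_encoded_tags = pd.DataFrame({
    "userId":    [10, 11, 10, 12],
    "movieId":   [1, 1, 2, 3],
    "timestamp": [100, 200, 300, 400],
    "funny":     [1, 0, 1, 0],
    "scary":     [0, 1, 1, 1],
})

# Tag columns = everything except the id/timestamp columns
tag_cols = [c for c in one_hot_encoded_tags.columns
            if c not in ("userId", "movieId", "timestamp")]

# One vectorized pass: sum each tag column within each movieId group
bb = one_hot_encoded_tags.groupby("movieId")[tag_cols].sum()
print(bb)
```

The result is indexed by `movieId`. If you want `movieId` back as a regular first column, as in the loop version, call `bb.reset_index()`.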

  • Check out this link: [here](https://stackoverflow.com/questions/38733477/whats-the-best-way-to-sum-all-values-in-a-pandas-dataframe) – Djangodev Feb 03 '22 at 11:16
  • Can you paste a few rows of your data? As far as I understand, you have a column 'movieIDX' and you want to find all rows that have two or more of the same tags? – pinegulf Feb 03 '22 at 11:39
