
I have a movie rating/watched dataset with columns userId, movieId and timestamp.

I want to group the dataframe by user so that each row contains the movies that fall within a certain time span (let's say 500 in my case), and the number of items in each row should not be greater than 100.

import pandas as pd

input_data={'userId':[1,1,1,2,2,3,3,3,1,1],'movieId':[10,20,30,40,50,60,70,80,90,100],'timestamp':[100,200,300,400,500,600,700,800,900,1000]}

input_df=pd.DataFrame(columns=['userId','movieId','timestamp'],data=input_data)
input_df


The output should look like:

output_data={'userId':[1,2,3,1],'movies':[[10,20,30],[40,50],[60,70,80],[90,100]]}
output_df=pd.DataFrame(columns=['userId','movies'],data=output_data)
output_df
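
One way to read the expected output is that consecutive rows with the same userId are collected into one list, which is why userId 1 appears twice. A minimal sketch of that reading, ignoring the time and size limits for the moment; the names `run_id` and `runs_df` are made up for illustration and the sample data is repeated so the snippet runs on its own:

```python
import pandas as pd

input_df = pd.DataFrame({
    'userId': [1, 1, 1, 2, 2, 3, 3, 3, 1, 1],
    'movieId': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    'timestamp': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})

# A new run starts whenever userId differs from the previous row,
# so the cumulative sum gives each consecutive block its own label.
run_id = (input_df['userId'] != input_df['userId'].shift()).cumsum().rename('run')

# Collect the movies of each run into a list, keeping the run's userId.
runs_df = (
    input_df.groupby(run_id)
    .agg(userId=('userId', 'first'), movies=('movieId', list))
    .reset_index(drop=True)
)
```

This assumes the rows are already in timestamp order; a plain groupby on userId would instead merge both blocks of userId 1 into a single row.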


  • Your output is not considering an 'end time', correct? Also, do you want userId 1 twice in the output? – T C Molenaar Oct 19 '22 at 13:32
  • `Input` is a reserved keyword...Don't use as a `var` name – Bhargav - Retarded Skills Oct 19 '22 at 13:42
  • Can you elaborate on what you mean by "but no of items at each entry should not be greater than 100"? Also, your output does not seem to take into account your specification that "each row should contain the movies not more than a certain time (lets say 500 in my case)". – ScottC Oct 19 '22 at 15:47
  • End time is not considered. – NITISH PANDEY Oct 20 '22 at 05:14
  • And a maximum of 100 entries means that if any row has more than 100 items, the row should be split into multiple rows with the same userId. If you have 500 items, it will be split into 5 rows of 100 items each, with the same userId. – NITISH PANDEY Oct 20 '22 at 05:16
  • If the limit on items in each entry causes any difficulty, you can ignore it for now. – NITISH PANDEY Oct 20 '22 at 05:17
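
The splitting rule described in the comments above could be sketched roughly as follows. This is only one interpretation, not a confirmed solution: `chunk_movies`, `max_span` and `max_items` are hypothetical names, the rows are assumed to be sorted by timestamp, and the time limit is assumed to be measured from the first timestamp of each chunk.

```python
import pandas as pd

def chunk_movies(df, max_span=500, max_items=100):
    """Hypothetical helper: split rows into per-user chunks.

    Assumes df is sorted by timestamp. A new chunk starts when the user
    changes, when the chunk already holds max_items movies, or when the
    row's timestamp is more than max_span after the chunk's first one.
    """
    rows = []
    current_user, start_time, movies = None, None, []
    for user, movie, ts in zip(df['userId'], df['movieId'], df['timestamp']):
        starts_new = (
            current_user is None
            or user != current_user
            or len(movies) >= max_items
            or ts - start_time > max_span
        )
        if starts_new:
            # Close the chunk in progress before opening a new one.
            if movies:
                rows.append({'userId': current_user, 'movies': movies})
            current_user, start_time, movies = user, ts, []
        movies.append(movie)
    # Flush the last chunk.
    if movies:
        rows.append({'userId': current_user, 'movies': movies})
    return pd.DataFrame(rows, columns=['userId', 'movies'])

sample_df = pd.DataFrame({
    'userId': [1, 1, 1, 2, 2, 3, 3, 3, 1, 1],
    'movieId': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    'timestamp': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
})
result_df = chunk_movies(sample_df, max_span=500, max_items=100)
```

A plain loop is used here because both limits depend on where the current chunk started, which is awkward to express as a single vectorised groupby. Lowering `max_items` to 2 on the sample data splits the three-movie runs into two rows that share the same userId.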
