Please consider the following:
edx2 is a data frame with 9000055 obs. of 8 variables;
movieId is a column of edx2 with more than 10,000 levels;
rating is a numeric column of edx2; mean(edx2$rating) returns a valid number;
naive is a numeric constant.
My objective with the code below is to get a table with one numeric outcome for each movieId level:
movie_dev <- edx2 %>%
  group_by(movieId) %>%
  summarize(movieid_dev = mean(rating - naive))
The result I'm getting is a 1 x 1 data frame instead of one with 10,000+ rows (one per movieId level) and 2 columns. I don't understand why. Any clues?
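To make the expected shape concrete, here is a minimal sketch of the same pattern on a small made-up data frame. The names toy and toy_dev and all the values are hypothetical stand-ins, not my real data, and the sketch assumes a fresh R session with only dplyr attached.

```r
library(dplyr)

# Toy stand-in for edx2 (made-up values, not the real data)
toy <- data.frame(
  movieId = factor(c(1, 1, 2, 2, 3)),
  rating  = c(4, 5, 3, 2, 5)
)

# Hypothetical constant; here simply the overall mean rating
naive <- mean(toy$rating)

# Same pipeline as above, applied to the toy data
toy_dev <- toy %>%
  group_by(movieId) %>%
  summarize(movieid_dev = mean(rating - naive))

# One row per movieId level, two columns
dim(toy_dev)
```

With three movieId levels this gives a table with one row per level, which is the shape I expect from edx2 as well.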