I see two problems with the approach:
Parameter estimates:
If there are different numbers of repeated observations, then observations with multiple categories get a larger weight than observations with only a single category. This could be corrected by using weights in the linear model: use WLS with weights equal to the inverse of the number of repetitions (or the square root of it?). Weights are not available for other models like Poisson, Logit or GLM-Binomial. This will not make a large difference for the parameter estimates if the "pattern", i.e. the underlying parameters, is not systematically different across movies with different numbers of categories.
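For example, a minimal WLS sketch (the data and variable names here are made up, assuming a long-format setup with one duplicated row per movie-category pair):

import pandas as pd
import statsmodels.api as sm

# hypothetical long-format data: one row per (movie, category) pair, so a
# movie with k categories contributes k duplicated rows
df = pd.DataFrame({
    "movie_id": [1, 1, 2, 3, 3, 3],
    "rating":   [70, 70, 55, 80, 80, 80],
    "budget":   [10, 10, 5, 20, 20, 20],
})
# number of repetitions (categories) of each movie, broadcast back to the rows
n_rep = df.groupby("movie_id")["movie_id"].transform("size")

X = sm.add_constant(df[["budget"]])
# down-weight duplicated rows by the inverse of the repetition count
res_wls = sm.WLS(df["rating"], X, weights=1.0 / n_rep).fit()
print(res_wls.params)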
Inference, standard error of parameter estimates:
All basic models like OLS, Poisson and so on assume that each row is an independent observation. The total number of rows will be larger than the number of actual observations, so the estimated standard errors of the parameters will be underestimated. (We could use cluster robust standard errors, but I never checked how well they work with duplicated observations, i.e. when the response is identical across several rows.)
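A sketch of the cluster-robust version, reusing the hypothetical long-format data from the WLS sketch above and clustering on the movie id:

res_ols = sm.OLS(df["rating"], X).fit(
    cov_type="cluster", cov_kwds={"groups": df["movie_id"]}
)
print(res_ols.bse)  # cluster-robust standard errors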
Alternative
As an alternative to repeating observations, I would encode the categories into non-exclusive dummy variables. For example, if we have three levels of the categorical variable, movie categories in this case, then we add a 1 in each corresponding column if the observation is "in" that category.
Patsy doesn't have premade support for this, so the design matrix for the movie categories would need to be built by hand or as the sum of the individual dummy matrices (without dropping a reference category).
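A minimal pandas sketch of such non-exclusive dummies, assuming the categories are stored as a delimiter-separated string (the column names are made up):

import pandas as pd

# hypothetical data: one row per movie, all of its categories in one string
movies = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "genres":   ["action|comedy", "drama", "action|drama|thriller"],
})
# non-exclusive 0/1 indicator columns, no reference category dropped
genre_dummies = movies["genres"].str.get_dummies(sep="|")
design = pd.concat([movies[["movie_id"]], genre_dummies], axis=1)
print(design)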
Alternative model
This is not directly related to the issue of multiple categories in the explanatory variables.
The response variable, movie ratings, is bounded between 0 and 100. A linear model will work well as a local approximation, but it does not take into account that observed ratings are in a limited range and will not enforce that range for predictions.
Poisson regression could be used to take the non-negativity into account, but it would not use the upper bound. Two more appropriate alternatives are a GLM with Binomial family and the total count for each observation set to 100 (the maximum possible rating), or a binary model, e.g. Logit or Probit, after rescaling the ratings to be between 0 and 1.
The latter corresponds to a model for proportions, which can be estimated with the statsmodels binary response models. To get inference that is correct even if the data is not binary, we can use robust standard errors. For example
result = sm.Logit(y_proportion, x).fit(cov_type='HC0')
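A slightly fuller sketch of both options, with made-up ratings on the 0-100 scale:

import numpy as np
import statsmodels.api as sm

# hypothetical data: ratings on a 0-100 scale and one explanatory variable
ratings = np.array([70.0, 55.0, 80.0, 62.0, 91.0])
budget = np.array([10.0, 5.0, 20.0, 8.0, 30.0])
X = sm.add_constant(budget)

# Logit on proportions (fractional response) with robust standard errors
res_logit = sm.Logit(ratings / 100.0, X).fit(cov_type="HC0")

# GLM Binomial with the points achieved and missed out of 100 possible
res_glm = sm.GLM(
    np.column_stack([ratings, 100 - ratings]),
    X,
    family=sm.families.Binomial(),
).fit()
print(res_logit.params)
print(res_glm.params)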