
For an auctions website I run, I'm aiming to find which features have the highest influence on bids received. This way, I can focus my energies on improving features that matter the most.

I've been advised to run a Poisson Regression analysis for this purpose. This question is about getting the data ready for regression, and then running the actual regression. I'm using Python for this purpose.

The data: The dataset comprises auctions that each ran for exactly 7 days. There is a mix of numeric and categorical features. The numeric ones are asking_price, description_char_count and num_of_photos.

Categorical variables are city, item_category and item_condition.

The dependent variable is net_unique_bids.

How do I handle the categorical variables?

Dummy variables: Correct me if I'm wrong, but I think I need to do the following:

# convert categorical columns
cities = pd.get_dummies(df['city'], drop_first=True)
categ = pd.get_dummies(df['item_category'], drop_first=True)
cond = pd.get_dummies(df['item_condition'], drop_first=True)

# add to main dataframe 'df'
df = pd.concat([df, cities, categ, cond], axis=1)

# remove original categorical columns
df = df.drop(columns=['city', 'item_category', 'item_condition'])
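
As a possible shortcut for the dummy-variable step, here is a minimal sketch on a made-up toy frame. The level names A, B, C, new and used are assumptions, not values from the real data. Passing the whole DataFrame with `columns=` lets `pd.get_dummies` prefix each dummy with its source column and drop the original columns in one call, which keeps the resulting column names predictable when writing a formula.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's category levels are not shown here.
toy = pd.DataFrame({
    "city": ["C", "A", "C", "B"],
    "item_condition": ["new", "used", "used", "new"],
    "num_of_photos": [3, 1, 4, 2],
})

# Encode both categorical columns at once. Each dummy is named
# "<column>_<level>", the first (sorted) level of each column is dropped
# as the reference, and the original categorical columns are removed.
encoded = pd.get_dummies(toy, columns=["city", "item_condition"], drop_first=True)

dummy_columns = sorted(c for c in encoded.columns if c != "num_of_photos")
print(dummy_columns)
# ['city_B', 'city_C', 'item_condition_used']
```

Note that without a prefix, the dummy columns are named after the raw level values, so names like city1 or item_condition2 would only exist if those are the actual level labels.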

Running Poisson Regression: If this is correct so far, the next steps entail:

from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import (Exchangeable,
    Independence,Autoregressive)
from statsmodels.genmod.families import Poisson

f1 = "net_unique_bids ~ city1 + city2 + city3 + city4 + item_category1 + item_category2 + item_category3 + item_condition1 + item_condition2 + item_condition3 + asking_price + description_char_count + num_of_photos"
model1 = GEE.from_formula(formula=f1, data=df, cov_struct=Independence(), family=Poisson())

Do I have the right idea about how to handle categorical variables? Am I running Poisson Regression correctly (and have I formulated f1 correctly as well)?

If not, please help me fill in the gaps.


Note: I got my guidance on Poisson Regression in Python from here.

Hassan Baig
