
Background and Goal

I run an e-commerce website and want to determine which factors play a major or minor role in visitors making purchases. To this end, I have built a logistic regression model in BigQuery, where I store a lot of web behavioral data, including purchase histories.

On the model

The model is a logistic regression with 'is_converted' as the label and a number of other (potential) factors as features. The 'is_converted' label is binary: 0 if the user hasn't made any purchases, 1 if the user has made a purchase. The features vary, but you can assume they are counts of certain web events the visitor triggered.

So, the training data would look like this:

[image: sample of the training data]

Problem

The logistic regression model exposes two kinds of values whose difference I don't understand: attribution and weight.

By 'attribution' I mean the 'attribution' score I can see on the INTERPRETABILITY tab of the built model, which looks like this:

[image: attribution scores on the INTERPRETABILITY tab]

By 'weight' I mean the results I get when I use the ML.WEIGHTS function as below:

SELECT *
FROM ML.WEIGHTS(MODEL `mydataset.mymodel`, STRUCT(true AS standardize))

The weights include both positive and negative values, and their absolute values are somewhat different from the 'attribution' values I get from the model info. The feature with the highest attribution score doesn't seem to have the highest absolute weight, and vice versa.

Question

So, the question is, which one of these two should I look into in order to determine the main predictor/factor for the purchase event: attribution or weight? Can I even determine this with this type of machine learning model at all?

Thanks.

1 Answer


Weights do not reflect a factor's importance unless you have normalized your data. The attribution score was designed to infer a factor's importance. To double-check the attribution scores, you can calculate a simple (Pearson) correlation matrix in BigQuery and make sure that factors with a high attribution score are also highly correlated with the "is_converted" target variable.
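
As a rough illustration of that cross-check, here is a minimal Python sketch that computes the Pearson correlation of each feature with the label and ranks features by absolute correlation. The feature names (`cart_adds`, `page_views`) and all values are made-up toy data, not taken from the question, and the `pearson` helper is written out by hand only to keep the example dependency-free. In BigQuery itself, the `CORR` aggregate function should give the same statistic per feature.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, report 0
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: names and values are assumptions for illustration.
features = {
    "cart_adds":  [0, 1, 0, 3, 2, 0, 4, 1],
    "page_views": [5, 3, 8, 6, 4, 7, 5, 6],
}
is_converted = [0, 0, 0, 1, 1, 0, 1, 0]

# Rank features by the absolute value of their correlation with the label.
ranking = sorted(
    ((name, pearson(values, is_converted)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```

If the features at the top of this ranking are also the ones with the highest attribution scores, the two views agree.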

You may also try Information Value and Spearman correlation, which are more suitable for binary target variables. All three statistics should yield about the same list of important factors.
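
For reference, both statistics can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the toy data and feature names are made up, the bin edges are chosen by hand, and the `smoothing` constant (added to every bin count so that empty bins do not produce a log of zero) is my own choice rather than part of any standard definition. Spearman correlation is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import log, sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x and var_y else 0.0

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def information_value(feature, label, bin_edges, smoothing=0.5):
    """Information Value of a numeric feature against a 0/1 label.

    bin_edges are ascending upper bounds; smoothing is an assumed
    constant added to each bin count to avoid division by zero.
    """
    n_bins = len(bin_edges) + 1
    events = [0] * n_bins      # label == 1 counts per bin
    non_events = [0] * n_bins  # label == 0 counts per bin
    for x, y in zip(feature, label):
        b = sum(x > edge for edge in bin_edges)  # index of the bin x falls into
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1
    total_events = sum(events) + smoothing * n_bins
    total_non_events = sum(non_events) + smoothing * n_bins
    iv = 0.0
    for e, ne in zip(events, non_events):
        share_events = (e + smoothing) / total_events
        share_non_events = (ne + smoothing) / total_non_events
        # Weight of evidence for the bin is log(share_events / share_non_events).
        iv += (share_events - share_non_events) * log(share_events / share_non_events)
    return iv

# Hypothetical toy data: names, values, and bin edges are assumptions.
cart_adds = [0, 1, 0, 3, 2, 0, 4, 1]
page_views = [5, 3, 8, 6, 4, 7, 5, 6]
is_converted = [0, 0, 0, 1, 1, 0, 1, 0]

iv_cart = information_value(cart_adds, is_converted, bin_edges=[0, 2])
iv_views = information_value(page_views, is_converted, bin_edges=[4, 6])
rho_cart = spearman(cart_adds, is_converted)
rho_views = spearman(page_views, is_converted)
print(f"cart_adds:  IV={iv_cart:.3f}  Spearman={rho_cart:+.3f}")
print(f"page_views: IV={iv_views:.3f}  Spearman={rho_views:+.3f}")
```

Note that Information Value depends on how the feature is binned, so it is worth trying a couple of binnings before trusting the ranking.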

  • Thank you for your reply. But is there any way to try Information Value or Spearman correlation on BigQuery? – Mayiread Jul 13 '23 at 11:33
  • I have not found any info on Information Value or Spearman correlation in BigQuery, so I guess the best option is to import your data into Python and do the calculation there. Here is a good discussion on Information Value: https://stackoverflow.com/questions/60892714/how-to-get-the-weight-of-evidence-woe-and-information-value-iv-in-python-pan You may export your data as CSV or connect Python directly to BigQuery. – Oleg Solovyev Jul 13 '23 at 12:46