I've trained a simple logistic regression model in SSAS, using Gender and NIC as discrete input nodes (NIC is 0 for non-smoker, 1 for smoker) with Score (0-100) as a continuous output node.
I want to predict the score based on a new participant's values for Gender and NIC. Of course, I can run a singleton query in DMX; for example, the following produces a value of 49.51....
SELECT Predict(Score)
FROM [MyModel]
NATURAL PREDICTION JOIN
(SELECT 'M' AS Gender, '1' AS NIC) as t
But instead of using DMX, I want to create a formula from the model in order to calculate scores while "disconnected" from SSAS.
Investigating the model, I have the following information in the NODE_DISTRIBUTION of the output node:
ATTRIBUTE_NAME ATTRIBUTE_VALUE SUPPORT PROBABILITY VARIANCE VALUETYPE
Gender:F 0.459923854 0 0 0 7 (Coefficient)
Gender:M 0.273306289 0 0 0 7 (Coefficient)
Nic:0 -0.282281195 0 0 0 7 (Coefficient)
Nic:1 -0.802106901 0 0 0 7 (Coefficient)
0.013983007 0 0 0.647513829 7 (Coefficient)
Score 75.03691517 0 0 0 3 (Continuous
Plugging these coefficients into a logistic regression formula -- that I am being disallowed from uploading as a new user :) -- for the smoking male example above,
f(...) = 1 / (1 + exp(0 - (0.0139830071136734 -- Constant(?)
+ 0 * 0.459923853918008 -- Gender:F = 0
+ 1 * 0.273306289390897 -- Gender:M = 1
+ 1 * -0.802106900621717 -- Nic:1 = 1
+ 0 * -0.282281195489355))) -- Nic:0 = 0
results in a value of 0.374.... But how do I "map" this value back to the score distribution of 0-100? In other words, how do I extend the equation above to produce the same value that the DMX singleton query does? I'm assuming it will require the stdev and mean of my Score distribution, but I'm stuck on exactly how to use those values. I'm also unsure whether I'm using the ATTRIBUTE_VALUE in the fifth row correctly as the constant.
Any help you can provide will be appreciated!