When using the Google Prediction API (v1.6) for classification, I get different behavior when using "insert" to train the model versus "update".
If I upload a CSV file to storage and train (insert) using it, or use the insert method and include the training data in the request, the result is the same (i.e., which insert method I use doesn't matter).
However, creating an empty model via insert and then adding all the data via updates yields a different result.
The values of prediction probabilities are very different and the model created via the insert doesn't seem to be affected by updates after the initial training.
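For reference, here is a rough sketch of the request bodies behind the three training paths described above. The field names (id, storageDataLocation, trainingInstances, output, csvInstance) are what I understand the v1.6 trainedmodels.insert and trainedmodels.update bodies to use. The helper names, the model id, the bucket path and the sample rows are hypothetical and only there for illustration. No request is sent.

```python
# Sketch only: builds the JSON bodies, does not call the API.
# Helper names, model id, bucket path and sample rows are hypothetical.
import json


def insert_from_storage(model_id, bucket_path):
    """Body for insert when the training CSV is already in storage."""
    return {"id": model_id, "storageDataLocation": bucket_path}


def insert_inline(model_id, rows):
    """Body for insert with the training data included in the request."""
    return {
        "id": model_id,
        "trainingInstances": [
            {"output": label, "csvInstance": [text]} for label, text in rows
        ],
    }


def insert_empty(model_id):
    """Body for insert with no data (assumed way to create an empty model)."""
    return {"id": model_id}


def update_body(label, text):
    """Body for a single update call adding one labeled example."""
    return {"output": label, "csvInstance": [text]}


if __name__ == "__main__":
    rows = [("Address 1", "Addr1"), ("Zip", "Zip Code")]  # made-up examples
    print(json.dumps(insert_inline("my-model", rows), indent=2))
    print(json.dumps(insert_empty("my-model")))
    for label, text in rows:
        print(json.dumps(update_body(label, text)))
```

The first two bodies are the two insert variants that give me identical results. The last two together are the "empty model plus updates" path that behaves differently.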
After training with insert, the prediction probabilities for "Addr12" are:
Predicting: Addr12
Prob: 0.071895 Label: Logon Name
Prob: 0.039216 Label: State
Prob: 0.000000 Label: Logon Type
Prob: 0.013072 Label: SSN
Prob: 0.052288 Label: Employee Number
Prob: 0.032680 Label: First Name
Prob: 0.071895 Label: Middle Name
Prob: 0.052288 Label: Last Name
Prob: 0.071895 Label: Date Of Birth
Prob: 0.098039 Label: Gender
Prob: 0.006536 Label: Eligibility Class
Prob: 0.019608 Label: Location
Prob: 0.104575 Label: Address 1
Prob: 0.111111 Label: Address 2
Prob: 0.026144 Label: City
Prob: 0.058824 Label: Zip
Prob: 0.091503 Label: Date Of Hire
Prob: 0.078431 Label: Hours Worked Per Week
After training an empty model with update only, the prediction probabilities for "Addr12" are:
Predicting: Addr12
Prob: 0.000000 Label: Hours Worked Per Week
Prob: 0.000000 Label: Date Of Hire
Prob: 0.000000 Label: Zip
Prob: 0.000000 Label: State
Prob: 0.000000 Label: City
Prob: 0.527513 Label: Address 2
Prob: 0.472487 Label: Address 1
Prob: 0.000000 Label: Location
Prob: 0.000000 Label: Eligibility Class
Prob: 0.000000 Label: Gender
Prob: 0.000000 Label: Date Of Birth
Prob: 0.000000 Label: Last Name
Prob: 0.000000 Label: Middle Name
Prob: 0.000000 Label: First Name
Prob: 0.000000 Label: Employee Number
Prob: 0.000000 Label: SSN
Prob: 0.000000 Label: Logon Type
Prob: 0.000000 Label: Logon Name
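To make the difference easier to see, here is a small sketch that parses the two outputs above and summarizes each one. It only works on the pasted text. The function names (parse, summarize) are my own and not part of any API.

```python
# Sketch: compare the two "Addr12" outputs pasted above.
import re

INSERT_OUTPUT = """
Prob: 0.071895 Label: Logon Name
Prob: 0.039216 Label: State
Prob: 0.000000 Label: Logon Type
Prob: 0.013072 Label: SSN
Prob: 0.052288 Label: Employee Number
Prob: 0.032680 Label: First Name
Prob: 0.071895 Label: Middle Name
Prob: 0.052288 Label: Last Name
Prob: 0.071895 Label: Date Of Birth
Prob: 0.098039 Label: Gender
Prob: 0.006536 Label: Eligibility Class
Prob: 0.019608 Label: Location
Prob: 0.104575 Label: Address 1
Prob: 0.111111 Label: Address 2
Prob: 0.026144 Label: City
Prob: 0.058824 Label: Zip
Prob: 0.091503 Label: Date Of Hire
Prob: 0.078431 Label: Hours Worked Per Week
"""

UPDATE_OUTPUT = """
Prob: 0.000000 Label: Hours Worked Per Week
Prob: 0.000000 Label: Date Of Hire
Prob: 0.000000 Label: Zip
Prob: 0.000000 Label: State
Prob: 0.000000 Label: City
Prob: 0.527513 Label: Address 2
Prob: 0.472487 Label: Address 1
Prob: 0.000000 Label: Location
Prob: 0.000000 Label: Eligibility Class
Prob: 0.000000 Label: Gender
Prob: 0.000000 Label: Date Of Birth
Prob: 0.000000 Label: Last Name
Prob: 0.000000 Label: Middle Name
Prob: 0.000000 Label: First Name
Prob: 0.000000 Label: Employee Number
Prob: 0.000000 Label: SSN
Prob: 0.000000 Label: Logon Type
Prob: 0.000000 Label: Logon Name
"""

LINE = re.compile(r"Prob:\s*([0-9.]+)\s+Label:\s*(.+)")


def parse(text):
    """Return {label: probability} for every 'Prob: x Label: y' line."""
    scores = {}
    for line in text.splitlines():
        match = LINE.search(line)
        if match:
            scores[match.group(2).strip()] = float(match.group(1))
    return scores


def summarize(scores):
    """Return (top label, sum of probabilities, count of nonzero labels)."""
    top = max(scores, key=scores.get)
    nonzero = sum(1 for p in scores.values() if p > 0)
    return top, round(sum(scores.values()), 4), nonzero


if __name__ == "__main__":
    print("insert:", summarize(parse(INSERT_OUTPUT)))
    print("update:", summarize(parse(UPDATE_OUTPUT)))
```

Both models rank "Address 2" first and both sets of probabilities sum to 1. The difference is in how the mass is spread: the insert-trained model gives nonzero probability to 17 of the 18 labels, while the update-trained model puts everything on just 2.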
Lastly, the output of Analyze after using insert contains the dataDescription/outputFeature/text plus the modelDescription and confusionMatrix. The output of Analyze after using update doesn't contain the modelDescription and confusionMatrix (and no, I'm not simply excluding those fields from the output).
Has anybody had success using insert to train an initial model and then using update to improve it?
----- Ed