
I am interested in developing a logit-based choice model using Tensorflow.

I am fairly new to this tool, so I was wondering if there is a way to get the statistics (i.e., the p-values) of the weights obtained from TensorFlow, just like one would get from Stata or SPSS.

The code does run, but I cannot be sure the model is valid unless I can compare the p-values of the variables against the estimation results from Stata.

The data structure is simple; it is a survey in which a respondent chooses one alternative out of 4 options, each with different feature levels (a.k.a. a conjoint analysis).
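For illustration, the long format I have in mind looks roughly like this (the column names below are placeholders, not my actual variables):

import pandas as pd

# One row per (respondent, choice situation, alternative).
# 'choice' is 1 for the alternative the respondent picked, 0 otherwise.
example = pd.DataFrame({
    'respondent':  [1, 1, 1, 1],
    'situation':   [1, 1, 1, 1],
    'alternative': [1, 2, 3, 4],
    'A': [0.2, 0.5, 0.1, 0.9],   # feature levels (made-up values)
    'B': [1, 0, 1, 0],
    'choice': [0, 1, 0, 0],      # exactly one chosen per situation
})
print(example)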

(I am trying something new; that's why I am not using the pylogit or xlogit packages.)

Below is the code I wrote:

import numpy as np
import tensorflow as tf
import pandas as pd

from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

np.random.seed(0)
tf.random.set_seed(0)



variables = pd.read_excel('file.xls')

target_vars = ['A','B','C','D','E']
X = pd.DataFrame()
for i in target_vars:
    X[i]=variables[i]


y = variables['choice']

X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)

n_feat = X_tn.shape[1]
epo = 100

model = Sequential()
model.add(Dense(1, input_dim=n_feat, activation='sigmoid'))
model.add(Dense(1))

model.compile(loss = 'mean_squared_error',
              optimizer = 'adam',
              metrics = ['mean_squared_error'])
hist = model.fit(X_tn, y_tn, epochs=epo, batch_size=4)

model.summary()
model.get_weights()
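For reference, this is the kind of summary I am trying to reproduce. With statsmodels (shown only as a comparison target, assuming choice is coded 0/1; it is not what I actually want to use) it would look roughly like:

import statsmodels.api as sm

# Fit the same logit with statsmodels just to see Stata/SPSS-style statistics.
X_sm = sm.add_constant(X_tn)            # add an intercept column
logit_res = sm.Logit(y_tn, X_sm).fit()
print(logit_res.summary())              # coefficients, std. errors, z, p-values
print(logit_res.pvalues)                # p-values only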

Some other optional questions, only if you are familiar with discrete choice models:

i) The original dataset is a conjoint survey with 4 alternatives in each choice situation; that's why I set batch_size=4. Am I doing this right?

ii) Have I set the number of epochs too high?


1 Answer


First of all, your question is about p-values, i.e. whether the weights are significant with respect to all of the input data in scope.

The idea is that you can apply many built-in or custom functions, but the output of the activation layer is not fixed; it varies from run to run and depends on your target.

  1. You can build a model with a 2-class, 4-class or 10-class output and compare significance across them, or look at the maximum, minimum, average maximum or most recent change, depending on the function you select (a small 4-class sketch follows after this list).

  2. A prediction is a result of your input and of the relationships the model learns, whether they are significant or not.

  3. Comparing them is possible if you bring them into the same range of expectation; otherwise each value is only meaningful within its own subset.
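As a minimal sketch of the 4-class case (the shapes and variable names here are assumptions about the conjoint setup, not taken from the question's data):

import tensorflow as tf

n_feat = 5          # number of feature columns (assumption)
n_alternatives = 4  # one choice out of 4 options per situation

# Each sample is one choice situation: a (4, n_feat) block of alternatives.
inputs = tf.keras.Input(shape=(n_alternatives, n_feat))
# A shared linear utility per alternative, then a softmax across the 4 alternatives
# (conditional-logit style grouping, shown only for illustration).
utility = tf.keras.layers.Dense(1, use_bias=False)(inputs)   # (batch, 4, 1)
utility = tf.keras.layers.Flatten()(utility)                  # (batch, 4)
probs = tf.keras.layers.Softmax()(utility)

model = tf.keras.Model(inputs, probs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

The target for each situation would then be its one-hot choice of shape (batch, 4).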

sample output:

F:\temp\Python>python test_read_excel.py

   0  1  2  3  4  5
0  1  0  0  0  0  0
1  0  1  0  0  0  0
2  0  0  1  0  0  0
3  0  0  0  1  0  0
4  0  0  0  0  1  0
5  0  0  0  0  0  1
(6, 6)

non-significant weights:

[array([[-0.6489598]], dtype=float32), array([-0.0998228], dtype=float32), array([[1.7546983e-05]], dtype=float32), array([-3.6847262e-06], dtype=float32)]

**sample code**

import numpy as np
import pandas as pd
import tensorflow as tf

# Read the 6x6 grid; cells containing "X" become the target class 1, the rest 0.
variables = pd.read_excel('F:\\temp\\20220305\\Book 2.xlsx', index_col=None, header=None)
list_of_X = []
list_of_Y = []

for i in range(np.asarray(variables).shape[0]):
    for j in range(np.asarray(variables).shape[1]):
        if variables[j][i] == "X":
            print('found: ' + str(i) + ":" + str(j))
            list_of_X.append(i)
            list_of_Y.append(1)
        else:
            list_of_X.append(i)
            list_of_Y.append(0)

# Flatten the 6x6 grid into a single sequence of 36 cells.
list_of_X = np.reshape(list_of_X, (1, 36, 1))
list_of_Y = np.reshape(list_of_Y, (1, 36, 1))

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(36, 1)),
    tf.keras.layers.Dense(1 , activation='sigmoid' ),
])

model.add(tf.keras.layers.Dense(1))
model.summary()

model.compile(loss = 'mean_squared_error',
          optimizer = 'adam',
          metrics = ['mean_squared_error'])
history = model.fit(list_of_X, list_of_Y, epochs=1000, batch_size=4)
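From weights like the ones above, approximate standard errors and p-values can be derived by hand; the following is only a sketch, assuming the model collapses to a plain logistic regression on a design matrix X (Keras does not compute this for you):

import numpy as np
from scipy import stats

def logit_p_values(X, beta):
    """Wald p-values for logistic regression coefficients beta.

    X: (n, k) design matrix (include a column of ones for the intercept),
    beta: (k,) fitted coefficients.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted probabilities
    W = np.diag(p * (1.0 - p))             # per-observation variance
    cov = np.linalg.inv(X.T @ W @ X)       # inverse observed information
    se = np.sqrt(np.diag(cov))             # standard errors
    z = beta / se                          # Wald z-statistics
    return 2.0 * (1.0 - stats.norm.cdf(np.abs(z)))

The beta here would come from model.get_weights() only if the network is a single linear layer followed by a sigmoid; with additional layers stacked on top, this formula no longer applies.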