
So I'm using the NASA asteroids dataset with tensorflow.keras for a university assignment.

The first thing I wanted to do was standardize the data, so I use:

(1)
df = dfprime
ss = StandardScaler()
df_scaled = df  # df.iloc[:, :-1]
df_scaled = pd.DataFrame(ss.fit_transform(df_scaled), columns=df_scaled.columns)
df_scaled_prime = df_scaled
df = df_scaled

And that gave me my dataset with the last column (the boolean "danger") turned into 0.321961 for True and a negative constant for False.
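This is expected behavior for `StandardScaler`: a boolean column becomes exactly two constants, one positive and one negative, whose values depend only on the True/False ratio. A minimal sketch with a toy frame (not the real asteroids data):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in: one numeric feature plus a boolean label column.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                    "danger": [True, False, False, False]})
scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)
# The boolean column is standardized like any other: True -> one positive
# constant, False -> one negative constant.
print(scaled["danger"].unique())
```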

So I did:

(2)
for i in range(len(df)):
    if df.iloc[i, -1] >= 0:
        df.iloc[i, -1] = True
    else:
        df.iloc[i, -1] = False

Strangely, df_scaled_prime and df_scaled also change their "danger" values to True or False after this.
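That part is plain Python semantics: assignment like `df_scaled_prime = df_scaled` binds another name to the same DataFrame object, it does not copy it, so mutating one name mutates "all" of them. A minimal sketch:

```python
import pandas as pd

a = pd.DataFrame({"danger": [0.3, -0.5]})
b = a            # alias: b is the very same object as a
c = a.copy()     # independent copy
a.iloc[0, 0] = 99.0
print(b.iloc[0, 0])  # 99.0 -> the alias saw the change
print(c.iloc[0, 0])  # 0.3  -> the copy did not
```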

Well, here is where the trouble started: if I didn't do (2), my model wouldn't work at all and would say everything is True.

(3)
trainData, testData = train_test_split(df_scaled, test_size=0.3)
inputLayer = layers.Input(shape=(12,))
outputLayer = layers.Dense(1, activation='sigmoid', use_bias=True)(inputLayer)
perceptron = models.Model(inputLayer, outputLayer)
values = pd.DataFrame([{"loss": 0, "binary_accuracy": 0, "true_positives": 0,
                        "false_positives": 0, "true_negatives": 0,
                        "false_negatives": 0, "f1 score": 0}])

perceptron.compile(loss='BinaryCrossentropy', optimizer='sgd',
                   metrics=[metrics.BinaryAccuracy(), metrics.TruePositives(),
                            metrics.FalsePositives(), metrics.TrueNegatives(),
                            metrics.FalseNegatives()])
history = perceptron.fit(trainData.iloc[:, 0:-1], trainData.iloc[:, -1] == True,
                         epochs=500, batch_size=1024, shuffle=True, verbose=0)
asd = perceptron.evaluate(testData.iloc[:, 0:-1], testData.iloc[:, -1] == True)
results = perceptron.predict(testData.iloc[:, 0:-1], verbose=0)

Well, that gives me desirable and logical results. The problem is when I do this:

(4)
df = dfprime.iloc[:, [6, 7, 8, 12]]
# here I want to train another model with just 3 features (column 12 is the
# "danger" label), not the 12 features of before

trainData, testData = train_test_split(df, test_size=0.3)
inputLayer = layers.Input(shape=(3,))
outputLayer = layers.Dense(1, activation='sigmoid', use_bias=True)(inputLayer)
perceptron = models.Model(inputLayer, outputLayer)

perceptron.compile(loss='BinaryCrossentropy', optimizer='sgd',
                   metrics=[metrics.BinaryAccuracy(), metrics.TruePositives(),
                            metrics.FalsePositives(), metrics.TrueNegatives(),
                            metrics.FalseNegatives()])
history = perceptron.fit(trainData.iloc[:, 0:-1], trainData.iloc[:, -1] == 1,
                         epochs=500, batch_size=1024, shuffle=True, verbose=0)
asd = perceptron.evaluate(testData.iloc[:, 0:-1], testData.iloc[:, -1] == 1)
results = perceptron.predict(testData.iloc[:, 0:-1], verbose=0)

This gives me: `true_positives_36: 26911  false_positives_36: 2734  true_negatives_36: 0.0000e+00  false_negatives_36: 0.0000e+00`

That is clearly not correct, obviously, and it's the same in every iteration.

When I don't scale the data this doesn't happen, since I don't touch the "danger" column, but I want more accuracy from the model by scaling, so I can't give that up.
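One way to keep the scaling without ever touching "danger" is to scale only the feature columns and leave the label column as-is. A sketch with a hypothetical toy frame (assuming, as in the real data, that the label is the last column):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the asteroids frame: two features plus the boolean label.
df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 30.0],
                   "danger": [True, False, True]})
features = df.columns[:-1]        # every column except the label
df_scaled = df.copy()             # .copy() so dfprime-style aliases survive
df_scaled[features] = StandardScaler().fit_transform(df[features])
print(df_scaled["danger"].tolist())  # [True, False, True] -> untouched
```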

I spent more than a day on the first model, because it also used to give 0.0000e+00 for TN and FN, but since I implemented (5) that stopped happening (for the first model):

(5)
for i in range(len(df)):
    if df.iloc[i,-1] >= 0: 
        df.iloc[i,-1] = True
    else:
        df.iloc[i,-1] = False

Does anyone have any idea what the hell is happening? I'm at my wit's end with this; it's driving me insane since I don't understand what is going on.

  • I doubt this will solve your problem, but you should definitely scale the training and testing data separately. If you scale the entire data set first, then you're changing the training data based on information that comes from the testing data (this is called data leakage) – Derek O Nov 30 '21 at 23:09
  • how would be a good way to do that? maybe it helps – Diego Pavez Verdi Nov 30 '21 at 23:20
  • You can separate your `df` into `df_train` and `df_test` and run the scaler on those individually – Derek O Nov 30 '21 at 23:29
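The suggestion in the comments can be sketched as follows: split first, fit the scaler on the training features only, then apply that same fitted transform to the test features (toy frame, hypothetical values):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in: two features plus the boolean label as the last column.
df = pd.DataFrame({"f1": range(10), "f2": range(10, 20),
                   "danger": [True, False] * 5})
train, test = train_test_split(df, test_size=0.3, random_state=0)
features = df.columns[:-1]
ss = StandardScaler().fit(train[features])   # fit on the training split only
train_X = ss.transform(train[features])      # transform both splits with the
test_X = ss.transform(test[features])        # statistics learned from train
train_y, test_y = train["danger"], test["danger"]
```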

0 Answers