I've previously split my big dataset:
# X_train.shape : 4M samples x 2K features
# X_test.shape : 2M samples x 2K features
I've prepared the data loaders:
import torch
import torch.utils.data as data_utils

# Training set: wrap the pandas data in tensors and batch it
target = torch.tensor(y_train.to_numpy())
features = torch.tensor(X_train.values)
train = data_utils.TensorDataset(features, target)
train_loader = data_utils.DataLoader(train, batch_size=10000, shuffle=True)

# Test set, used for validation
testtarget = torch.tensor(y_test.to_numpy())
testfeatures = torch.tensor(X_test.values)
test = data_utils.TensorDataset(testfeatures, testtarget)
validation_generator = data_utils.DataLoader(test, batch_size=20000, shuffle=True)
I copied this example model from an online course (I have no idea whether other models would be better):
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

base_elastic_model = ElasticNet()
param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100],
              'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}
grid_model = GridSearchCV(estimator=base_elastic_model,
                          param_grid=param_grid,
                          scoring='neg_mean_squared_error',
                          cv=5,
                          verbose=0)
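To get a feeling for the cost, here is a rough count of how many ElasticNet fits this setup implies. It is only a back-of-the-envelope sketch: it assumes GridSearchCV keeps its default `refit=True` (one extra fit of the best candidate) and that a full grid search runs once per batch of 10000 samples, which is how I call it in my loop. The variable names are just for illustration.

```python
from itertools import product

# Grid values copied from param_grid above.
alphas = [0.1, 1, 5, 10, 50, 100]
l1_ratios = [.1, .5, .7, .9, .95, .99, 1]
cv_folds = 5

# Every (alpha, l1_ratio) pair is one candidate model.
n_candidates = len(list(product(alphas, l1_ratios)))

# Each candidate is fitted once per fold, plus one final refit of the
# best candidate (assumption: refit=True, the GridSearchCV default).
fits_per_call = n_candidates * cv_folds + 1

# 4M training samples split into batches of 10000.
n_batches = 4_000_000 // 10_000

# Assumption: grid_model.fit is called once per batch.
total_fits = fits_per_call * n_batches

print(n_candidates, fits_per_call, n_batches, total_fits)
```

So one pass over the training data would mean tens of thousands of model fits, which probably explains a good part of the waiting time.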
I've built this fitting loop:
for epoch in range(1):
    # Training
    cont = 0
    total = 0
    correct = 0
    for local_batch, local_labels in train_loader:
        cont += 1
        with torch.set_grad_enabled(True):
            grid_model.fit(local_batch, local_labels)
        with torch.set_grad_enabled(False):
            predicted = grid_model.predict(local_batch)
            total += len(local_labels)
            correct += ((1*(predicted > .5)) == np.array(local_labels)).sum()
        #print stats
    # Validation
    total = 0
    correct = 0
    with torch.set_grad_enabled(False):
        for local_batch, local_labels in validation_generator:
            predicted = grid_model.predict(local_batch)
            total += len(local_labels)
            correct += ((1*(predicted > .5)) == np.array(local_labels)).sum()
    #print stats
Maybe my grandchildren will have the results for 1 epoch!
I need some advice:
- How/where (in the code) can I quickly use less data for a first round of tuning?
- Any advice on the steps needed to get a result within 2022?
- Because I've added "with torch.set_grad_enabled(False):" for printing the stats, do I also have to add (as I did) "with torch.set_grad_enabled(True):"?
- I have a GPU (is it useful without images?). I have a function "get_device()". Where do I have to put ".to(get_device())" to use CUDA?
- I'm learning by putting pieces of information together. Do you have any general advice for my exercise?
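To make the first question more concrete, this is roughly what I imagine for working on less data: pick a random subset of row positions once and build the tensors only from those rows. It is just a minimal sketch. `pick_subset_indices` is a helper name I made up, the 1% fraction is arbitrary, and I have not checked whether this is the recommended place to do the subsampling.

```python
import random


def pick_subset_indices(n_rows, fraction, seed=0):
    """Return a sorted random sample of row positions covering `fraction` of the rows."""
    k = max(1, round(n_rows * fraction))
    rng = random.Random(seed)  # fixed seed so the same subset is picked on every run
    return sorted(rng.sample(range(n_rows), k))


# 1% of the 4M training rows
idx = pick_subset_indices(4_000_000, 0.01)
print(len(idx))

# Hypothetical use with the objects from my code (not run here):
#   features = torch.tensor(X_train.iloc[idx].values)
#   target = torch.tensor(y_train.iloc[idx].to_numpy())
```

The idea would be to tune on this small subset first and only then move to the full 4M samples.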