I'm a newbie to python and would very much appreciate some assistance.
It's about logistic regression (machine learning) I have no problem up until training the algorithm.
The data sets are as follows:
The cost_train dataframe contains the target variable, 0 and 1 binary classification.
cost_train =..
(13900 observations)
cost_test =...
(5400 observations)
invoices_train =..
(6000000 observations)
invoices_test =...
(105000 observations)
So in short there is no need to apply a train_test_split. My first idea was to merge the other 3 dataframes with the cost_train data frame, but after a few days of struggling I saw it was not going to work.
I will very much appreciate any advise or solutions.