I'm new to Python and would very much appreciate some assistance.

This is about logistic regression (machine learning). I have no problems up to the point of training the algorithm.

The data sets are as follows:

The cost_train dataframe contains the target variable, a binary class label (0 or 1).

cost_train =.. (13900 observations)
cost_test =... (5400 observations)
invoices_train =.. (6000000 observations)
invoices_test =... (105000 observations)

So, in short, there is no need to apply train_test_split. My first idea was to merge the other 3 dataframes with the cost_train dataframe, but after a few days of struggling I saw that it was not going to work.

I would very much appreciate any advice or solutions.

Vadim Kotov
Jay

1 Answer

First of all, I assume invoices_train and invoices_test are your feature sets, since that isn't stated. You can use the pandas concat() and merge() functions to combine all 4 data frames. The feature set and the label set must have the same number of rows, though; otherwise the merged data will contain null values in the label column. One approach: concatenate invoices_train and invoices_test into a single data set X with concat(), concatenate cost_train and cost_test into a single data set y the same way, and then join X and y with merge(). For more details, see the pandas documentation.
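The concat-then-merge approach above can be sketched roughly as follows. This is only an illustration on toy data: the key column `client_id` and the columns `amount` and `target` are assumptions of mine, since the question doesn't show the real column names. It also shows how a feature row without a matching label ends up with a null value after the merge.

```python
import pandas as pd

# Hypothetical toy data. The column names "client_id", "amount" and
# "target" are assumptions, not taken from the question.
invoices_train = pd.DataFrame({"client_id": [1, 2], "amount": [10.0, 5.0]})
invoices_test = pd.DataFrame({"client_id": [3, 4], "amount": [7.0, 8.0]})
cost_train = pd.DataFrame({"client_id": [1, 2], "target": [0, 1]})
cost_test = pd.DataFrame({"client_id": [3], "target": [1]})

# Stack the two feature frames and the two label frames row-wise.
X = pd.concat([invoices_train, invoices_test], ignore_index=True)
y = pd.concat([cost_train, cost_test], ignore_index=True)

# Left join keeps every feature row; rows with no label get NaN in "target".
merged = X.merge(y, on="client_id", how="left")

# Count the feature rows that ended up without a label.
n_missing = int(merged["target"].isna().sum())
print(merged)
print("rows without a label:", n_missing)
```

If the real invoice data has several rows per key, it would probably need to be aggregated to one row per key before merging, but that depends on columns the question doesn't show.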

But if you use sklearn's train_test_split, you don't need to merge X and y, because you can pass X and y directly as arguments to the function.
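A minimal sketch of passing X and y straight to train_test_split and then fitting a logistic regression. The data here is synthetic and made up purely for illustration; the `test_size`, `random_state` and `stratify` settings are my own choices, not something from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 200 rows, 3 numeric features,
# and a binary 0/1 label derived from the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# X and y are passed separately; no merge is needed beforehand.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit on the training part, then report accuracy on the held-out part.
model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

The only requirement is that X and y have the same number of rows and are in the same row order, so that each feature row lines up with its label.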

thilakshiK