I have a machine learning problem involving logistic regression. I have a data frame in which some rows repeat across most of the features, as in the table below:
feature 1 | feature 2 | feature 3 | ... | feature n-1 | feature n | Target |
---|---|---|---|---|---|---|
a1 | a2 | a3 | .. | an | 1 | 1 |
b1 | b2 | b3 | .. | bn | 1 | 0 |
c1 | c2 | c3 | .. | cn | 1 | 1 |
.. | .. | .. | .. | .. | 1 | .. |
a1 | a2 | a3 | .. | an | 2 | .. |
b1 | b2 | b3 | .. | bn | 2 | .. |
c1 | c2 | c3 | .. | cn | 2 | .. |
.. | .. | .. | .. | .. | 2 | .. |
a1 | a2 | a3 | .. | an | 3 | .. |
b1 | b2 | b3 | .. | bn | 3 | .. |
c1 | c2 | c3 | .. | cn | 3 | .. |
.. | .. | .. | .. | .. | .. | .. |
Can overfitting or underfitting occur with a data frame like this?
And what about a data frame that has between 6 and 8 features and about 500 rows?
I should add that the rows that repeat in features 1 to n-1 differ in feature n.
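
To make the layout concrete, here is a minimal sketch in plain Python of the structure I mean. The values and the names `base_rows`, `rows`, and `groups` are made up for illustration, and I use only three repeated features to keep it short; my real data is larger.

```python
from collections import defaultdict

# Hypothetical values for features 1..n-1 (three features here for brevity).
base_rows = [
    ("a1", "a2", "a3"),
    ("b1", "b2", "b3"),
    ("c1", "c2", "c3"),
]

# Each base row appears once per value of feature n (1, 2, 3),
# in the same block order as the table above.
rows = [base + (k,) for k in (1, 2, 3) for base in base_rows]

# Group the values of feature n by the repeated part (features 1..n-1).
groups = defaultdict(list)
for *features, last in rows:
    groups[tuple(features)].append(last)

for features, last_values in groups.items():
    print(features, "-> feature n takes", last_values)

# No full row is an exact duplicate, because feature n always differs.
n_exact_duplicates = len(rows) - len(set(rows))
print("exact duplicate rows:", n_exact_duplicates)
```

So each combination of features 1 to n-1 shows up several times, but never with the same value of feature n.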