Is there an ML regression model that can predict well with little data? (89 rows in the dataframe, 2 features and 1 label)

Question

Dummy Data:

UniqueNo.	Count19	Count20	Count21
ABC123	2	4	2
DEF456	1	3	3

The column 'UniqueNo.' is the unique identifier for each row. The columns 'Count19', 'Count20' and 'Count21' hold the values for each UniqueNo. for the years 2019, 2020 and 2021 respectively.

The df has only 89 rows, so there is very little data to train on.

I need a model that can predict with at least 80% accuracy on this data.

I've tried LinearRegression, RandomForest, DecisionTree and LSTM, but to no avail. (The RMSE and MSE evaluation metrics returned terrible values.)

score 0 · Answer 1 · answered Apr 24 '23 at 14:22

Use a generalized linear model (GLM). It can be fit on small amounts of data, and with a Poisson family it models count targets like these directly.

Note that the code below fails on the two dummy rows alone, most likely because two rows can be fit exactly, so run it on the full 89 rows:

Error: PerfectSeparationError: Perfect separation detected, results not available

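import io

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

# Dummy data from the question, with the columns renamed
data = """UniqueNo.   Year2000    Year2001    YearTarget
ABC123  2   4   2
DEF456  1   3   3
"""
# Split on any run of whitespace, since the sample may not be tab-separated
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df.columns)

model_formula = 'YearTarget ~ Year2000 + Year2001'

# Poisson family, because the target is a count
model = glm(model_formula, data=df, family=sm.families.Poisson()).fit()

print(model.summary())

# Print coefficients: one intercept plus one slope per feature
print(model.params)

# Extract and print confidence intervals
print(model.conf_int())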
