Dummy Data:

UniqueNo. Count19 Count20 Count21
ABC123 2 4 2
DEF456 1 3 3

The column 'UniqueNo' is the unique identifier for each row. The columns 'Count19', 'Count20', and 'Count21' hold the counts for that identifier in 2019, 2020, and 2021 respectively.

The df has only 89 rows, so there is very little data to train on.

I need a model that can predict these values with at least 80% accuracy.

I've tried LinearRegression, RandomForest, DecisionTree, and LSTM, but to no avail: the RMSE and MSE evaluation metrics were terrible.
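For context, here is a minimal sketch of how an RMSE can be judged against naive baselines, which helps decide whether a model's score is actually "terrible" or just reflects how noisy the counts are. It uses only the two dummy rows above. The two baselines (repeat last year, average the two previous years) and all function and variable names are assumptions for illustration, not part of the attempts described.

```python
import math

# The two dummy rows from the question: (UniqueNo, Count19, Count20, Count21).
rows = [
    ("ABC123", 2, 4, 2),
    ("DEF456", 1, 3, 3),
]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


actual = [r[3] for r in rows]

# Baseline 1 (assumption): 2021 simply repeats the 2020 count.
last_year = [r[2] for r in rows]

# Baseline 2 (assumption): 2021 equals the mean of the 2019 and 2020 counts.
two_year_mean = [(r[1] + r[2]) / 2 for r in rows]

rmse_last_year = rmse(last_year, actual)
rmse_two_year_mean = rmse(two_year_mean, actual)

print("RMSE, last-year baseline:", rmse_last_year)
print("RMSE, two-year-mean baseline:", rmse_two_year_mean)
```

A model is only worth keeping if its held-out RMSE beats baselines like these on the full 89 rows.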

desertnaut
IounmS

1 Answer

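Use a generalized linear model (GLM). It extends linear regression to responses that are not normally distributed, such as counts, and the Poisson family used here models the expected count directly, which can work reasonably well on small data sets.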

Note that with only the two sample rows used here, the fit may fail with the following error, because the model has more parameters than observations:

PerfectSeparationError: Perfect separation detected, results not available

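import io

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import glm

# Two sample rows only; fit on the full data frame in practice.
data = """UniqueNo.   Year2000    Year2001    YearTarget
ABC123  2   4   2
DEF456  1   3   3
"""
# The columns are whitespace-separated, so split on runs of whitespace.
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df.columns)

# Poisson regression: predict the target year's count from the two earlier years.
model_formula = 'YearTarget ~ Year2000 + Year2001'

model = glm(model_formula, data=df, family=sm.families.Poisson()).fit()

print(model.summary())

# model.params holds three values: the intercept plus one coefficient per predictor.
intercept = model.params['Intercept']
slopes = model.params.drop('Intercept')

# Print coefficients
print('Intercept =', intercept)
print('Slopes =', slopes.to_dict())

# Extract and print confidence intervals
print(model.conf_int())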
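
To make the coefficients easier to interpret: a Poisson GLM with the default log link predicts the expected count as exp(intercept + sum of coefficient * predictor). The sketch below illustrates that relationship in plain Python. The coefficient values and the helper name `poisson_mean` are hypothetical, chosen only for illustration; they are not fitted values from the model above.

```python
import math


def poisson_mean(intercept, coefs, row):
    """Expected count under a log-link Poisson model: exp(b0 + sum(b_i * x_i))."""
    linear = intercept + sum(coefs[name] * row[name] for name in coefs)
    return math.exp(linear)


# Hypothetical coefficients for illustration only; these are not fitted values.
example_intercept = 0.1
example_coefs = {"Year2000": 0.2, "Year2001": 0.1}

# First sample row from the data above.
row = {"Year2000": 2, "Year2001": 4}

predicted = poisson_mean(example_intercept, example_coefs, row)
print("Predicted YearTarget:", predicted)
```

With a fitted statsmodels result, `model.predict(df)` should perform the same calculation using the estimated coefficients.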
Golden Lion