This is a general question to understand a concept.

I have a dataframe in which all columns hold float values (precision varies from 2 to 8 decimal digits).

I use GBM to train my model. When I train with all float values, the r2 score is 0.78.

When the same columns are converted to integer values, the r2 score drops to 0.72.

Why does the r2 score drop when the floats are converted to integers?

Is this something specific to my data, or is a drop expected in general?
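Roughly what I am doing (a minimal sketch with synthetic data standing in for my dataframe, and scikit-learn's `GradientBoostingRegressor` assumed as the GBM):

```python
# Minimal sketch with synthetic data standing in for the real dataframe;
# scikit-learn's GradientBoostingRegressor is assumed as the "GBM".
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(1000, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = X.sum(axis=1) + rng.normal(scale=0.1, size=1000)

def fit_and_score(features):
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

print("float r2:", fit_and_score(X))              # all-float columns
print("int   r2:", fit_and_score(X.astype(int)))  # truncated to integers
```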

  • With no knowledge of your data or model, this implies that the digits matter for your current model. Using integers means that you are rounding (or truncating) those 2 to 8 decimal digits. You can test this further by looking at the distribution of the variables and the scatterplots between the model's independent and dependent variables (see the sketch after these comments). Also, this question may be best asked on [cross-validated](https://stats.stackexchange.com/) – jessi Feb 27 '20 at 17:03
  • It could increase under certain circumstances; you're fitting different data with a very non-linear transformation. If the *range* of your values is on the order of 1, then this rounding to the nearest integer *can* have more of an impact as you're changing the values at the level of the variation within your data. – ALollz Feb 27 '20 at 17:18
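A minimal sketch of the diagnostic suggested in the comments, using an illustrative single-feature dataframe whose values vary on the order of 1 (the regime the second comment describes, where integer conversion collapses most of the within-data variation):

```python
# Sketch of the suggested diagnostic: compare the feature's distribution
# before/after integer conversion and its scatterplot against the target.
# The dataframe here is illustrative; values vary on the order of 1.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(scale=0.8, size=500)})
y = 2 * df["x"] + rng.normal(scale=0.2, size=500)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["x"], bins=30, alpha=0.5, label="float")
axes[0].hist(df["x"].astype(int), bins=30, alpha=0.5, label="int")
axes[0].set_title("feature distribution")
axes[0].legend()
axes[1].scatter(df["x"].astype(int), y, s=8)
axes[1].set_title("integer feature vs. target")
plt.show()
```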

1 Answer


The problem in your case is that you may be losing resolution when converting to integers. A good diagnostic is to measure the quantization (rounding) error introduced in your inputs by the float-to-integer conversion. This link has a very basic explanation of rounding error. Rounding error is a useful metric for relating the drop in accuracy to the precision and number system used.
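A minimal sketch of that measurement, assuming a pandas DataFrame `df` of float columns (column names and values here are illustrative):

```python
# Sketch: quantify the rounding/quantization error per column, assuming a
# pandas DataFrame `df` of floats. Names and values are illustrative.
import pandas as pd

df = pd.DataFrame({"a": [0.12, 3.4567, 1.9],
                   "b": [10.01, 2.5, 7.75]})

df_int = df.round().astype(int)     # or df.astype(int) for plain truncation
err = df - df_int                   # per-cell quantization error

print(err.abs().mean())             # mean absolute rounding error per column
print(err.abs().mean() / df.std())  # error relative to each column's spread
```

If the relative error is large (on the order of 1), the conversion is discarding most of the variation the model learns from, and a drop in r2 is to be expected.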

Anurag Reddy