This is a sample of the dataset I'm using to look at lapsed customers. I've converted the categorical values to numbers. However, I believe sklearn's random forest will treat these fields as numeric quantities, e.g. assume that customer number 4 is double customer number 2. Do I need to cross-tab or vectorize these values before fitting my random forest model?
Lapse_Flag,Cust,Sales,Cust Age,State,Main Sales Territory
0,1,28.46,3,1,1
0,2,46.07,3,2,1
0,3,108.48,3,3,2
1,4,265,3,4,3
0,5,54.42,3,5,4
0,6,0,1,6,3
0,7,371.93,3,7,5
1,8,35.6,3,8,6
1,9,357.95,2,9,7
0,10,5584.14,3,5,4
0,11,41207.02,3,10,4
0,12,5958.18,3,5,4
0,13,1028.14,1,11,7
0,14,446.67,2,7,5
0,15,0,3,1,1
0,16,6256,2,12,7
0,17,4618.72,3,2,1
1,18,275.58,3,12,2
1,19,1417.22,2,8,6
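
To make "vectorize" concrete, here is a minimal sketch of one-hot encoding applied to the first four rows of the sample, using only the Python standard library. Treating State and Main Sales Territory as the categorical columns is an assumption, and `one_hot` is a hypothetical helper written for illustration; `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder` should produce equivalent indicator columns.

```python
import csv
import io

# First four rows of the sample above.
SAMPLE = """Lapse_Flag,Cust,Sales,Cust Age,State,Main Sales Territory
0,1,28.46,3,1,1
0,2,46.07,3,2,1
0,3,108.48,3,3,2
1,4,265,3,4,3
"""

# Assumption: these are the columns whose numbers are only labels.
CATEGORICAL = ["State", "Main Sales Territory"]


def one_hot(rows, columns):
    """Replace each listed column with one 0/1 indicator column per distinct code."""
    # Distinct codes seen in each categorical column, in numeric order.
    levels = {c: sorted({r[c] for r in rows}, key=int) for c in columns}
    encoded = []
    for r in rows:
        # Keep the non-categorical columns unchanged.
        out = {k: v for k, v in r.items() if k not in columns}
        for c in columns:
            for level in levels[c]:
                out[f"{c}_{level}"] = 1 if r[c] == level else 0
        encoded.append(out)
    return encoded


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
encoded = one_hot(rows, CATEGORICAL)
for row in encoded:
    print(row)
```

After encoding, State 4 is no longer "twice" State 2: each code becomes its own column holding 0 or 1, so no ordering or distance between codes is implied.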