A geohash string is one of the features in my sparse logistic regression model, so I use Java's String.hashCode() to turn each geohash string into an int feature id. However, I found that hashCode() performs badly on similar geohash strings. It maps different features to the same feature id, which may hurt model optimization even though the colliding geohashes are similar. For example, each of the following pairs of similar geohash strings has the same hashCode:
<"wws8vw", "wws8x9">
"wws8vw".hashCode() = -774715770
"wws8x9".hashCode() = -774715770
<"wmxy0", "wmxwn">
"wmxy0".hashCode() = 113265337
"wmxwn".hashCode() = 113265337
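The collisions can be reproduced with a small program. This is a sketch, and the class name GeohashCollisionDemo is hypothetical. Java documents String.hashCode() as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. Two strings that share every character except the last two therefore collide whenever 31*a + b is the same, where a and b are the char codes of those last two characters. The program prints that tail value next to each hash.

```java
public class GeohashCollisionDemo {

    // Contribution of the last two characters to String.hashCode():
    // hash = prefixPart * 31^2 + 31 * a + b, where a and b are their char codes.
    static int tailContribution(String s) {
        int n = s.length();
        return 31 * s.charAt(n - 2) + s.charAt(n - 1);
    }

    public static void main(String[] args) {
        // 'v'=118, 'w'=119, 'x'=120, '9'=57: 31*118 + 119 = 3777 = 31*120 + 57
        // 'y'=121, '0'=48, 'w'=119, 'n'=110: 31*121 + 48 = 3799 = 31*119 + 110
        String[][] pairs = {
            {"wws8vw", "wws8x9"},
            {"wmxy0", "wmxwn"},
        };
        for (String[] pair : pairs) {
            for (String s : pair) {
                System.out.println(s + ".hashCode() = " + s.hashCode()
                        + ", tail = " + tailContribution(s));
            }
        }
        // Prints:
        // wws8vw.hashCode() = -774715770, tail = 3777
        // wws8x9.hashCode() = -774715770, tail = 3777
        // wmxy0.hashCode() = 113265337, tail = 3799
        // wmxwn.hashCode() = 113265337, tail = 3799
    }
}
```

Geohash characters range from '0' (48) to 'z' (122). Raising the second-to-last character by 2 adds 62 to the hash, and a last character that is 62 code points lower cancels it exactly. That is what happens with 'v' to 'x' paired with 'w' to '9'.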
I guess this has something to do with how geohashes are generated and how Java's hashCode() is computed. Can anyone explain the actual reason, and how to reduce collisions on geohash strings?
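One way to avoid collisions entirely, instead of looking for a better hash, is to decode the geohash's base-32 characters directly into an integer. The following is a minimal sketch under several assumptions. Geohashes are lowercase and use the standard alphabet 0123456789bcdefghjkmnpqrstuvwxyz. They are at most 12 characters long, at 5 bits per character. The names GeohashIds and toFeatureId are hypothetical. A sentinel 1 bit marks the length, so that "wmxy0" and "0wmxy0" get different ids. For geohashes of at most 6 characters the id is at most 2^31 - 1 and fits in an int.

```java
public final class GeohashIds {

    // Standard geohash base-32 alphabet (omits a, i, l, o); assumed lowercase input.
    private static final String ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Maps a geohash of 1..12 characters to a unique non-negative long.
    // Each character contributes 5 bits; the sentinel 1 bit above the payload
    // encodes the length, so geohashes of different lengths never collide.
    static long toFeatureId(String geohash) {
        int n = geohash.length();
        if (n == 0 || n > 12) {
            throw new IllegalArgumentException("geohash length must be 1..12: " + geohash);
        }
        long id = 1L; // sentinel bit
        for (int i = 0; i < n; i++) {
            int digit = ALPHABET.indexOf(geohash.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a geohash character: " + geohash.charAt(i));
            }
            id = (id << 5) | digit;
        }
        return id;
    }

    public static void main(String[] args) {
        String[] samples = {"wws8vw", "wws8x9", "wmxy0", "wmxwn"};
        for (String s : samples) {
            // Distinct geohashes always yield distinct ids (the mapping is injective).
            System.out.println(s + " -> " + toFeatureId(s));
        }
    }
}
```

Because the mapping is one-to-one, it needs no collision handling. The trade-off is that the id space grows with geohash precision. If a fixed-size feature space is required, the resulting long can still be passed through a well-mixed hash.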