Scaling with K-means Clustering

Asked Oct 20 '22 at 18:15

I have a clustering problem I'd like to solve, and I'm wondering whether scaling is recommended given the way my data is structured.

Below is a hypothetical problem that should be comparable to my actual use case. Say we're looking at data from a grocery store, and each feature is the percentage of a customer's order that falls into one of the store's product categories. My ultimate goal is to cluster customers by similar buying habits.
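
A minimal sketch of how rows like these could be built, assuming hypothetical category names and made-up spend figures (the helper `to_percentages` is also hypothetical). Each row becomes a composition that sums to 100.

```python
# Hypothetical category names; the real features in the question are not shown.
CATEGORIES = ["produce", "dairy", "meat", "snacks"]

def to_percentages(spend):
    """Convert one customer's spend per category into percentages of the order total."""
    total = sum(spend)
    if total <= 0:
        raise ValueError("order total must be positive")
    return [100.0 * s / total for s in spend]

# Hypothetical raw spend (currency units) per category for three customers.
raw_spend = [
    [30.0, 10.0, 50.0, 10.0],
    [5.0, 5.0, 0.0, 40.0],
    [20.0, 20.0, 20.0, 20.0],
]

rows = [to_percentages(r) for r in raw_spend]
for row in rows:
    # Every row sums to 100, whatever the customer's total spend was.
    print([round(v, 1) for v in row], "sum =", round(sum(row), 6))
```

Converting to percentages removes order size from the picture, so two customers with the same mix but very different totals end up with identical rows.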

My question is: given that the feature values are relative to one another (percentages) and each row sums to 100%, is scaling still beneficial with this data structure, or will it actually make my clusters less accurate?
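
A small sketch of what column-wise standardization (z-scoring each feature, the same idea as scikit-learn's `StandardScaler`) does to rows like these, in plain Python with made-up data. The four customers, the three categories, and the helpers `standardize` and `nearest` are hypothetical. In the raw percentages, customer A's nearest neighbor is C (a near-identical produce/meat split, 5 more points of dairy). After scaling it becomes B, because the low-variance dairy column is stretched until that 5-point dairy gap outweighs 20-point gaps in both produce and meat. The scaled rows also no longer sum to 100.

```python
import math
import statistics

# Hypothetical percentage rows (each sums to 100).
# Columns are hypothetical categories: produce, meat, dairy.
rows = {
    "A": [50.0, 45.0, 5.0],
    "B": [30.0, 65.0, 5.0],
    "C": [45.0, 45.0, 10.0],
    "D": [75.0, 20.0, 5.0],
}

def standardize(table):
    """Z-score each column using its mean and population standard deviation."""
    columns = list(zip(*table.values()))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return {
        name: [(v - m) / s for v, m, s in zip(values, means, stds)]
        for name, values in table.items()
    }

def nearest(table, name):
    """Return the other row closest to `name` by Euclidean distance."""
    others = [n for n in table if n != name]
    return min(others, key=lambda n: math.dist(table[name], table[n]))

scaled = standardize(rows)
print("raw nearest to A:", nearest(rows, "A"))        # C
print("scaled nearest to A:", nearest(scaled, "A"))   # B
# Row sums are no longer 100 after column-wise scaling.
print("scaled row sums:", [round(sum(r), 3) for r in scaled.values()])
```

Whether that reweighting helps depends on whether a small category should count as much as a large one for your notion of "similar buying habits"; the sketch only shows the mechanism, not which choice is right.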

Sample data structure

D Note

Found a useful answer in this post https://stats.stackexchange.com/questions/372521/in-cluster-analysis-should-i-scale-standardize-my-data-if-variables-are-in-the/372541#372541 – D Note Oct 20 '22 at 18:54