I would like to know how to cluster a multivariate dataset using k-means. Each sample in the dataset corresponds to a person (I have 6000 people), and each person has both continuous and discrete attributes (10 attributes per person). An example:
- person_id: 1234
- name: "John Doe"
- age: 30
- height: '5 ft 10 in'
- salary_value: 5000
- Salary_currency: USD
- is_customer: False
- Company: "Testing Inc."
- ...
I have read an existing answer on multidimensional k-means clustering, but the attributes in that dataset are all continuous. An even more helpful read was a post about clustering algorithms for continuous and discrete variables. As mentioned in the latter, I accept that I may have to find a function that assigns numeric values to the discrete states. However, I cannot use ROCK or COBWEB for clustering, only k-means.
Which functions can I use to convert the discrete values to continuous ones? Furthermore, is there any way to prioritize some attributes over others (say, clustering on salary and age matters more than height), or should I rethink the whole approach?
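To make the question concrete, here is a minimal sketch of one common way to handle both parts: one-hot encode the discrete attributes, z-score the continuous ones, and multiply each column by a weight before running plain k-means. Everything in it is an assumption for illustration only: the toy records, the lowercase field names, the `WEIGHTS` values, and the helper names `encode`, `sq_dist` and `kmeans` are hypothetical, and the k-means loop is a bare-bones Lloyd iteration rather than any particular library's implementation.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy records; field names loosely follow the example above.
people = [
    {"age": 30, "salary_value": 5000, "is_customer": False, "salary_currency": "USD"},
    {"age": 32, "salary_value": 5200, "is_customer": False, "salary_currency": "USD"},
    {"age": 55, "salary_value": 9000, "is_customer": True, "salary_currency": "EUR"},
    {"age": 58, "salary_value": 9500, "is_customer": True, "salary_currency": "EUR"},
]

NUMERIC = ["age", "salary_value"]
CATEGORICAL = ["is_customer", "salary_currency"]
# Assumed priorities: a larger weight gives the attribute more influence.
WEIGHTS = {"age": 2.0, "salary_value": 2.0, "is_customer": 1.0, "salary_currency": 0.5}


def encode(records):
    """Build one numeric vector per record: weighted z-scores and weighted one-hot columns."""
    columns = []
    for name in NUMERIC:
        values = [float(r[name]) for r in records]
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # constant column: avoid dividing by zero
        columns.append([WEIGHTS[name] * (v - mu) / sigma for v in values])
    for name in CATEGORICAL:
        # One 0/1 column per distinct level of the discrete attribute.
        for level in sorted({r[name] for r in records}, key=str):
            columns.append([WEIGHTS[name] * float(r[name] == level) for r in records])
    return [list(row) for row in zip(*columns)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd iteration; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return labels


labels = kmeans(encode(people), k=2)
print(labels)
```

Because k-means uses squared Euclidean distance, multiplying a column by a weight w scales that attribute's contribution to the distance by w squared, so the weights act as a rough priority knob. Free-text or identifier fields such as the name or person_id would presumably be left out, and a height given as text would need to be parsed into a single number first.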