I want to implement a rough c-means clustering algorithm, but I have no prior experience with clustering, so I'm wondering whether I need to preprocess the data to make it usable for clustering.
For example, let's say I have a CSV file with a lot of attributes, some numeric and some strings.
In order to apply rough c-means clustering (or any other kind of clustering), should I first apply other rough set methods such as attribute selection, rule discovery, or discretization, or compute the lower/upper approximations?
What is the normal processing flow for a set of mixed data before clustering? What steps would the data go through if I were to use a rough set based clustering algorithm?
Is there a certain order in which things are supposed to happen? I tried looking this up, but I couldn't find it clearly stated anywhere.
Any ideas? Or how could I make my question clearer in order to get an answer? I can't find anything that would help me get started with clustering data, and I don't see how clustering raw data would help me. Here is a sample of the data:
    rank       discipline  yrs.since.phd  yrs.service  sex     salary
1   Prof       B           19             18           Male    139750
2   Prof       B           20             16           Male    173200
3   AsstProf   B           4              3            Male    79750
4   Prof       B           45             39           Male    115000
5   Prof       B           40             41           Male    141500
6   AssocProf  B           6              6            Male    97000
7   Prof       B           30             23           Male    175000
8   Prof       B           45             45           Male    147765
9   Prof       B           21             20           Male    119250
10  Prof       B           18             18           Female  129000
11  AssocProf  B           12             8            Male    119800
12  AsstProf   B           7              2            Male    79800
13  AsstProf   B           1              1            Male    77700
14  AsstProf   B           2              0            Male    78000
15  Prof       B           20             18           Male    104800
16  Prof       B           12             3            Male    117150
17  Prof       B           19             20           Male    101000
18  Prof       A           38             34           Male    103450
19  Prof       A           37             23           Male    124750
20  Prof       A           39             36           Female  137000
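
To make the question concrete, here is a minimal sketch of the kind of preprocessing being asked about: turning mixed rows like the ones above into purely numeric vectors, so that a distance-based method has something to compute distances on. It assumes one common route (one-hot encoding for string attributes, min-max scaling for numeric ones); whether rough c-means needs this, or a rough set style discretization instead, is exactly the open question. The names `rows`, `is_numeric`, and `preprocess` are hypothetical, and only four of the rows above are used.

```python
# A sketch of one possible preprocessing step, not a required part of
# rough c-means itself. All names here are hypothetical.

rows = [
    {"rank": "Prof", "discipline": "B", "yrs.since.phd": 19,
     "yrs.service": 18, "sex": "Male", "salary": 139750},
    {"rank": "AsstProf", "discipline": "B", "yrs.since.phd": 4,
     "yrs.service": 3, "sex": "Male", "salary": 79750},
    {"rank": "AssocProf", "discipline": "B", "yrs.since.phd": 6,
     "yrs.service": 6, "sex": "Male", "salary": 97000},
    {"rank": "Prof", "discipline": "A", "yrs.since.phd": 39,
     "yrs.service": 36, "sex": "Female", "salary": 137000},
]


def is_numeric(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def preprocess(rows):
    """Return (feature_names, vectors) with every feature in [0, 1]."""
    feature_names = []
    feature_columns = []  # one list of floats per output feature

    for col in rows[0].keys():
        values = [r[col] for r in rows]
        if all(is_numeric(v) for v in values):
            # Min-max scaling, so salary does not dominate the distance
            # just because its raw numbers are much larger than the years.
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1  # avoid dividing by zero on constant columns
            feature_names.append(col)
            feature_columns.append([(v - lo) / span for v in values])
        else:
            # One-hot encoding: one 0/1 feature per distinct category.
            for category in sorted(set(values)):
                feature_names.append(f"{col}={category}")
                feature_columns.append(
                    [1.0 if v == category else 0.0 for v in values]
                )

    # Transpose from per-feature columns to per-row vectors.
    vectors = [list(vec) for vec in zip(*feature_columns)]
    return feature_names, vectors


feature_names, vectors = preprocess(rows)
for name, value in zip(feature_names, vectors[0]):
    print(name, round(value, 3))
```

Without the scaling step, the raw salary differences (tens of thousands) would swamp the differences in years (tens), so Euclidean distances on the raw table would effectively cluster on salary alone.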