
I want to apply fuzzy clustering to a set of jobs. Each job has the following attributes:

  1. Categorical: position, diploma, skills
  2. Numerical: salary, years of experience

My question is: how do I calculate the distance between two jobs? For example, between
job1(programmer, BS computer science, (Java, .NET, responsibility), 1500, 3)
and job2(tester, BS computer science, (black-box and white-box testing), 1200, 1).

PS: I'm a beginner in data mining and clustering, so I would highly appreciate your help.

Brian Tompsett - 汤莱恩
Mariya

2 Answers


You may take this as your starting point: http://www.econ.upf.edu/~michael/stanford/maeb4.pdf. Distance between categorical data is nicely explained at the end.
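To make the idea concrete for the jobs in the question, here is a minimal sketch of one common way to combine mixed attributes (a Gower-style average of per-attribute distances). It is not taken from the linked document. The field names, the use of a Jaccard distance for the skills set, and the `salary_range` and `experience_range` values are all assumptions for illustration; in practice the ranges would come from your own data.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def job_distance(job1, job2, salary_range, experience_range):
    """Average of per-attribute distances, each scaled to [0, 1].

    salary_range and experience_range are assumed to be (max - min)
    of each numeric attribute over the whole data set.
    """
    parts = [
        # Single-valued categorical attributes: 0 if equal, 1 otherwise.
        0.0 if job1["position"] == job2["position"] else 1.0,
        0.0 if job1["diploma"] == job2["diploma"] else 1.0,
        # Multi-valued categorical attribute: set overlap.
        jaccard_distance(job1["skills"], job2["skills"]),
        # Numeric attributes: absolute difference divided by the range.
        abs(job1["salary"] - job2["salary"]) / salary_range,
        abs(job1["experience"] - job2["experience"]) / experience_range,
    ]
    return sum(parts) / len(parts)


job1 = {"position": "programmer", "diploma": "bs computer science",
        "skills": {"java", ".net", "responsibility"},
        "salary": 1500, "experience": 3}
job2 = {"position": "tester", "diploma": "bs computer science",
        "skills": {"black box testing", "white box testing"},
        "salary": 1200, "experience": 1}

# Hypothetical ranges, chosen only so the example runs.
d = job_distance(job1, job2, salary_range=2000, experience_range=10)
print(round(d, 2))
```

Equal weights are used here for simplicity; if some attributes matter more for your clustering, a weighted average is a straightforward variation.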

iinception

Here is a good walk-through of several different clustering methods and how to use them in R: http://biocluster.ucr.edu/~tgirke/HTML_Presentations/Manuals/Clustering/clustering.pdf

In general, clustering discrete data relies either on counts (e.g. overlaps between vectors) or on some statistic derived from those counts. As much as I'd like to address the statistical side, I suppose you're more interested in the algorithm, so I'll leave it at that.
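As a rough illustration of "counts" versus "a statistic derived from counts", the sketch below encodes two skill lists as 0/1 vectors, counts the positions where both are 1, and then derives a Jaccard similarity from those counts. The vocabulary and skill sets are made up for the example, and Jaccard is only one of several statistics you could derive this way.

```python
def to_indicator(skills, vocabulary):
    """Encode a skill set as a 0/1 vector over a fixed vocabulary."""
    return [1 if term in skills else 0 for term in vocabulary]


def overlap_count(u, v):
    """Number of positions where both vectors have a 1."""
    return sum(x & y for x, y in zip(u, v))


def jaccard_similarity(u, v):
    """Overlap count divided by the number of positions where either is 1."""
    either = sum(x | y for x, y in zip(u, v))
    if either == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return overlap_count(u, v) / either


# Hypothetical vocabulary and skill sets, for illustration only.
vocabulary = ["java", ".net", "sql", "testing"]
u = to_indicator({"java", ".net", "sql"}, vocabulary)
v = to_indicator({"java", "sql", "testing"}, vocabulary)

shared = overlap_count(u, v)
similarity = jaccard_similarity(u, v)
print(shared, similarity)
```

A distance for clustering can then be taken as `1 - similarity`.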

Iterator