I have a data set which consists of data points having attributes like:

  • average daily consumption of energy
  • average daily generation of energy
  • type of energy source
  • average daily energy fed into the grid
  • daily energy tariff

I am new to clustering techniques.

So my question is: which clustering algorithm would be best for this kind of data?

Uri Goren
Keya Patel

4 Answers

I think hierarchical clustering is a good choice. Have a look here: Clustering Algorithms
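For illustration, here is a minimal from-scratch sketch of agglomerative (bottom-up) hierarchical clustering with average linkage: every point starts as its own cluster, and the two closest clusters are merged until `n_clusters` remain. The function name and the toy 2-D points (think of them as already-scaled consumption vs. generation) are hypothetical, and the naive loop is only suitable for small data sets; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` is the usual choice.

```python
import math


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering (naive, roughly O(n^3)).

    Hypothetical helper for illustration: returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point

    def average_linkage(c1, c2):
        # mean Euclidean distance over all cross-cluster pairs
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # find the closest pair of clusters
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a
        del clusters[b]                  # b > a, so index a stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical, already-scaled (consumption, generation) pairs
points = [(10.0, 2.0), (11.0, 2.5), (10.5, 1.8), (3.0, 9.0), (2.5, 9.5), (3.2, 8.7)]
print(agglomerative(points, n_clusters=2))  # [0, 0, 0, 1, 1, 1]
```

Because the whole merge history is built bottom-up, stopping at different values of `n_clusters` gives the nested "clusters within clusters" view that hierarchical methods are known for.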

Chris

The simplest way to do clustering is the k-means algorithm. If all of your attributes are numerical, this is the easiest way to do the clustering. Even if they are not, you would have to find a distance measure for categorical or nominal attributes, but k-means is still a good choice. K-means is a partitional clustering algorithm; I wouldn't use hierarchical clustering in this case. But that also depends on what you want to do: you need to decide whether you want to find clusters within clusters, or whether the clusters all have to be completely separate from each other, with none contained in another.
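A minimal sketch of the k-means route for the attributes in the question, written in plain Python so it runs without extra packages. Two things here are assumptions rather than part of the answer: the categorical "type of energy source" is one-hot encoded (one common way of turning a nominal attribute into numbers usable with a Euclidean distance), and the numeric columns are standardized so that no attribute dominates the distance just because of its units. The records, column order and helper names are hypothetical, and the centers are seeded with a deterministic farthest-first heuristic instead of the more common random k-means++ seeding, to keep the example reproducible.

```python
import math
import statistics

# Hypothetical records: (consumption kWh/day, generation kWh/day,
#                        source type, fed into grid kWh/day, tariff per kWh)
records = [
    (12.0, 1.0, "solar", 0.2, 0.30),
    (11.5, 0.8, "solar", 0.1, 0.31),
    (12.5, 1.2, "solar", 0.3, 0.29),
    (4.0, 15.0, "wind", 10.5, 0.22),
    (4.5, 14.0, "wind", 9.8, 0.21),
    (3.8, 16.0, "wind", 11.0, 0.23),
]
NUMERIC = [0, 1, 3, 4]  # positions of the numeric attributes
CATEGORY = 2            # position of the energy source type


def encode(rows):
    """Standardize numeric columns (z-scores) and one-hot encode the category."""
    cols = [[r[i] for r in rows] for i in NUMERIC]
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    categories = sorted({r[CATEGORY] for r in rows})
    vectors = []
    for r in rows:
        v = [(r[i] - m) / s for i, m, s in zip(NUMERIC, means, stdevs)]
        v += [1.0 if r[CATEGORY] == c else 0.0 for c in categories]
        vectors.append(v)
    return vectors


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with farthest-first seeding; returns (labels, centers)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # next seed: the point farthest from its nearest existing center
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centers


labels, centers = kmeans(encode(records), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```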

Take care.

1) First, try k-means. If that meets your needs, you're done. Experiment with different numbers of clusters (controlled by the parameter k). There are many implementations of k-means, and you can implement your own version if you have good programming skills.

K-means generally works well when the clusters are roughly circular/spherical, i.e. compact and with a similar spread in every direction. This is roughly the case when each cluster's data comes from a Gaussian distribution (some Gaussianity in the data).
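One common way to "play with" k systematically (an addition here, not something this answer prescribes) is to run k-means for several values of k and compare the mean silhouette score, which is high when points are close to their own cluster and far from the nearest other cluster. The sketch below is self-contained plain Python; the toy points (three compact, roughly round groups, the shape k-means handles well) and the helper names are hypothetical.

```python
import math
import statistics


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding; returns labels."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette: (b - a) / max(a, b) per point, averaged over all points."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette taken as 0 by convention
            scores.append(0.0)
            continue
        a = statistics.fmean(own)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            statistics.fmean([math.dist(p, q) for j, q in enumerate(points) if labels[j] == c])
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


# Hypothetical 2-D points forming three compact, roughly round groups
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
       (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
       (9.0, 1.0), (9.1, 1.2), (8.9, 0.8)]
scores = {k: silhouette(pts, kmeans(pts, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

On real data the peak is usually less clear-cut than in this toy case, so the score is best treated as one hint among several when choosing k.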

2) If k-means doesn't meet your expectations, it is time to read and think more. I suggest starting with a good survey paper. The most common techniques are implemented in several programming languages and data mining frameworks, many of which are free to download and use.

3) If applying state-of-the-art clustering techniques is not enough, it is time to design a new technique. You can work on it yourself or team up with a machine learning expert.

user11924

Since most of your data is continuous, and it is reasonable to assume that energy consumption and generation are normally distributed, I would use statistical methods for clustering.

Such as:

The advantage of these methods over metric-based clustering algorithms (e.g. k-means) is that we can exploit the fact that we are dealing with averages, and make assumptions about the distributions from which those averages were calculated.
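The list of methods this answer originally linked to did not survive, so the following is not a reconstruction of it. As one standard example of clustering under a normality assumption, here is a minimal sketch of a Gaussian mixture model fitted with the expectation-maximization (EM) algorithm. To keep it short it handles a single attribute; the values (read them as average daily consumption in kWh), the function names and the quantile-based initialization are hypothetical choices. For several attributes at once, a multivariate implementation such as scikit-learn's `sklearn.mixture.GaussianMixture` would be the practical option.

```python
import math
import statistics


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_1d(xs, k, n_iter=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, labels); labels are the most probable
    component for each point according to the last E-step.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    # deterministic init: spread the initial means over the data's quantiles
    means = [xs_sorted[int((i + 0.5) * n / k)] for i in range(k)]
    variances = [statistics.pvariance(xs) or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against every density underflowing to 0
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300  # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, var)  # keep variances strictly positive
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Hypothetical average daily consumption values for ten households
xs = [3.9, 4.1, 4.0, 4.2, 3.8, 12.1, 11.8, 12.3, 12.0, 11.9]
weights, means, variances, labels = gmm_1d(xs, k=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Unlike k-means, each component also gets its own variance and weight, and every point gets a probability of belonging to each cluster rather than only a hard assignment.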

Uri Goren