Hierarchical Clustering
Hierarchical Clustering groups (Agglomerative or also called as Bottom-Up Approach) or divides (Divisive or also called as Top-Down Approach) the clusters based on the distance metrics. In Agglomerative clustering, each data point acts as a cluster initially, and then it groups the clusters one by one.
Divisive is the opposite of Agglomerative, it starts off with all the points into one cluster and divides them to create more clusters. These algorithms create a distance matrix of all the existing clusters and perform the linkage between the clusters depending on the criteria of the linkage. The clustering of the data points is represented by using a dendrogram. There are different types of linkages: –
o Single Linkage: — In single linkage the distance between the two clusters is the shortest distance between points in those two clusters.
o Complete Linkage: — In complete linkage, the distance between the two clusters is the farthest distance between points in those two clusters.
o Average Linkage: — In average linkage the distance between the two clusters is the average distance of every point in the cluster with every point in another cluster.
Both this algorithm are exactly reverse of each other. So we will be covering Agglomerative Hierarchical clustering algorithm in detail.
How Agglomerative Hierarchical clustering Algorithm Works
For a set of N observations to be clustered:
- Start assigning each observation as a single point cluster, so that if we have N observations, we have N clusters, each containing just one observation.
- Find the closest (most similar) pair of clusters and make them into one cluster, we now have N-1 clusters. This can be done in various ways to identify similar and dissimilar measures.
- Find the two closest clusters and make them to one cluster. We now have N-2 clusters. This can be done using agglomerative clustering linkage techniques
- Repeat steps 2 and 3 until all observations are clustered into one single cluster of size N.
Clustering algorithms use various distance or dissimilarity measures to develop different clusters. Lower/closer distance indicates that data or observation are similar and would get grouped in a single cluster. Remember that the higher the similarity depicts observation is similar.
Step 2 can be done in various ways to identify similar and dissimilar measures. Namely,
- Euclidean Distance
- Manhattan Distance
- Minkowski Distance
- Jaccard Similarity Coefficient
- Cosine Similarity
- Gower’s Similarity Coefficient