Clustering
The important parameters associated with clustering are given below.
- Similarity Measure: $\rho(d_1, d_2)$
- Distance Measure: $\delta(d_1, d_2)$
- Number of clusters: $k$
Within the same cluster, we would like $\delta$ to be small and $\rho$ to be large; this is the intra-cluster criterion. Similarly, between two different clusters we would like $\delta$ to be large and $\rho$ to be small (the inter-cluster criterion).
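The notes do not fix particular measures, but a common illustrative pairing is Euclidean distance for $\delta$ and cosine similarity for $\rho$; a minimal sketch in Python with NumPy, under that assumption:

```python
import numpy as np

def delta(d1, d2):
    """Distance measure delta(d1, d2): small within a cluster."""
    return np.linalg.norm(d1 - d2)

def rho(d1, d2):
    """Similarity measure rho(d1, d2): large within a cluster."""
    return d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))

d1 = np.array([1.0, 2.0])
d2 = np.array([1.5, 1.8])
print(delta(d1, d2))  # small -> points likely belong to the same cluster
print(rho(d1, d2))    # close to 1 -> points likely belong to the same cluster
```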
There are various methods of clustering data. Some of these are discussed below.
Bottom-Up Clustering
Simply put, starting with each point as its own cluster, the pair of clusters with the smallest distance between them is merged at every iteration. A visual representation of this merge history is called a dendrogram.
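As a sketch, SciPy's hierarchical-clustering utilities implement exactly this nearest-pair merging: `linkage` records the merge history, `dendrogram` plots it, and `fcluster` cuts the tree into flat clusters. The data `X` below is made up purely for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(10, 2))  # toy data

# At each step, merge the two closest clusters
# ("single" linkage = distance between the nearest pair of points).
Z = linkage(X, method="single")

# Cut the merge tree to recover k = 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# dendrogram(Z) plots the merge history with matplotlib.
```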
Top-Down Clustering
We initialise $k$ arbitrary centroids, then iterate through the data and update these centroids appropriately. Top-down clustering has a semblance of a cluster representative (the centroid), whereas bottom-up clustering is purely peer-to-peer in nature.
One idea is to perform bottom-up clustering until $k$ clusters are obtained, and then use their means as the initial centroids for top-down clustering, as sketched below.
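A sketch of this hybrid initialisation, assuming SciPy and scikit-learn are available (the helper name `hybrid_kmeans` and the "ward" linkage are our own choices, not prescribed by the notes):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

def hybrid_kmeans(X, k):
    # Bottom-up pass: merge points until only k clusters remain.
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    # Use each cluster's empirical mean as an initial centroid.
    init = np.stack([X[labels == c].mean(axis=0) for c in range(1, k + 1)])
    # Top-down pass: refine the centroids with K-Means.
    return KMeans(n_clusters=k, init=init, n_init=1).fit(X)
```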
K-Means Algorithm
We will be dealing with the “hard” version of the K-Means algorithm. There are two main steps for this algorithm.
- Keeping the datapoints’ assignments fixed, update each cluster center to the empirical mean of its assigned points.
- Keeping the cluster centers fixed, assign each datapoint to the cluster whose center is nearest in Euclidean distance.
The algorithm terminates when the cluster assignment of every datapoint is unchanged, or when the cluster centers move by less than a small threshold.
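A minimal hand-rolled sketch of hard K-Means with both steps and both stopping conditions (the helper name `kmeans`, the tolerance `tol`, and the random initialisation are illustrative choices, not prescribed by the notes):

```python
import numpy as np

def kmeans(X, k, tol=1e-6, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # arbitrary initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: hard-assign each datapoint to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Update step: move each center to the empirical mean of its points.
        new_centers = np.stack([
            X[new_labels == c].mean(axis=0) if np.any(new_labels == c) else centers[c]
            for c in range(k)
        ])
        # Terminate when assignments are unchanged or centers barely move.
        if (labels is not None and np.array_equal(labels, new_labels)) \
                or np.linalg.norm(new_centers - centers) < tol:
            return new_centers, new_labels
        centers, labels = new_centers, new_labels
    return centers, labels
```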
Kernel K-Means
The Euclidean distance used in the standard K-Means algorithm is instead computed in a feature space $\phi(\cdot)$, which requires only kernel evaluations:
\[\begin{align} d(x,y) &= || \phi(x) - \phi(y) ||^2 \\ &= ||\phi(x)||^2 + ||\phi(y)||^2 - 2\phi(x)^T\phi(y) \\ &= K(x,x) + K(y,y) - 2K(x,y) \end{align}\]
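Any positive-definite kernel can stand in for $K$; as a sketch, here is the squared feature-space distance with an RBF kernel (the kernel choice and the parameter `gamma` are illustrative assumptions, not fixed by the notes):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # One common kernel choice: K(x, y) = exp(-gamma * ||x - y||^2).
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_distance_sq(x, y, K=rbf_kernel):
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2 K(x,y); phi is never computed.
    return K(x, x) + K(y, y) - 2 * K(x, y)
```

In the full algorithm, the distance from a point to a cluster mean in feature space expands the same way into sums of kernel entries, so the centroids never need explicit coordinates.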