Currently I am using empirical methods to convince the world that we have a clustering approach that is superior in some important ways to the methods it promises to replace. Next I plan to apply the method to natural data, specifically whale songs.
Other interests include the use of geometric models for concurrency to avoid unsafe regions, computational techniques to discover derivatives market inefficiencies, and static analysis of cryptographic protocols.
Current projects
These project descriptions are from the lab website:1. Clustering Algorithms: I am working on issues of robustness and efficiency.
2. Whale Song Analysis: My true passion is the analysis of Whale Songs. I am applying the clustering methods I develop to this problem.
More about the clustering project
Here is a cool video depicting a clustering algorithm I am working with. It was introduced by Still, et. al. Geometric Clustering using the Information Bottleneck method.
The widely used K-means algorithm suffers from the problem of getting stuck in local optima. Methods such as deterministic annealing solve this problem, but can be computationally expensive. The algorithm I am working with is an alternative that may be significantly faster. I am currently testing the speed and comparing to DA, using various data sets.
(Parameters in the video: 4 clusters, 2500 points; random initial centers are blue, solution centers are red. λ=0.5. Initial variance=200.)
Initial cluster centers are drawn at random, as in K-means. However, the cluster locations are distributed about those means in a probabilistic way, according to a distribution that is exponential in the distance from the center. In the case of Euclidean distance, the variance of this distribution is one of the two parameters of the algorithm. The larger the initial variance, the more stable the algorithm becomes regardless of the choice of initial cluster centers. Due to the nature of the algorithm, this initial spread translates into fuzziness in the assignments of data to clusters in the first iteration. In DA, the fuzziness of the assignments is reduced gradually. This algorithm, in contrast, rapidly reduces the fuzziness; hence the name "quenched" K-means. The radii of the circles in the animation are equivalent to the variance discussed above; notice the quick reduction. In the video, data point cluster membership probabilities are encoded in color, making this fun to watch. The video repeats the algorithm on the same data two times for easier visualization.