Statistics Seminar, Session 09

Date and time: Tuesday, June 18, 2024
14:55 to 16:35
Venue: Lecture Room 3, 3rd Floor, New Building, Faculty of Economics
Speaker: Tomoki Tokuda (徳田 智磯), Earthquake Research Institute
Title: Multiple clustering based on nonparametric mixture models for Gaussian and Wishart distributions
Abstract

For high-dimensional data, clustering objects is not straightforward, because not all features are relevant to a single cluster solution: some features may be relevant to one cluster solution, whereas other features are relevant to another. In the high-dimensional case, it is therefore reasonable to assume multiple cluster solutions, each depending on a specific subset of features. In such a situation, conventional clustering methods break down. Despite this, effective methods for finding such multiple cluster structures remain underdeveloped. In this talk, I discuss two clustering methods that are useful for revealing such multiple clustering structures in data. The first method is based on Gaussian mixture models in which the features are partitioned into subsets, and a cluster solution is estimated for each subset. The feature partition and the multiple cluster solutions are optimized simultaneously, with both the number of subsets and the number of clusters inferred through the Dirichlet process. The second method is based on Wishart mixture models and applies directly to correlation matrices of connectivity data, without vectorizing them. Its multiple cluster solutions are based on sub-networks of nodes, which are optimized in a data-driven manner, and it can identify the underlying pairings between sub-networks of nodes and cluster solutions. Finally, I discuss three real-data applications, two in neuroscience and one in seismology, which demonstrate the usefulness and power of these multiple clustering methods for real data analysis.
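To make the data structure described above concrete, here is a minimal Python sketch (assuming NumPy) that only generates toy data with multiple cluster solutions. It is not the speaker's method and performs no inference. The function names `crp_partition` and `sample_multiple_clustering` and the parameters `alpha_view`, `alpha_cluster` and `noise_sd` are hypothetical. The sketch uses the Chinese restaurant process (CRP), the partition distribution induced by a Dirichlet process, first to split the features into subsets ("views") and then, independently within each view, to cluster the objects, so that each feature subset carries its own cluster solution.

```python
import numpy as np


def crp_partition(n_items, alpha, rng):
    """Sample a partition of n_items from a Chinese restaurant process
    with concentration alpha. Labels are consecutive integers 0..K-1."""
    labels = np.empty(n_items, dtype=int)
    counts = []  # counts[k] = number of items already assigned to block k
    for i in range(n_items):
        # Existing block k has weight counts[k]; opening a new block has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # open a new block
        else:
            counts[k] += 1
        labels[i] = k
    return labels


def sample_multiple_clustering(n_objects, n_features, alpha_view, alpha_cluster,
                               noise_sd=0.5, seed=0):
    """Hypothetical generator of toy data with one cluster solution per feature subset.

    Features are partitioned into views by a CRP; within each view, objects are
    clustered by an independent CRP, and each cluster gets its own Gaussian mean
    for the features in that view.
    """
    rng = np.random.default_rng(seed)
    view_of_feature = crp_partition(n_features, alpha_view, rng)
    n_views = view_of_feature.max() + 1
    X = np.empty((n_objects, n_features))
    cluster_of_object = np.empty((n_views, n_objects), dtype=int)
    for v in range(n_views):
        feats = np.flatnonzero(view_of_feature == v)
        z = crp_partition(n_objects, alpha_cluster, rng)
        cluster_of_object[v] = z
        # One mean vector per cluster, restricted to the features of this view.
        means = rng.normal(0.0, 3.0, size=(z.max() + 1, feats.size))
        X[:, feats] = means[z] + rng.normal(0.0, noise_sd, size=(n_objects, feats.size))
    return X, view_of_feature, cluster_of_object
```

For example, `X, views, clusters = sample_multiple_clustering(100, 12, alpha_view=1.0, alpha_cluster=1.0)` returns a 100 x 12 data matrix, a view label for each feature, and one row of object cluster labels per view. A single conventional clustering of `X` would have to reconcile these different row-wise labelings, which is the difficulty the abstract points to.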