Date/Time: Tuesday, December 2, 2008, 15:00–15:50
Venue: Lecture Room 3, 3rd floor, Faculty of Economics New Building
Speaker: Muni S. Srivastava (University of Toronto)
Title: Comparison of Discrimination Methods for High Dimensional Data

Abstract:
Dudoit, Fridlyand and Speed (2002) compare several discrimination methods for the classification of tumors using gene expression data. The comparison includes Fisher's (1936) linear discriminant analysis method (FLDA), the classification and regression tree (CART) method of Breiman, Friedman, Olshen and Stone (1984), the aggregating classifiers of Breiman (1996, 1998), which include the "bagging" methods of Friedman (1998), the "boosting" method of Freund and Schapire (1997), and the nearest neighbour (NN) method of Fix and Hodges (1951). The comparison also included two further methods, called the DQDA and DLDA methods. In the DQDA method, the population covariance matrices are assumed to be diagonal but unequal across groups; the likelihood ratio rule is derived assuming the parameters are known, and estimates are then substituted into the rule. In the DLDA method, the population covariance matrices are assumed to be not only diagonal but also equal across all groups, and the rule is obtained in the same manner as in DQDA. Among all the methods considered by Dudoit et al. (2002), only two, the DLDA and NN methods, performed well. The NN method, however, is very computer-intensive and performs no better than the DLDA method, especially when classifying into only two populations, and thus it will not be included in our study. While it is not possible to give reasons why the other methods performed poorly, the poor performance of the FLDA method may be due to the large dimension p of the data, even when the degrees of freedom n associated with the sample covariance exceed p: in large dimensions, the sample covariance matrix may become nearly singular, with very small eigenvalues.
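The DLDA rule described above — diagonal, equal covariances, with estimates plugged into the known-parameter likelihood ratio rule — can be sketched as follows. This is a minimal illustration, not the implementation used by Dudoit et al.; the function names and the uniform-prior assumption (so the rule reduces to a variance-weighted minimum distance to the class means) are ours.

```python
import numpy as np

def dlda_fit(X, y):
    """Fit a DLDA-style classifier: per-class means plus one pooled
    vector of per-feature variances (a diagonal covariance matrix
    assumed common to all groups)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class residuals give the diagonal covariance estimate.
    resid = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    var = (resid ** 2).sum(axis=0) / (len(y) - len(classes))
    return classes, means, var

def dlda_predict(model, X):
    """Assign each row of X to the class whose mean is closest in the
    variance-weighted squared distance (the plug-in likelihood ratio
    rule under equal priors)."""
    classes, means, var = model
    d = ((X[:, None, :] - means[None, :, :]) ** 2 / var).sum(axis=2)
    return classes[np.argmin(d, axis=1)]
```

Because only the diagonal of the covariance is estimated, the rule remains well-defined even when the dimension p exceeds the sample size n, which is the typical regime for gene expression data.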
For this reason, it may be reasonable to consider a version of the principal component method that is applicable even when p > n. Using the Moore-Penrose inverse, a general method based on the minimum distance rule is proposed. Another method, which uses an empirical Bayes estimate of the inverse of the covariance matrix, is also proposed, along with a variation of this method. We compare these three new methods with the DLDA method of Dudoit et al. (2002).
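A minimum distance rule built on the Moore-Penrose inverse might look like the following sketch. This is our own illustration of the general idea, not the speaker's proposed estimator: we pseudo-invert the pooled sample covariance, which stays well-defined when p > n and the matrix is singular.

```python
import numpy as np

def mp_fit(X, y):
    """Estimate class means and the Moore-Penrose inverse of the pooled
    sample covariance; the pseudo-inverse exists even when p > n."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    S = resid.T @ resid / (len(y) - len(classes))  # pooled covariance, p x p
    return classes, means, np.linalg.pinv(S)       # Moore-Penrose inverse

def mp_predict(model, X):
    """Classify each row of X by the minimum distance rule
    (x - mean_k)^T S^+ (x - mean_k) over classes k."""
    classes, means, S_pinv = model
    diffs = X[:, None, :] - means[None, :, :]
    d = np.einsum('nkp,pq,nkq->nk', diffs, S_pinv, diffs)
    return classes[np.argmin(d, axis=1)]
```

When S is nonsingular (n > p), `np.linalg.pinv(S)` coincides with the ordinary inverse and the rule reduces to the usual Mahalanobis minimum distance classifier; when p > n, the quadratic form is effectively computed on the range of S.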