Are the labels informative enough? – Semi-Supervised Probabilistic Distance Clustering and the Uncertainty of Classification
Assoc. Prof. Dr. Cem İyigün
Industrial Engineering, METU
Dec 15, Friday 13:40
In this study we first discuss unsupervised and semi-supervised clustering and then focus on the latter. Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, which uses prior information on the data). These two modes of data analysis are combined in a parameterized model. The results (cluster centers, classification rule) depend on the parameter θ; insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is otherwise redundant. This explains why some data sets in the literature give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, which is shown to be an entropic distance related to the Kullback–Leibler divergence.
Brief bio of the speaker
Cem Iyigun is an Associate Professor in the Industrial Engineering Department at Middle East Technical University (METU). Prior to joining METU in 2009, he worked as a visiting assistant professor in the Management Science and Information Systems department at Rutgers Business School. He received his Ph.D. in 2007 from the Rutgers Center for Operations Research (RUTCOR) at Rutgers University. His research interests lie primarily in data mining problems. He works on clustering, classification algorithms, time series clustering, and statistical models for classification problems, with applications in bioinformatics, climatology, and electricity load forecasting. His research also includes continuous facility location problems and hub location problems.