RSKC: An R package for a robust and sparse k-means clustering algorithm
|
Authors
|
Kondo Y., Salibian-Barrera M., Zamar R.
|
Source
|
Journal of Statistical Software - 2016 - Volume: 72 - Issue: 0
|
Abstract
|
Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables without useful information to separate the clusters). SK-means works very well on clean and complete data but cannot handle outliers or missing data. To remedy these problems, we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means in selecting important variables and identifying clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data. © 2016, American Statistical Association. All rights reserved.
|
Keywords
|
K-means; Robust clustering; Sparse clustering; Trimmed k-means
|
Address
|
University of British Columbia, Canada; University of British Columbia, Canada; University of British Columbia, Canada