machine learning - Cluster centroids on SimpleKMeans clustering
I am trying to interpret a set of results gleaned from running SimpleKMeans clustering on the diabetes.arff data set.
http://i.stack.imgur.com/t4eho.jpg - link to the clustered instances (figure 1)
So far I can understand that the clustered instances (figure 1) show that 500 instances have been classified as tested negative and 268 have been classified as tested positive.
http://i.stack.imgur.com/x9szt.jpg - link to the ground-truth values (figure 2)
When these values are compared with the ground-truth values, there is no difference, because a correct clustering should have shown 500 classified as tested negative and 268 classified as tested positive. Technically this means that the SimpleKMeans clustering approach is suited to this data set, as it has correctly classified the instances.
But I do not know how to interpret the information in the cluster centroids table, under the Full Data, Cluster #0 and Cluster 1 headings. What does it tell me about the data set (figure 1)?
You should drop the class attribute before clustering. It has predictive power and, as a consequence, the clustering algorithm has a strong bias to prefer the class attribute internally.
You can do the attribute removal in the "Preprocess" panel by clicking the "Remove" button, or in the "Cluster" panel by clicking "Ignore attributes" and selecting the "class" attribute.
Then cluster again. I suggest you start with k = 2, the number of unique values of the "class" attribute. (And check whether the cluster assignments correspond to the original attribute, or to something else.)
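To make the idea concrete outside the Weka GUI, here is a minimal Python sketch of the same workflow: hold the class attribute back, cluster with k = 2, then compare the clusters against the held-back labels. Everything in it is an assumption for illustration: the rows are made-up stand-ins (not the real diabetes.arff values), the attribute names glucose and bmi are placeholders, and the `kmeans` function is a plain Lloyd's k-means, so Weka's own implementation may differ in details such as initialization and attribute normalization. It also shows what a centroid table contains for numeric attributes: the "full data" entry is the mean of each attribute over all instances, and each cluster entry is the mean over the instances assigned to that cluster.

```python
import random
from collections import Counter


def kmeans(points, k, iterations=100, seed=10):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k instances as starting centroids
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assignments = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Recompute each centroid as the per-attribute mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, assignments


# Hypothetical stand-in rows: (glucose, bmi, class). NOT the real data set.
rows = [(90.0 + 2 * (i % 10), 22.0 + (i % 7), "tested_negative") for i in range(30)]
rows += [(150.0 + 2 * (i % 10), 32.0 + (i % 7), "tested_positive") for i in range(20)]

# Drop the class attribute before clustering; keep it only for evaluation.
features = [row[:-1] for row in rows]
labels = [row[-1] for row in rows]

centroids, assignments = kmeans(features, k=2)

# "Full data" entry: mean of each attribute over all instances.
full_data = tuple(sum(col) / len(features) for col in zip(*features))
cluster_sizes = Counter(assignments)

# Cross-tabulate cluster id against the held-back class label.
crosstab = Counter(zip(assignments, labels))

print("Full data mean:", full_data)
for c, centroid in enumerate(centroids):
    print(f"Cluster {c} ({cluster_sizes[c]} instances):", centroid)
for (cluster, label), count in sorted(crosstab.items()):
    print(f"cluster {cluster} / {label}: {count}")
```

The cross-tabulation at the end is the honest check: because the class label never entered the distance computation, any agreement between clusters and classes comes from the other attributes alone. The toy rows here are deliberately well separated, so real data should be expected to agree far less cleanly.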
By the way, it seems to me that you are not working on the "glass" data set, but on the "diabetes" data set.