A Case Study of Clustering Algorithms for Categorical Data sets

V. Sujitha, B. Venkateshwar Reddy, G. Vishnu Murthy


Data clustering, an unsupervised pattern recognition process, is the task of assigning a set of objects to groups called clusters so that the objects in the same cluster are more similar to each other than to those in other clusters. Most traditional clustering algorithms are limited to handling numerical data and cannot be directly applied to nominal data, where domain values are discrete and have no ordering. This paper addresses various categorical data clustering algorithms in detail. It surveys existing algorithms and examines the scalability of some of them.


data clustering, categorical data, data mining, scalability




Copyright © 2013. All rights reserved. | ijseat.com

Creative Commons License
International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.