An Approach to Text Documents Clustering with {n, n-1, ..., 1}-Word(s) Appearance Using Graph Mining Techniques
Abstract
This paper addresses text document clustering driven by an input of n words. First, all text documents with the extension ".Txt" are selected from a collection of m documents of various types to form an initial cluster. Then, given the n input words, the proposed algorithm forms n sets of clusters, one each for n, n-1, n-2, ..., 1 word(s): each cluster holds the text documents in which n, n-1, n-2, ..., 1 of the input words appear, respectively. These n clusterings are treated as document-word relations and are represented in memory as un-oriented document-word incidence matrices. Finally, each incidence matrix is represented as a bipartite graph, since it relates two sets of nodes, namely documents and words. The proposed algorithm, which uses graph mining techniques, was implemented in C++, and the results were satisfactory.
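The pipeline described in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the names `Document`, `ClusterResult`, `clusterByWordCount`, and `bipartiteEdges` are hypothetical, documents are held in memory rather than read from disk, words are matched as whitespace-separated tokens, and cluster k is assumed to hold the documents containing exactly k of the n input words. For brevity the sketch builds one incidence matrix over all text documents; the matrix of an individual cluster would be the rows belonging to that cluster's documents.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory document: a file name plus its text.
struct Document {
    std::string name;
    std::string text;
};

// True if the file name ends in ".txt" (compared case-insensitively).
bool hasTxtExtension(const std::string& name) {
    const std::string ext = ".txt";
    if (name.size() < ext.size()) return false;
    std::string tail = name.substr(name.size() - ext.size());
    for (char& c : tail) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return tail == ext;
}

// Split text into the set of its whitespace-separated tokens (assumed matching rule).
std::set<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::set<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.insert(tok);
    return tokens;
}

struct ClusterResult {
    std::vector<std::string> docs;                     // row labels: text documents only
    std::vector<std::vector<int>> incidence;           // incidence[d][w] == 1 if doc d contains word w
    std::map<int, std::vector<std::string>> clusters;  // k -> documents containing exactly k input words
};

// Keep only text documents, fill the document-word incidence matrix,
// and group documents by how many of the input words they contain.
ClusterResult clusterByWordCount(const std::vector<Document>& all,
                                 const std::vector<std::string>& words) {
    ClusterResult r;
    for (const Document& d : all) {
        if (!hasTxtExtension(d.name)) continue;  // initial cluster: text documents
        const std::set<std::string> tokens = tokenize(d.text);
        std::vector<int> row(words.size(), 0);
        int present = 0;
        for (std::size_t w = 0; w < words.size(); ++w) {
            if (tokens.count(words[w]) > 0) {
                row[w] = 1;
                ++present;
            }
        }
        r.docs.push_back(d.name);
        r.incidence.push_back(row);
        if (present > 0) r.clusters[present].push_back(d.name);
    }
    assert(r.docs.size() == r.incidence.size());  // one matrix row per text document
    return r;
}

// Read the incidence matrix as a bipartite graph: every 1 entry becomes
// an undirected edge between a document node and a word node.
std::vector<std::pair<std::string, std::string>> bipartiteEdges(
        const ClusterResult& r, const std::vector<std::string>& words) {
    std::vector<std::pair<std::string, std::string>> edges;
    for (std::size_t d = 0; d < r.docs.size(); ++d) {
        for (std::size_t w = 0; w < words.size(); ++w) {
            if (r.incidence[d][w] == 1) edges.emplace_back(r.docs[d], words[w]);
        }
    }
    return edges;
}
```

In this sketch, documents that contain none of the input words keep an all-zero row in the matrix but join no cluster, so they appear as isolated document nodes in the bipartite graph.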
International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.