An Approach to Text Documents Clustering with {n, n-1, ….., 1}-Word(s) Appearance Using Graph Mining Techniques

Bapuji Rao, Saroja Nanda Mishra

Abstract


This paper addresses text document clustering driven by an input of n words. First, all text documents with the extension ".Txt" are collected into a single cluster from a set of m documents of various types. Then, given the n input words, the proposed algorithm forms n sets of clusters, one each for n, n-1, n-2, ..., 1 word(s): each cluster holds the text documents in which n, n-1, n-2, ..., 1 of the input word(s) are present, respectively. These n forms of clustering are treated as documents-words relations and are represented in memory as unoriented documents-words incidence matrices. Finally, these incidence matrices are represented as bipartite graphs, whose two sets of nodes are the documents and the words. The proposed algorithm, which uses graph mining techniques, was implemented in C++, and the authors report that the results were satisfactory.
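The pipeline described above can be illustrated with a minimal C++ sketch. It is not the authors' implementation: the `Document` struct and all function names are hypothetical, words are matched as whitespace-delimited tokens without regard to case, and cluster k is assumed to hold the documents that contain exactly k of the n input words (the abstract does not say whether "presence of k words" means exactly k or at least k).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one input file: its name and its raw text.
struct Document {
    std::string name;
    std::string text;
};

// Return a lower-cased copy of s (ASCII only).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if the file name ends in ".txt", compared case-insensitively.
bool isTextDocument(const std::string& name) {
    const std::string ext = ".txt";
    if (name.size() < ext.size()) return false;
    return toLower(name.substr(name.size() - ext.size())) == ext;
}

// Step 1: keep only the text documents out of the m input documents.
std::vector<Document> filterTextDocuments(const std::vector<Document>& all) {
    std::vector<Document> out;
    for (const Document& d : all)
        if (isTextDocument(d.name)) out.push_back(d);
    return out;
}

// Assumption: a word is "present" if it equals a whitespace-delimited token.
// Punctuation attached to a token is not stripped in this sketch.
bool containsWord(const std::string& text, const std::string& word) {
    std::istringstream in(toLower(text));
    const std::string target = toLower(word);
    std::string token;
    while (in >> token)
        if (token == target) return true;
    return false;
}

// Unoriented documents-words incidence matrix:
// m[i][j] == 1 when document i contains word j, otherwise 0.
using IncidenceMatrix = std::vector<std::vector<int>>;

// Step 2: build the incidence matrix for the n input words.
IncidenceMatrix buildIncidence(const std::vector<Document>& docs,
                               const std::vector<std::string>& words) {
    IncidenceMatrix m(docs.size(), std::vector<int>(words.size(), 0));
    for (std::size_t i = 0; i < docs.size(); ++i)
        for (std::size_t j = 0; j < words.size(); ++j)
            m[i][j] = containsWord(docs[i].text, words[j]) ? 1 : 0;
    return m;
}

// Step 3: group document names by how many input words they contain.
// Assumption: key k means "exactly k words present"; k == 0 is skipped.
std::map<std::size_t, std::vector<std::string>>
clusterByWordCount(const std::vector<Document>& docs, const IncidenceMatrix& m) {
    std::map<std::size_t, std::vector<std::string>> clusters;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        std::size_t count = 0;
        for (int cell : m[i]) count += static_cast<std::size_t>(cell);
        assert(count <= m[i].size());
        if (count > 0) clusters[count].push_back(docs[i].name);
    }
    return clusters;
}

// Step 4: list the bipartite graph as (document, word) edges,
// one edge per non-zero entry of the incidence matrix.
std::vector<std::pair<std::string, std::string>>
bipartiteEdges(const std::vector<Document>& docs,
               const std::vector<std::string>& words,
               const IncidenceMatrix& m) {
    std::vector<std::pair<std::string, std::string>> edges;
    for (std::size_t i = 0; i < docs.size(); ++i)
        for (std::size_t j = 0; j < words.size(); ++j)
            if (m[i][j] == 1) edges.emplace_back(docs[i].name, words[j]);
    return edges;
}
```

A caller would run the four functions in order: `filterTextDocuments`, then `buildIncidence` with the n input words, then `clusterByWordCount` and `bipartiteEdges` on the resulting matrix. Switching to an "at least k words" reading would only change the grouping step, since the incidence matrix and the edge list stay the same.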







International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.