An Approach to Text Documents Clustering with {n, n-1, ..., 1}-Word(s) Appearance Using Graph Mining Techniques

Bapuji Rao; Saroja Nanda Mishra

Abstract

This paper addresses text document clustering driven by an input of n words. First, all text documents with the extension ".Txt" are collected into a single cluster from a set of m documents of various types. Then, given the n input words, the proposed algorithm forms n sets of clusters, one for each of n, n-1, n-2, ..., 1; each cluster contains the text documents in which n, n-1, n-2, ..., 1 of the input word(s) appear, respectively. These n forms of clustering are treated as documents-words relations, and each is represented in memory as an un-oriented documents-words incidence matrix. Finally, these incidence matrices are represented as bipartite graphs, since a bipartite graph has two sets of nodes, here documents and words. The proposed algorithm, which uses graph mining techniques, was implemented in C++, and the results were satisfactory.
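The pipeline described in the abstract can be illustrated with a minimal C++ sketch. It is not the authors' implementation. All names (Document, isTextDocument, tokenize, buildIncidence, clusterByWordCount) are hypothetical. The sketch also assumes whitespace tokenization, case-insensitive matching of the ".txt" extension, and that a document belongs to cluster k when exactly k of the n input words appear in it; the abstract does not say whether "exactly k" or "at least k" is intended.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory document: a file name plus its text content.
struct Document {
    std::string name;
    std::string text;
};

// True if the file name ends in ".txt", compared case-insensitively (assumption).
bool isTextDocument(const std::string& name) {
    const std::string ext = ".txt";
    if (name.size() < ext.size()) return false;
    std::string tail = name.substr(name.size() - ext.size());
    std::transform(tail.begin(), tail.end(), tail.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return tail == ext;
}

// Splits text on whitespace into a set of distinct tokens (no stemming or
// punctuation handling; an assumption made to keep the sketch short).
std::set<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::set<std::string> tokens;
    std::string word;
    while (in >> word) tokens.insert(word);
    return tokens;
}

// Documents-words incidence matrix: one row per document, one column per
// input word; an entry is 1 if the word appears in the document, else 0.
using IncidenceMatrix = std::vector<std::vector<int>>;

IncidenceMatrix buildIncidence(const std::vector<Document>& docs,
                               const std::vector<std::string>& words) {
    IncidenceMatrix m(docs.size(), std::vector<int>(words.size(), 0));
    for (std::size_t d = 0; d < docs.size(); ++d) {
        const std::set<std::string> tokens = tokenize(docs[d].text);
        for (std::size_t w = 0; w < words.size(); ++w)
            m[d][w] = tokens.count(words[w]) ? 1 : 0;
    }
    return m;
}

// Groups document indices by the number of input words they contain.
// Key k holds the documents with exactly k matching words (assumed reading);
// documents with no matching word are left out.
std::map<int, std::vector<std::size_t>> clusterByWordCount(const IncidenceMatrix& m) {
    std::map<int, std::vector<std::size_t>> clusters;
    for (std::size_t d = 0; d < m.size(); ++d) {
        int k = 0;
        for (int bit : m[d]) k += bit;
        if (k > 0) clusters[k].push_back(d);
    }
    return clusters;
}
```

Each 1 at position (d, w) in the matrix corresponds to an edge between document node d and word node w, so the matrix can be read directly as the bipartite graph. A caller would first filter the m documents with isTextDocument, then pass the remaining documents and the n input words to buildIncidence, and finally pass the result to clusterByWordCount.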

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.

Copyright © 2013, All rights reserved. | ijseat.com