A New Clustering Technique On Text In Sentence For Text Mining

B Lakshmmi Narayana, S Phani Kumar

Abstract


Clustering is a widely studied data mining problem in the text domain, with applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. This paper surveys sentence-level clustering algorithms, describing the problems that arise when clustering at the sentence level and the solutions proposed to overcome them. It also presents a novel fuzzy clustering algorithm that operates on relational input data, i.e., data in the form of a square matrix of pairwise similarities between data objects. The Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA) is an extension of FRECCA, which is used for clustering sentences. Text documents have a hierarchical structure, and many of their terms relate to more than one theme, so HFRECCA is a useful algorithm for natural language documents. In this algorithm, a single object may belong to more than one cluster.
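
As a rough illustration of the two ingredients named in the abstract, relational input (a square matrix of pairwise similarities) and eigenvector centrality, the following Python sketch may help. It is not the HFRECCA or FRECCA procedure itself. The bag-of-words cosine similarity, the power-iteration routine, the `soft_memberships` normalization, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two sentences using raw word counts (an assumed measure)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences: list) -> list:
    """Square, symmetric matrix of pairwise similarities: the relational input."""
    n = len(sentences)
    return [[cosine_similarity(sentences[i], sentences[j]) for j in range(n)]
            for i in range(n)]


def eigenvector_centrality(sim: list, iterations: int = 200, tol: float = 1e-9) -> list:
    """Approximate the dominant eigenvector by power iteration, scaled to sum to 1."""
    n = len(sim)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [sum(sim[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        if total == 0:
            return scores  # no similarity mass at all; keep the uniform scores
        new = [v / total for v in new]
        if max(abs(x - y) for x, y in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def soft_memberships(cluster_scores: list) -> list:
    """Turn per-cluster scores (cluster_scores[k][i]) into graded memberships.

    Each object's memberships sum to 1, so one object can belong to
    several clusters to different degrees (hypothetical normalization).
    """
    k = len(cluster_scores)
    n = len(cluster_scores[0])
    rows = []
    for i in range(n):
        column = [cluster_scores[c][i] for c in range(k)]
        total = sum(column)
        rows.append([v / total for v in column] if total else [1.0 / k] * k)
    return rows


if __name__ == "__main__":
    sentences = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "stock prices fell sharply",
    ]
    sim = similarity_matrix(sentences)
    print(eigenvector_centrality(sim))
```

In this sketch the third sentence shares no words with the other two, so its centrality shrinks toward zero, while the two related sentences receive equal scores.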






Copyright © 2013, All rights reserved. | ijseat.com

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.