Unique value disintegration for probing results using clustering algorithm

Sushuma Narkedamilli, B S N Murthy

Abstract


This paper describes gene expression analysis by Unique Value Disintegration (UVD), emphasizing initial characterization of the data. We describe UVD methods for visualization of gene expression data, representation of the data using a smaller number of variables, and detection of patterns in noisy gene expression data. In addition, we describe the precise relation between UVD analysis and Principal Component Analysis (PCA) when PCA is calculated using the covariance matrix, enabling our descriptions to apply equally well to either method. Our aim is to provide definitions, interpretations, examples, and references that will serve as resources for understanding and extending the application of UVD and PCA to gene expression analysis.

The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo, a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss results from an empirical evaluation of the algorithm.
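As a rough illustration of the frequent phrase extraction step mentioned in the abstract, the sketch below builds a word-level suffix array over a handful of search snippets and reads repeated phrases off runs of adjacent suffixes. It is not the Lingo implementation: the function names, the `<eod…>` boundary tokens, and the naive sort-based construction are all assumptions made for this example.

```python
def tokenize_snippets(snippets):
    """Lowercase and split each snippet, appending a unique end marker.

    The markers (hypothetical "<eod0>", "<eod1>", ...) are an assumption of
    this sketch; because each is unique, no repeated phrase can span two
    snippets.
    """
    tokens = []
    for idx, text in enumerate(snippets):
        tokens.extend(text.lower().split())
        tokens.append(f"<eod{idx}>")
    return tokens


def build_suffix_array(tokens):
    """Return suffix start positions, sorted by the token sequence they begin.

    Naive construction by direct sorting; adequate for a short results list,
    but not an efficient suffix array algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def frequent_phrases(tokens, min_count=2, max_len=3):
    """Map each phrase of 2..max_len words to its count, if >= min_count."""
    suffix_array = build_suffix_array(tokens)
    found = {}
    for k in range(2, max_len + 1):
        run_phrase, run_count = None, 0
        # Suffixes sharing the same first k tokens are adjacent once sorted,
        # so each repeated phrase shows up as one contiguous run.
        for start in suffix_array:
            phrase = tuple(tokens[start:start + k])
            if len(phrase) < k:
                continue  # suffix too short to hold a k-word phrase
            if phrase == run_phrase:
                run_count += 1
            else:
                if run_count >= min_count:
                    found[run_phrase] = run_count
                run_phrase, run_count = phrase, 1
        if run_count >= min_count:
            found[run_phrase] = run_count
    return found


if __name__ == "__main__":
    snippets = [
        "data mining tools",
        "text data mining",
        "data mining and text mining",
    ]
    print(frequent_phrases(tokenize_snippets(snippets)))
```

Phrases found this way would serve only as candidate cluster labels; how they are ranked or matched against the term-document matrix is outside the scope of this sketch.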


Keywords


UVD, PCA, Clustering



Copyright © 2013. All rights reserved. | ijseat.com

Creative Commons License
International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.