A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data

Rambabu Matta, Ayyappa Chakravarthi M

Abstract


Feature subset clustering is a powerful technique for reducing the dimensionality of feature vectors in text classification. In this paper, we propose a similarity-based self-constructing algorithm for feature clustering that builds on the K-means strategy. The words in the feature vector of a document set are grouped into clusters based on a similarity test: words that are similar to one another are placed in the same cluster, and a head word is designated for each cluster.
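The grouping step described above can be sketched as a small K-means routine over word vectors, where each word is represented by a numeric vector (for example, its distribution over document classes) and cosine similarity drives the assignment. The representation, the `cosine` helper, and the head-word rule (the member closest to its cluster centre) are illustrative assumptions, not the paper's exact formulation:

```python
import math
import random

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans_word_clusters(word_vectors, k, iters=20, seed=0):
    """Group words into k clusters by cosine similarity of their
    vectors, and pick a head word for each cluster (a sketch)."""
    rng = random.Random(seed)
    words = list(word_vectors)
    # Initialise centres with k randomly chosen word vectors.
    centers = [list(word_vectors[w]) for w in rng.sample(words, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w in words:
            # Assign each word to the most similar centre.
            best = max(range(k), key=lambda i: cosine(word_vectors[w], centers[i]))
            clusters[best].append(w)
        for i, members in enumerate(clusters):
            if members:
                dim = len(centers[i])
                # Recompute the centre as the mean of its members.
                centers[i] = [sum(word_vectors[w][j] for w in members) / len(members)
                              for j in range(dim)]
    # Head word: the member closest to its cluster centre.
    heads = [max(m, key=lambda w: cosine(word_vectors[w], centers[i])) if m else None
             for i, m in enumerate(clusters)]
    return clusters, heads
```

On a toy vocabulary with two clearly separated word groups, the routine recovers the groups and names a head word for each.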

With the FAST algorithm, the derived membership functions match closely with, and properly describe, the real distribution of the training data. Moreover, the user need not specify the number of extracted features in advance, so trial and error in determining the appropriate number of extracted features is avoided. Experimental results show that our FAST implementation runs faster and obtains better extracted features than other methods.
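The claim that the number of extracted features need not be fixed in advance can be illustrated with a minimal incremental, self-constructing clustering sketch: a word joins the most similar existing cluster when similarity clears a threshold, and otherwise seeds a new cluster, so the cluster count emerges from the data. The threshold value and the running-mean centre update are assumptions for illustration, not the paper's exact procedure:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def self_constructing_clusters(word_vectors, threshold=0.8):
    """Incremental similarity-based clustering: no preset cluster
    count; new clusters are created on demand (a sketch)."""
    centers, clusters = [], []
    for word, vec in word_vectors.items():
        if centers:
            sims = [cosine(vec, c) for c in centers]
            best = max(range(len(sims)), key=sims.__getitem__)
        if not centers or sims[best] < threshold:
            # Not similar enough to any cluster: start a new one.
            centers.append(list(vec))
            clusters.append([word])
        else:
            # Join the best cluster and update its centre as a running mean.
            clusters[best].append(word)
            n = len(clusters[best])
            centers[best] = [(c * (n - 1) + v) / n
                             for c, v in zip(centers[best], vec)]
    return clusters
```

Run on the same toy vocabulary, two clusters emerge without k ever being specified.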

Keywords


Feature subset selection, filter method, feature clustering, graph-based clustering





Copyright © 2013, ijseat.com. All rights reserved.

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License, based on a work at IJSEAT. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.