A Clustering-Based Shifting Technique To Align Data Units In Web Data Bases

Lalitha Kumari Kolli, K.V.T Subba Rao

Abstract


An increasing number of databases have become accessible on the web through HTML form-based search interfaces. The data units returned from the underlying database are encoded into the result pages dynamically for human browsing. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. Each basic annotator produces a label for the units within its group holistically, and a probability model is adopted to determine the most appropriate label for each group. The annotation rules for all aligned groups jointly form the annotation wrapper for the corresponding web database (WDB), which can be used to directly annotate the data retrieved from the same WDB in response to new queries without performing the alignment and annotation phases again. As such, annotation wrappers can perform annotation quickly, which is important for online applications.
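
The abstract does not spell out the probability model, so the following is only an illustrative sketch of how labels proposed by several basic annotators for one aligned group might be combined. It assumes, hypothetically, that each annotator has an estimated success rate and that annotators err independently, and it scores each candidate label with a noisy-OR combination. The function name combine_labels, the success rates, and the example labels are all assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict
from math import prod


def combine_labels(votes):
    """Pick a label for one aligned group from several annotators' votes.

    votes: list of (label, success_rate) pairs, one per annotator that
    proposed a label. success_rate is a hypothetical estimate in [0, 1]
    of how often that annotator is correct.
    """
    rates_by_label = defaultdict(list)
    for label, rate in votes:
        rates_by_label[label].append(rate)

    # Noisy-OR under an independence assumption: a label is wrong only
    # if every annotator that proposed it is wrong.
    scores = {
        label: 1.0 - prod(1.0 - r for r in rates)
        for label, rates in rates_by_label.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical votes for one group of aligned data units.
votes = [("Title", 0.8), ("Title", 0.5), ("Author", 0.85)]
best, scores = combine_labels(votes)
print(best, scores)
```

In this sketch, two moderately reliable annotators that agree on a label can outweigh a single more reliable annotator that disagrees, which is one way aggregating annotations from different aspects could improve on any single annotator.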


Keywords


Data alignment, data annotation, web database, wrapper generation



Copyright © 2013, All rights reserved. | ijseat.com

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.