Vigorous Component Built Data Controlling System

Raghavendra Chunduri, Ayyappa Chakravarthi M


The Internet presents a huge amount of useful information, but that information is usually formatted for human readers, which makes it difficult to extract relevant data from various sources automatically. Robust, flexible Information Extraction (IE) systems that transform Web pages into program-friendly structures, such as a relational database, are therefore becoming a necessity. The motivation behind such systems lies in the emerging need to go beyond the concept of “human browsing.” The World Wide Web is today the main repository for all kinds of information and has so far been very successful in disseminating information to humans [5].

The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define how these forms connect to and retrieve data from database servers [3].

In this paper, we present an automatic annotation approach that first aligns the data units on a result page into groups such that the data units in the same group have the same semantics. We then assign a label to each group.
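The two phases described above (alignment, then labeling) can be illustrated with a minimal sketch. It is not the authors' method: the regular-expression type patterns, the grouping key (data type plus order of occurrence within a record), and the `TYPE_LABELS` table are all assumptions chosen only to make the idea concrete.

```python
import re
from collections import defaultdict

# Hypothetical data-type patterns (an assumption, not taken from the paper).
TYPE_PATTERNS = [
    ("price", re.compile(r"^\$\d+(\.\d{2})?$")),
    ("year", re.compile(r"^(19|20)\d{2}$")),
]

# Hypothetical mapping from a data type to a human-readable label.
TYPE_LABELS = {"price": "Price", "year": "Year"}


def data_type(unit):
    """Return the first matching type name, or 'text' if none matches."""
    for name, pattern in TYPE_PATTERNS:
        if pattern.match(unit):
            return name
    return "text"


def align(records):
    """Phase 1: group data units across records.

    Units are placed in the same group when they share a data type and
    the same order of occurrence for that type within their record.
    """
    groups = defaultdict(list)
    for record in records:
        seen = defaultdict(int)  # how many units of each type so far
        for unit in record:
            kind = data_type(unit)
            groups[(kind, seen[kind])].append(unit)
            seen[kind] += 1
    return dict(groups)


def annotate(groups):
    """Phase 2: assign one label to each aligned group."""
    labeled = {}
    for (kind, index), units in groups.items():
        base = TYPE_LABELS.get(kind, "Unlabeled")
        labeled[f"{base} {index + 1}"] = units
    return labeled
```

For example, calling `annotate(align(records))` on two book records, one of which lacks an author field, still places both prices in one group and both years in another. A real system would need richer features than data type alone to separate fields of the same type reliably.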


Data alignment, data annotation, web database, wrapper generation


Y. Lu, H. He, H. Zhao, W. Meng, and C. Yu, “Annotating Search Results from Web Databases,” IEEE Trans. Knowledge and Data Eng., vol. 25, no. 3, Mar. 2013.

W. Liu, X. Meng, and W. Meng, “ViDE: A Vision-Based Approach for Deep Web Data Extraction,” IEEE Trans. Knowledge and Data Eng., vol. 22, no. 3, pp. 447-460, Mar. 2010.

Y. Lu, H. He, H. Zhao, W. Meng, and C. Yu, “Annotating Structured Data of the Deep Web,” Proc. IEEE 23rd Int’l Conf. Data Eng. (ICDE), 2007.

N.K. Papadakis, D. Skoutas, “STAVIES: A System for Information Extraction from Unknown Web Data Sources through Automatic Web Wrapper Generation Using Clustering Techniques,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 12, Dec. 2005.

C.-H. Chang, M. Kayed, M.R. Girgis, “A Survey of Web Information Extraction Systems,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 10, Oct. 2006.

Wang Computer Science Department University of Science and Technology Clear Water Bay, Kowloon Hong Kong Computer Science Department University of Science and Technology Clear Water Bay, Kowloon Hong Kong

A. Arasu and H. Garcia-Molina, “Extracting Structured Data from Web Pages,” Proc. SIGMOD Int’l Conf. Management of Data, 2003.

L. Arlotta, V. Crescenzi, G. Mecca, and P. Merialdo, “Automatic Annotation of Data Extracted from Large Web Sites,” Proc. Sixth Int’l Workshop the Web and Databases (WebDB), 2003.

P. Chan and S. Stolfo, “Experiments on Multistrategy Learning by Meta-Learning,” Proc. Second Int’l Conf. Information and Knowledge Management (CIKM), 1993.

W. Bruce Croft, “Combining Approaches for Information Retrieval,” Advances in Information Retrieval: Recent Research from the Center for Intelligent Information Retrieval, Kluwer Academic, 2000.

V. Crescenzi, G. Mecca, and P. Merialdo, “RoadRUNNER: Towards Automatic Data Extraction from Large Web Sites,” Proc. Very Large Data Bases (VLDB) Conf., 2001.

S. Dill et al., “SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation,” Proc. 12th Int’l Conf. World Wide Web (WWW), 2003.

H. Elmeleegy, J. Madhavan, and A. Halevy, “Harvesting Relational Tables from Lists on the Web,” Proc. Very Large Databases (VLDB) Conf., 2009.

D. Embley, D. Campbell, Y. Jiang, S. Liddle, D. Lonsdale, Y. Ng, and R. Smith, “Conceptual-Model-Based Data Extraction from Multiple-Record Web Pages,” Data and Knowledge Eng., vol. 31, no. 3, pp. 227-251, 1999.

D. Freitag, “Multistrategy Learning for Information Extraction,” Proc. 15th Int’l Conf. Machine Learning (ICML), 1998.

D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.


Copyright © 2013, All rights reserved.

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at