A Study To Support First-N Queries And Incremental Updates To Answer Multi Keyword Queries

Ala Rajitha, Fasi Ahmed Parvez

Abstract


Most search engines and online search forms maintain auto completion which demonstrates suggested queries or even answers on the fly as a user types in a keyword query character by character. As many search systems accumulate their information in a backend relational DBMS. Some databases such as Oracle and SQL server already support prefix search and we could use this feature to do search-as-you-type. Still not all databases provide this feature. For this reason we study new methods that can be used in all databases. One approach is to expand a separate application layer on the database to construct indexes and execute algorithms for answering queries. While this approach has the benefit of achieving a high performance its main drawback is duplicating data and indexes resulting in additional hardware costs. Another approach is to use database extenders such as DB2 Extenders, Informix Data Blades, Microsoft SQL Server Common Language Runtime (CLR) integration and Oracle Cartridges which permit developers to implement new functionalities to a DBMS. This approach is not possible for databases that do not provide such an extender interface such as MySQL. Because it needs to utilize proprietary interfaces provided by database vendors a solution for one database may not be portable to others. In addition an extender-based solution particularly those implemented in C/C++ could cause severe dependability and security problems to database engines.


Keywords


Search-as-you-type, databases, SQL, fuzzy search.

References


S. Agrawal, K. Chakrabarti, S. Chaudhuri, and V. Ganti, “Scalable Ad-Hoc Entity Extraction from Text Collections,” Proc. VLDB Endowment, vol. 1, no. 1, pp. 945-957, 2008.

S. Agrawal, S. Chaudhuri, and G. Das, “DBXplorer: A System for Keyword-Based Search over Relational Data Bases,” Proc. 18th Int’l Conf. Data Eng. (ICDE ’02), pp. 5-16, 2002.

A. Arasu, V. Ganti, and R. Kaushik, “Efficient Exact Set-Similarity Joins,” Proc. 32nd Int’l Conf. Very Large Data Bases (VLDB ’06), pp. 918-929, 2006.

H. Bast, A. Chitea, F.M. Suchanek, and I. Weber, “ESTER: Efficient Search on Text, Entities, and Relations,” Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’07), pp. 671-678, 2007.

H. Bast and I. Weber, “Type Less, Find More: Fast Autocompletion Search with a Succinct Index,” Proc. 29th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval

(SIGIR ’06), pp. 364-371, 2006.

H. Bast and I. Weber, “The Complete Search Engine: Interactive, Efficient, and Towards IR & DB Integration,” Proc. Conf. Innovative Data Systems Research (CIDR), pp. 88-95, 2007.

R.J. Bayardo, Y. Ma, and R. Srikant, “Scaling up all Pairs Similarity Search,” Proc. 16th Int’l Conf. World Wide Web (WWW ’07), pp. 131- 140, 2007.

G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan, “Keyword Searching and Browsing in Data Bases Using Banks,” Proc. 18th Int’l Conf. Data Eng. (ICDE ’02), pp. 431- 440, 2002.

K. Chakrabarti, S. Chaudhuri, V. Ganti, and D. Xin, “An Efficient Filter for Approximate Membership Checking,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’08), pp. 805- 818, 2008.

S. Chaudhuri, K. Ganjam, V. Ganti, R. Kapoor, V. Narasayya, and T. Vassilakis, “Data Cleaning in Microsoft SQL Server 2005,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’05), pp. 918-920, 2005.

S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’03), pp. 313- 324, 2003.

S. Chaudhuri, V. Ganti, and R. Kaushik, “A Primitive Operator for Similarity Joins in Data Cleaning,” Proc. 22nd Int’l Conf. Data Eng. (ICDE ’06), pp. 5-16, 2006.

S. Chaudhuri, V. Ganti, and R. Motwani, “Robust Identification of Fuzzy Duplicates,” Proc. 21st Int’l Conf. Data Eng. (ICDE), pp. 865- 876, 2005.

S. Chaudhuri and R. Kaushik, “Extending Autocompletion to Tolerate Errors,” Proc. 35th ACM SIGMOD Int’l Conf. Management of Data (SIGMOD ’09), pp. 433-439, 2009.

B.B. Dalvi, M. Kshirsagar, and S. Sudarshan, “Keyword Search on External Memory Data Graphs,” Proc. VLDB Endowment, vol. 1, no. 1, pp. 1189-1204, 2008.

B. Ding, J.X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin, “Finding Top-K Min-Cost Connected Trees in Data Bases,” Proc. IEEE 23rd Int’l Conf. Data Eng. (ICDE ’07), pp. 836-845, 2007.

L. Gravano, P.G. Ipeirotis, H.V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, “Approximate String Joins in a Data Base (Almost) for Free,” Proc. 27th Int’l Conf. Very Large Data Bases (VLDB ’01), pp. 491-500, 2001.

M. Hadjieleftheriou, A. Chandel, N. Koudas, and D. Srivastava, “Fast Indexes and Algorithms for Set Similarity Selection Queries,” Proc. IEEE 24th Int’l Conf. Data Eng. (ICDE ’08), pp. 267-276, 2008.


Full Text: PDF[FULL TEXT]

Refbacks

  • There are currently no refbacks.


Copyright © 2013, All rights reserved.| ijseat.com

Creative Commons License
International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License.Based on a work at IJSEat , Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.