A Progressive Technique for Duplicate Detection Evaluating Multiple Data Using Genetic Algorithm with Real World Objects

K. Sai Mallikarjuna, N N. Sushma

Abstract


This paper analyzes progressive duplicate record detection in real-world data, where a database holds at least two redundant representations of the same record. Duplicate detection is the process of identifying all cases in which multiple representations describe the same real-world object, for example in customer relationship management or data mining. Customer relationship management is a representative case: a company loses money by sending multiple catalogs to the same person, which also lowers customer satisfaction. Data mining is another application, where correct input data is needed to build useful reports that form the basis of downstream components. We study a progressive duplicate detection algorithm that, with the help of MapReduce, identifies duplicate records and deletes them.
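The progressive idea can be sketched as follows. This is a minimal, single-machine illustration in the spirit of the progressive sorted neighborhood method (PSNM) named in the keywords, not the authors' implementation: it omits the MapReduce and genetic algorithm parts entirely. The function names, the `difflib`-based similarity measure, the window size, and the 0.85 threshold are all assumptions chosen for illustration. Records are sorted by a key, and pairs are compared in order of increasing rank distance, so the most likely duplicates are reported first.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Iterator, List, Tuple


def similarity(a: str, b: str) -> float:
    """Illustrative string similarity in [0, 1]; any record matcher could be used."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def progressive_snm(
    records: List[str],
    key: Callable[[str], str] = str.lower,
    max_window: int = 3,
    threshold: float = 0.85,  # assumed cut-off, not taken from the paper
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs of likely duplicates, closest sorted neighbors first."""
    # Sort record indices by the sorting key so similar records end up adjacent.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    # Grow the rank distance step by step: distance 1 covers direct neighbors,
    # which are the most promising candidates, so they are compared first.
    for distance in range(1, max_window):
        for pos in range(len(order) - distance):
            i, j = order[pos], order[pos + distance]
            if similarity(records[i], records[j]) >= threshold:
                yield (min(i, j), max(i, j))


def drop_duplicates(records: List[str], pairs: Iterable[Tuple[int, int]]) -> List[str]:
    """Delete the later record of every reported pair, keeping the earlier one."""
    redundant = {j for _, j in pairs}
    return [r for i, r in enumerate(records) if i not in redundant]


if __name__ == "__main__":
    customers = [
        "John Smith, 12 Oak St",
        "Jon Smith, 12 Oak St",
        "Alice Brown, 4 Elm Rd",
        "Alice Brown, 4 Elm Road",
        "Bob Lee, 9 Pine Ave",
    ]
    found = list(progressive_snm(customers))
    print(found)
    print(drop_duplicates(customers, found))
```

Because `progressive_snm` is a generator, a caller can stop consuming it at any time and still keep the duplicates found so far, which is the property that makes the approach progressive.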


Keywords


Data Duplicate Detection, Progressive Deduplication, PSNM, Data Mining, Data Cleaning.





Copyright © 2013, All rights reserved. | ijseat.com

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.