A Suite of Record Normalization Methods, From Naive Ones, Globally Mine a Group of Duplicate Records

Mummidi Siva Sankar, Nadella Sunil


The promise of Big Data hinges on addressing several big data integration challenges, such as record linkage at scale, real-time data fusion, and integration of the Deep Web. Although much work has been done on these problems, there is limited work on creating a uniform, standard record from a group of records that correspond to the same real-world entity. We refer to this task as record normalization. Such a record representation, termed a normalized record, is important for both front-end and back-end applications. In this paper, we formalize the record normalization problem and present an in-depth analysis of normalization granularity levels (e.g., record, field, and value component) and of normalization forms (e.g., typical versus complete). We propose a comprehensive framework for computing the normalized record. The proposed framework includes a suite of record normalization methods, ranging from naive ones, which use only the information gathered from the records themselves, to complex strategies, which globally mine a group of duplicate records before selecting a value for an attribute of the normalized record.
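To make the idea of a naive, field-level method concrete, the following minimal sketch picks, for each field, the most frequent non-empty value among the duplicate records. It is only an illustration under stated assumptions and not the method proposed in this paper: the function name `normalize_field_level`, the dictionary-based record layout, and the sample records are all hypothetical, and frequency voting is just one plausible way to use only the information in the records themselves.

```python
from collections import Counter


def normalize_field_level(records):
    """Build one normalized record from duplicates (hypothetical helper).

    For each field, choose the most frequent non-empty value. Ties go to
    the value seen first, because Counter.most_common keeps insertion
    order among equal counts (Python 3.7+).
    """
    # Collect field names in first-seen order across all records.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)

    normalized = {}
    for name in fields:
        # Keep only non-empty string values for this field.
        values = [r[name].strip() for r in records
                  if r.get(name) and r[name].strip()]
        if values:
            normalized[name] = Counter(values).most_common(1)[0][0]
    return normalized


# Hypothetical duplicate records describing the same real-world entity.
duplicates = [
    {"author": "J. Smith", "venue": "Proc. Intl. Conf. on Data Eng.", "year": "2012"},
    {"author": "John Smith", "venue": "ICDE", "year": "2012"},
    {"author": "John Smith", "venue": "ICDE", "year": ""},
]

normalized_record = normalize_field_level(duplicates)
print(normalized_record)
```

Because each field is decided independently, the result may combine values from different source records. A record-level method would instead select one whole record as the representative, and a value-component method would work on parts of a field value.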


