De-duplication in Multiple Database

Madhuri Mane, Dr. V. R. Ghorpade, Vikas Mane


Record matching, which identifies records that represent the same real-world entity, is an important step in data integration. Most state-of-the-art record matching methods are supervised and therefore require the user to provide training data. Detecting similar duplicate records is a difficult task, especially when the records are domain independent. Unfortunately, domain knowledge is not always available. Moreover, domain-specific methodologies apply only to particular domains, and rules developed for one domain often do not hold in others. Given how many different domains now exist, the need for domain-independent research is undeniable. Domain-independent techniques for detecting similar duplicate records in databases have received attention over the last few years.

