Programming of the Entity Resolution and Data Fusion Methods

Programming of the Entity Resolution and Data Fusion Methods while Implementing ETL in the Hadoop Environment.

Author(s): A. Vovchenko, L. Kalinichenko, D. Kovalev.
Created: 2014/10/13
Published: 16th Russian Conference on Digital Libraries RCDL 2014 Proceedings. CEUR Workshop Proceedings 1297:26-34. (In Russian)
Abstract:
The paper addresses the implementation of entity resolution and data fusion in the context of big data integration. Entity resolution covers duplicate detection, deduplication, record linkage, object identification, reference matching, and other ETL-related tasks. Data fusion is the final step in the data integration process. The paper gives a short overview of entity resolution methods and data fusion techniques, then presents techniques for programming these methods when implementing the ETL process in the Hadoop environment.

Supported by Synthesis Group