Programming of the Entity Resolution and Data Fusion Methods while Implementing ETL in the Hadoop Environment.
Author(s): | A. Vovchenko, L. Kalinichenko, D. Kovalev. |
Created: | 2014/10/13 |
Published: | 16th Russian Conference on Digital Libraries RCDL 2014 Proceedings. CEUR Workshop Proceedings 1297:26-34. (In Russian) |
Abstract: | |
The paper addresses the problem of implementing Entity
Resolution and Data Fusion in the context of big data
integration. Entity resolution covers Duplicate Detection,
Deduplication, Record Linkage, Object Identification,
Reference Matching, and other ETL-related tasks. Data fusion
is the final step in the data integration process. The paper
gives a short overview of entity resolution methods and data
fusion techniques, and then presents techniques for
programming these methods to implement the ETL process
in the Hadoop environment. |