Environment for Integration of Large Heterogeneous Data Collections.
Author(s): Budzko V. I., Kalinichenko L. A., Stupnikov S. A., Vovchenko E. A., Briukhov D. O., Kovalev D. Y.
Created: 2014/10/01
Published: Systems of High Availability. -- Moscow: Radiotechnika, 2014. -- Iss. 3. -- P. 3-19. (In Russian)
Abstract:
An approach to developing an environment for the integration of heterogeneous data collections (structured, semi-structured, and unstructured) is considered. The main idea of the approach is to combine subject mediation technology, the open-source Hadoop platform for distributed data storage and processing, and a relational warehouse system running over Hadoop. Systems such as IBM Big SQL or Hive can serve as the warehouse. Issues of entity resolution and data fusion in the context of big data integration in Hadoop are considered. A brief overview of methods for information extraction from text is provided. Techniques for programming entity resolution and data fusion methods in the high-level integration language HIL are illustrated. An example of a problem to be solved using the proposed environment is provided.