Information Extraction from Multistructured Data

Information Extraction from Multistructured Data and its Transformation into a Target Schema.

Author(s): Dmitry Briukhov, Sergey Stupnikov, Leonid Kalinichenko, Alexey Vovchenko.
Published: Selected Papers of the XVII International Conference on Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2015). CEUR Workshop Proceedings 1536:81-90. (In Russian)
According to the fourth paradigm formulated by Jim Gray in 2007, data have become one of the main driving forces of scientific progress. Such data are obtained from observations carried out by high-tech instruments, or are accumulated through human activity in the economy, industry, the social sphere, etc. Scientific knowledge is thus generated through data-intensive analysis that extracts knowledge from these data. The rapid growth in the volume and diversity of data across data-intensive domains drives the development of new methods and facilities for analyzing and managing massive multistructured data. This paper summarizes the experience accumulated while exploring methods, infrastructures, and programming facilities intended for extracting information from multistructured data and integrating it. The extracted information must meet the needs of specific problems, and these needs are defined by a structured target schema.