Information Extraction from Large Collection of Russian Text Documents

Information Extraction from Large Collection of Russian Text Documents in Hadoop Environment.

Author(s): Dmitry O. Briukhov, Nikolay A. Skvortsov.
Created: 2014/10/13
Published: 16th Russian Conference on Digital Libraries (RCDL 2014) Proceedings. -- Dubna: JINR, 2014. -- P. 391-398.
Abstract:
The paper describes issues of information extraction from large collections of Russian text documents in a Hadoop environment. An architecture for entity extraction and analysis tools based on IBM technologies is introduced. Methods for supporting the analysis of documents in Russian are described.

Supported by Synthesis Group