RFBR Grant 01-07-90084
Methods and tools for development of subject mediators of heterogeneous information collections for distributed digital libraries (BIOMED)
Project starting date
Victor Zakharov, Institute of Informatics Problems RAS, Moscow, Russia
- Institute of Informatics Problems RAS
- Institute of Cytology and Genetics Siberian Branch of RAS
- Institute of Molecular Biology RAS
Objectives and results
The project investigates and develops methods and tools for the semantic integration of heterogeneous, independently developed information collections, and for the design of information integration systems built on subject-mediator middleware that supports interaction between heterogeneous information collections and information customers. Using these methods, a subject mediator for the biomolecular domain is to be designed. The mediator life cycle includes a consolidation phase, during which a model of the subject domain is formed and the corresponding mediator-level metainformation is created. During the operational phase, any data and knowledge sources can be registered at the mediator in terms of the mediator level.
The project focuses on the following:
- development of the mediator metainformation using heterogeneous biomolecular information sources;
- support for the registration of information sources at the mediator (development of methods and tools for contextualizing information sources at the mediator);
- development of methods supporting the automation of wrapper development for heterogeneous sources;
- development of methods for personalizing the mediator's information for particular categories of its customers.
A model of the subject domain for representing information in the area of molecular biology has been developed. The model takes into account a natural information hierarchy (DNA, RNA, protein, gene net models). Data semantics were emphasized during model development; in particular, an ontology for gene expression regulation has been developed. The subject domain model is defined as the mediator specification and includes definitions of concepts, thesauri and vocabularies. Representative data collections selected and specified for the first phase of integration in the mediator include the TRRD, EPD and EMBL databases.
The canonical mediator model has been refined, and facilities supporting the mediator's repository have been developed. The canonical model allows various data models (structured, semi-structured, behavioral) to be mapped into it so that they can serve as refinements of canonical models (schemas). A dedicated metainformation repository has been developed to store the mediator's metainformation expressed in the canonical model. The repository is implemented as an Oracle 9 database. Its structure corresponds to the canonical model and is sufficient to represent the mediator specifications.
An approach to registering information collections in the subject mediator has been developed, based on the Local-as-View (LAV) technique. It provides for:
- ontology-based reconciliation of the application contexts of the registered sources with that of the mediator;
- identification of the mediator schema classes relevant to the collection classes;
- identification of common fragments of the instance types of the mediator classes and the relevant source classes;
- construction of views that specify source classes as views over the mediator's virtual classes.
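The Local-as-View idea behind registration can be illustrated with a minimal sketch. It is not the project's implementation: the names `Mediator`, `SourceView`, `register` and `relevant_sources`, as well as the source names and attribute names, are hypothetical and chosen only for illustration. Each source class is described as a view over one mediator class through an attribute mapping, and the mediator uses these views to find the sources that can answer a request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediatorClass:
    """A virtual class of the mediator schema (hypothetical structure)."""
    name: str
    attributes: frozenset


@dataclass
class SourceView:
    """A source class described as a view over one mediator class (LAV)."""
    source: str
    source_class: str
    mediator_class: MediatorClass
    attribute_map: dict  # source attribute -> mediator attribute

    def covered(self):
        """Mediator attributes that this source class can supply."""
        return set(self.attribute_map.values())


class Mediator:
    def __init__(self):
        self.classes = {}
        self.views = []

    def add_class(self, name, attributes):
        cls = MediatorClass(name, frozenset(attributes))
        self.classes[name] = cls
        return cls

    def register(self, source, source_class, mediator_class_name, attribute_map):
        """Register a source class as a view over an existing mediator class."""
        cls = self.classes[mediator_class_name]
        unknown = set(attribute_map.values()) - cls.attributes
        if unknown:
            # The view may only refer to attributes the mediator class defines.
            raise ValueError(f"not in mediator class: {sorted(unknown)}")
        view = SourceView(source, source_class, cls, dict(attribute_map))
        self.views.append(view)
        return view

    def relevant_sources(self, mediator_class_name, wanted):
        """Sources whose view covers every requested mediator attribute."""
        wanted = set(wanted)
        return [
            v.source
            for v in self.views
            if v.mediator_class.name == mediator_class_name
            and wanted <= v.covered()
        ]


# Illustrative data only; these are not the schemas of any real collection.
mediator = Mediator()
mediator.add_class("Gene", ["name", "organism", "sequence"])
mediator.register("source_a", "GeneEntry", "Gene",
                  {"gene_id": "name", "species": "organism"})
mediator.register("source_b", "SeqRecord", "Gene",
                  {"label": "name", "seq": "sequence"})
```

In this sketch the mediator schema is fixed first and sources are added later without changing it, which mirrors the separation between the consolidation and operational phases described above. A request for the `name` and `organism` attributes of `Gene` would be routed only to `source_a`, because only its view covers both attributes.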
Within the project, the following prototypes supporting the registration of information collections in the subject mediator have been developed.