Executive Summary
This deliverable describes a set of three models that, in conjunction with the semantic mediator (WT7.1), enable the execution of queries formulated through the eligibility representation of the Clinical Research Information Model (WT6.4). An ontology-driven mechanism was developed to enable linkage and integration of phenotypic and genotypic data from multiple distributed data sources. The mechanism makes use of the Clinical Data Integration Model (CDIM, WT6.5), the Data Source Model (DSM, WT6.6) and the CDIM-DSM mapping model (WT6.6). Queries formulated through the CDIM and the vocabulary service (WT7.2) are translated into local queries by the mediator, using the DSM and CDIM-DSM instances defined for each individual source.
CDIM is a global mediation model expressed as an ontology for use in the primary care domain. It uses a realist approach employing Basic Formal Ontology (BFO v1.1) as an upper ontology. Other ontologies, including OGMS, IAO and VSO, were imported or specialized to give deeper definition to the concepts in the domain. The CDIM ontology includes concepts that are especially important to primary care (e.g. episode of care or reason for encounter), as well as others that handle temporality in queries (e.g. the start and end of processes).
The specific requirements for primary care data were first gathered through discussions with experts in the field in order to get a broad view of the domain. The resulting ontology was fine-tuned using two TRANSFoRm use cases covering RCTs and epidemiology (WT1.1, D1.1). Existing models from other projects were also investigated for their usefulness, but these did not satisfy TRANSFoRm’s needs with regard to the approach to interoperability (unifying goal), the approach to modelling (ontology), or the domain of interest and content (primary care research).
Following on from a detailed survey and characterisation of EHRs and data repositories in year 1 (WT6.1, WT6.2), three data sources were selected for further investigation of their modes of access (such as SQL and HL7 messaging), their data models and their content, in order to better understand the nature and degree of heterogeneity. These sources covered EHR data, routine clinical data and research genetic data. From this understanding a data source model (DSM) was developed covering the structural aspects of data representation at all levels of granularity. The model supports dynamic structure, a common feature in databases of clinical data, in which the structure of the data held in a given element depends on the content of another element.
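The idea of dynamic structure can be made concrete with a small sketch. It assumes a single observation record with two elements, where the content of one element (here called kind) determines how the other (here called value) must be read. The element names, the discriminator values and the parsing rules are all hypothetical and are not taken from the DSM itself.

```python
# Hypothetical records from a single observation table. The "kind" element
# decides how the "value" element of the same record has to be read.

def parse_blood_pressure(raw):
    # Assumed layout "systolic/diastolic", e.g. "120/80".
    systolic, diastolic = raw.split("/")
    return {"systolic": int(systolic), "diastolic": int(diastolic)}


def parse_lab_result(raw):
    # Assumed layout: a single decimal number stored as text.
    return {"numeric_value": float(raw)}


def parse_diagnosis(raw):
    # Assumed layout: a terminology code stored as text.
    return {"code": raw.strip()}


# Assumed discriminator values; a real source would define its own.
PARSERS = {
    "BP": parse_blood_pressure,
    "LAB": parse_lab_result,
    "DIAG": parse_diagnosis,
}


def interpret(record):
    """Read record["value"] using the rule selected by record["kind"]."""
    try:
        parser = PARSERS[record["kind"]]
    except KeyError:
        raise ValueError(f"no structure rule for kind {record['kind']!r}")
    return parser(record["value"])
```

A structural model that supports dynamic structure has to record this kind of dependency explicitly, so that a query generator knows which rule applies before it reads the dependent element.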
Using the DSM, the two clinical repositories and the one genetic repository were instantiated, and a CDIM-DSM mapping model was developed to align the database structures with the concepts of the ontology. It is this mapping model that identifies the semantics in the source data structures. For the chosen data sources and use cases, a preliminary instantiation of the mapping models proved to be straightforward.
Guidance was developed to assist those charged with writing the TRANSFoRm semantic mediator and to illustrate, through specific examples from the use cases, how the three models and the terminologies are used to translate eligibility queries into local database queries for execution.
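As a rough illustration of the translation step, the sketch below maps one concept onto the structures of two different sources and generates a parameterised local SQL query for each. It is a minimal sketch under stated assumptions, not the mediator's actual design: the class ConceptMapping, the function translate, the concept name "diagnosis", the table and column names, and the codes are all hypothetical. Translating the codes themselves into each source's local terminology is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMapping:
    """Hypothetical mapping entry: where one concept lives in one source."""
    table: str
    code_column: str
    patient_column: str


# Two hypothetical sources that store the same concept in different structures.
SOURCE_A = {"diagnosis": ConceptMapping("problem_list", "dx_code", "patient_id")}
SOURCE_B = {"diagnosis": ConceptMapping("encounter_dx", "term_code", "pid")}


def translate(concept, local_codes, mapping):
    """Build a parameterised SQL query selecting patients with the given codes.

    The codes are passed as query parameters rather than being pasted into
    the SQL text, so only names from the mapping appear in the statement.
    """
    if not local_codes:
        raise ValueError("at least one code is required")
    entry = mapping[concept]
    placeholders = ", ".join("?" for _ in local_codes)
    sql = (
        f"SELECT DISTINCT {entry.patient_column} FROM {entry.table} "
        f"WHERE {entry.code_column} IN ({placeholders})"
    )
    return sql, list(local_codes)
```

The same concept-level request then yields a different local query for each source, for example translate("diagnosis", ["LOCAL-1"], SOURCE_A) and translate("diagnosis", ["LOCAL-9", "LOCAL-10"], SOURCE_B), with only the mapping instance changing between the two calls.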
Report Details
Principal Authors: Ethier JF, McGilchrist MM
Contributing Authors: Burgun A, Sullivan FS
Partner Institutions: University of Dundee; INSERM/UR1

