An extensible and unifying approach to retrospective clinical data modeling: the BrainTeaser Ontology

BTO has been designed following a co-design approach, in close collaboration with the medical partners and domain experts, to embed their knowledge in BTO and, at the same time, to validate all the design choices. To this end, we operated iteratively, producing several intermediate versions of the ontology and discussing them with our domain experts. We used this iterative discussion process to ensure that the newly defined concepts correctly describe the corresponding real-world concepts and to guarantee the semantic quality of the ontology. BTO models the clinical course and the anamnestic history of patients affected by ALS and MS through an event-based approach. By “event” we refer to anything that can happen to the patient during their clinical history. For example, at a certain point, the patient will experience an onset: we consider the onset as an event, attach additional information to it (e.g., the date, the onset region), and link it to the patient. The subsequent diagnosis, visits, treatments, and so on are treated as events as well: each of them is characterized by its own additional information and linked to the patient. This method provides a unified model instead of using different resources for each disease, and it enhances ontology re-use, as it is easier to extend BTO to represent other events or other diseases that were not needed, or were even unknown, when the ontology was defined.
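The event-based idea can be pictured with a minimal Python sketch. The class and field names below (`Patient`, `Event`, `kind`, `attributes`) are illustrative assumptions rather than BTO identifiers, and the patient data is invented; the only point is that onset, diagnosis, and visits share one structure and are linked to the patient in the same way.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List


@dataclass
class Event:
    """Anything that happens to a patient (hypothetical structure)."""
    kind: str                       # e.g. "onset", "diagnosis", "visit"
    when: date
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Patient:
    patient_id: str
    events: List[Event] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        # Every kind of event is linked to the patient in the same way.
        self.events.append(event)

    def events_of_kind(self, kind: str) -> List[Event]:
        # Events of one kind, in chronological order.
        return sorted((e for e in self.events if e.kind == kind),
                      key=lambda e: e.when)


# Invented example data.
patient = Patient("p001")
patient.add_event(Event("onset", date(2019, 3, 1), {"onset_region": "bulbar"}))
patient.add_event(Event("diagnosis", date(2019, 9, 15)))
# A new kind of event requires no change to the model itself.
patient.add_event(Event("visit", date(2020, 1, 10), {"note": "follow-up"}))
```

In this sketch, supporting an additional disease amounts to introducing new event kinds and attributes, which mirrors the extensibility argument made above.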

Domain requirements

Identification of the requirements

To identify the domain requirements and embed the experts’ knowledge in the ontology, we followed a co-design approach. The first phase involved discussing separately with each medical research team from the research centres involved in the BRAINTEASER project. In more detail, the medical research teams are from the hospital of Turin, Italy, and the University of Lisbon, Portugal, for ALS, and from the hospital of Turin and the IRCCS Foundation Mondino in Pavia, Italy, for MS. In this first phase, we identified the main domain requirements, expressed as natural language sentences. The subsequent phase involved aligning the domain requirements of the different research teams by adopting a uniform terminology and by identifying common physical-world entities within the natural language descriptions of the domain, together with the relations between them. The next step involved the use of actual data provided by the research centres. This allowed us to determine the domain of the various classes, identify the elements shared by all research centres, and prepare a first draft version of BTO. This draft was then validated by the experts in two separate meetings, one specifically focused on ALS and one on MS. Based on the feedback of the clinicians and medical experts, we updated the ontology, adding or removing classes when needed. The final step involved the feedback received through the reviews of progressive technical reports on the development of the ontology, shared with the various medical teams. Upon reaching a consensus on the domain requirements across all research teams involved in the project, we finalized the definition of the domain requirements, which is reported below.

Definition of the domain requirements

As mentioned above, BTO is not designed to encode the semantic knowledge on a specific class of diseases in the form of a thesaurus; rather, it is intended as a means to enable data interoperability by encoding the data in a KB using an ontology. This allows different medical and research institutes to collect data using the same semantics. The core of BTO can be instantiated to encode data from almost any clinical scenario. Nevertheless, it is common for diverse diseases to require different tests, types of interventions, and procedures. To showcase the capabilities of BTO, we instantiate it with the two diseases studied within the BRAINTEASER project, ALS and MS. A practitioner interested in extending BTO to a different disease can adopt a methodology analogous to the one described in the remainder of this manuscript. In a sense, our joint modelling of ALS and MS can be considered both a validation and a showcase of the flexibility and extensibility of BTO.

The design of BTO is centred on patients and on the events that can occur during each patient’s clinical history. The patient’s clinical history consists of several events, e.g., traumas, pregnancies, surgical procedures, or treatments. The clinical course differs between patients affected by MS and those affected by ALS; however, the event-based approach adopted in BTO enables the joint modelling of the two diseases. Patients’ data requirements are the same for MS and ALS. Therefore, part of BTO is designed to model static variables, e.g., date of birth, biological gender, occupation, and clinical family history. Additionally, several works demonstrate the presence of genetic risk factors for both diseases [38, 39]. Hence, modelling patients’ genomes can enhance the understanding of risk factors for MS and ALS. In addition, pollutant exposure levels, smoking habits, or physical activity can influence the development or progress of both diseases [40, 41].

We provide in the remainder of this subsection an overview of the domain requirements, which revolve around clinical data collection for ALS and MS. A practitioner interested in more specific biochemical details, such as the etiology of the diseases or biological pathways, can extend BTO, either using a biology-oriented ontology or with their own classes.

Multiple sclerosis

MS is an autoimmune disorder, mainly affecting young adults, characterized by the destruction of myelin in the Central Nervous System (CNS) [42, 43]. Pathologic findings include multiple sharply demarcated areas of demyelination throughout the white matter of the CNS. In terms of clinical manifestations, visual loss, paresthesias, spasticity, loss of sensation, and bladder dysfunction are recurring symptoms [42, 43]. The typical pattern of MS consists of recurring attacks, known as relapses, followed by partial recovery. However, acute and chronic progressive forms also occur. More than 2.5 million people currently live with MS worldwide [44]. Given the incidence and impact that ALS and MS have on people’s lives, it is fundamental to devise tools to help clinicians diagnose and treat such diseases.

MS diagnosis is made through a combination of clinical history, neurological examination, and Magnetic Resonance Imaging (MRI) [45]. In particular, the clinical history of patients affected by MS comprises:

Cerebrospinal Fluid (CSF) analysis [46];

The recording of Evoked Potentials (EPs);

Clinical Evaluation (e.g., weight and Body Mass Index (BMI) assessments);

EDSS score [47];

Hematology Tests.

In addition, MS can manifest itself in different phases, each involving different courses of treatment:

Clinically Isolated Syndrome (CIS);

Radiologically Isolated Syndrome (RIS);

Primary Progressive MS (PP);

Secondary Progressive MS (SP);

Relapsing-Remitting MS (RR).

MS is often characterized by a cyclic progression, alternating periods of worsening of the disease, called relapses, with periods of improvement. It is, therefore, of the utmost importance to record the symptoms and body areas (sites) involved during relapses. MS relapses are also linked to pregnancies, with a decreased risk of relapse during pregnancy, making pregnancies an additional important piece of information to be recorded. MS progression is recorded using the EDSS score, which is usually assessed by clinicians during visits. Being able to predict the future EDSS score for each patient can enhance precision medicine. Thus, we record all visits where EDSS is assessed within BTO, to aid the development of predictive models that foresee when the patient’s condition will worsen.

Amyotrophic lateral sclerosis

ALS is a heterogeneous neurodegenerative disease associated with motor dysfunction, such as muscle weakness or dysphagia, and cognitive and behavioural changes [48]. ALS affects upper and lower motor neurons in the brain stem and spinal cord [42, 49]. The disease onset usually occurs after age fifty and becomes fatal within three to six years. Clinical manifestations include, among others, progressive weakness, atrophy, hyperreflexia, and the eventual paralysis of respiratory functions. Pathologic features include the replacement of motor neurons with fibrous astrocytes and the atrophy of anterior spinal nerve roots as well as of the corticospinal tracts [42, 49]. Global estimates indicate that the incidence of ALS ranges between 4.1 and 8.4 per 100,000 persons [50].

The clinical history of patients affected by ALS comprises:

Anatomical region of the onset (e.g., bulbar or spinal);

Presence of behavioural or cognitive impairments;

Pulmonary function tests (e.g., Relative Forced Vital Capacity (FVC) measures);

Lower vs upper motor neuron predominant phenotype;

ALSFRS-R rating scale [51];

Milano-Torino functional staging system (MiToS) [52];

King’s clinical staging method (KINGS) [53].

ALS is characterized by very fast progression and requires a number of medical interventions, such as Non-Invasive mechanical Ventilation (NIV) and Percutaneous Endoscopic Gastrostomy (PEG), which have a positive impact on the patients’ quality of life and prolong survival. Being able to predict when a patient will need one of these interventions would help prevent medical complications. Thus, we record the occurrence of such events within BTO, to aid the development of predictive models that foresee when the patient will need specific medical interventions.
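As a hedged illustration of how recorded events could feed such predictive models, the sketch below derives a time-to-intervention value from a list of dated events. The tuple layout, the event names, and the helper `days_to_first` are assumptions made for this example and are not part of BTO; the dates are invented.

```python
from datetime import date
from typing import List, Optional, Tuple

# (event kind, date) pairs for one patient; invented example data.
EventRecord = Tuple[str, date]


def days_to_first(events: List[EventRecord], kind: str,
                  reference: str = "onset") -> Optional[int]:
    """Days from the first `reference` event to the first `kind` event.

    Returns None if either event is missing, i.e. the outcome has not
    been observed for this patient.
    """
    ref_dates = [d for k, d in events if k == reference]
    target_dates = [d for k, d in events if k == kind]
    if not ref_dates or not target_dates:
        return None
    return (min(target_dates) - min(ref_dates)).days


history = [
    ("onset", date(2020, 1, 1)),
    ("diagnosis", date(2020, 6, 1)),
    ("NIV", date(2021, 1, 1)),
]
niv_days = days_to_first(history, "NIV")   # NIV recorded after onset
peg_days = days_to_first(history, "PEG")   # no PEG event recorded
```

A value of `None` marks a patient for whom the intervention has not been recorded, which a downstream model would typically have to treat as a censored observation.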

Design principles

In the following, we describe how BTO complies with the Open Biological and Biomedical Ontology Foundry (OBO) and FAIR principles [54], favoring its adoption in heterogeneous scenarios.

The ontology is open and publicly available. Its definition and description can be found at http://brainteaser.dei.unipd.it/ontology/.

The ontology schema is defined according to the OWL 2 Common Format.

The proposed ontology relies on a unique URI/Identifier Space identified by the prefix https://w3id.org/brainteaser/ontology/schema/.

A description of the Versioning procedure, as well as previous versions of BTO, is available as part of the documentation of BTO on the ontology web page.

The Scope of BTO is clearly defined: the ontology is meant to model the anamnestic and clinical history of patients affected by two neurological diseases, ALS and MS.

Following the OBO principles, we associate Textual Definitions to each ontology class, also to favor its re-use in other scenarios.

Before defining a new relation, the Relations available in the Relations Ontology (RO) were considered. No BTO relation has the same meaning as an RO relation, so none could be replaced with one from RO; nevertheless, this possibility was always scrutinized.

A detailed Documentation of the ontology is available on its web page.

As for the Documented Plurality of Users and Commitment To Collaboration, these aspects are intrinsic to the development and use of BTO. Indeed, BTO has been developed in the context of the BRAINTEASER Project, which includes partners from multiple European countries. The co-design approach used to devise BTO reflects its collaborative nature.

BTO places its Locus of Authority in its developers, who are indicated on the web page of the ontology, and in the authors of this paper, who comprise both medical experts in ALS and MS and computer science specialists.

BTO follows strict Naming Conventions described in “Implementation principles” section.

Finally, the BRAINTEASER consortium is actively working on the Maintenance and update of BTO.

Validation

BTO has been validated with several online tools to verify its consistency and syntactical validity. The “OOPS! Ontology Pitfall Scanner” [55] was used to confirm the accuracy of the ontology. Furthermore, we validated the ontology using the following tools: the SSN Validation Tool [56], the W3C Resource Description Framework (RDF) Validation Service, and the Graphite RDF Triple-Checker. None of the validation tools reported major problems directly linked to BTO. As further evidence of its validity, BTO has been checked by and published on the public repository “Archivio” [57], where it has been awarded four stars (the maximum) for its quality.

Implementation principles

To provide consistency in BTO, some basic principles are adopted when defining classes and properties. These guidelines involve external referencing, annotation properties, and naming conventions.

External referencing

Reusing and referencing external classes is common practice when developing ontologies [58]. Indeed, reusing entities and properties already defined in other resources fosters collaboration and data consistency. External referencing is managed with annotation properties, using the URI of the term in the original thesaurus. Due to its wide adoption and exhaustiveness, our primary choice of external resource is NCIT [25], but others, e.g., Systematised NOmenclature of MEDicine Clinical Terms (SNOMED-CT) or ATC, are employed when no information is available in NCIT. The choice of NCIT as the main reference resource stems from its widespread adoption [59, 60, 61, 62], granting increased interoperability to BTO. If the practitioner is more familiar with a different reference resource, the mapping between BTO classes and the corresponding classes of other well-known ontologies can be done automatically, as shown, for example, on the BioPortal page of the ontology. This makes BTO substantially agnostic to the chosen reference ontology.

In particular, external URIs are used when defining named individuals that refer to abstract concepts. On the contrary, when a new class is inserted in BTO, it is defined within the BTO namespace, and connected references are expressed using annotation properties.

Namespaces

BTO’s URIs are divided into two namespaces: the schema namespace https://w3id.org/brainteaser/ontology/schema/ and the resource namespace https://w3id.org/brainteaser/ontology/resource/. All URIs corresponding to classes, data properties, and object properties belong to the former namespace, while the latter includes all URIs referring to real-world instances of the entities described in BTO at an ontological level. Notice that, in this sense, the resource namespace is empty until the clinician starts populating it with real-world data. The only instances included in the schema namespace are the named individuals corresponding to Simple Knowledge Organization System (SKOS) concepts (as defined in “Usage of the Simple Knowledge Organization System (SKOS)” section). The choice of including these elements in the schema namespace stems from the fact that, akin to controlled dictionaries in relational modelling, these entities do not depend on the underlying data but can be seen as a predefined thesaurus of concepts, and they are a fundamental part of the reality modelled in BTO.
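The split between the two namespaces can be sketched in a few lines of Python. The two base URIs are the ones given above; the helper functions, the local name `patient_001`, and the assumption that the `bto:` prefix expands to the schema namespace are illustrative choices for this example only.

```python
SCHEMA_NS = "https://w3id.org/brainteaser/ontology/schema/"
RESOURCE_NS = "https://w3id.org/brainteaser/ontology/resource/"


def schema_uri(local_name: str) -> str:
    """URI of a class, property, or SKOS concept defined by the ontology."""
    return SCHEMA_NS + local_name


def resource_uri(local_name: str) -> str:
    """URI of a real-world instance created when data is loaded."""
    return RESOURCE_NS + local_name


def namespace_of(uri: str) -> str:
    """Classify a URI as 'schema', 'resource', or 'external'."""
    if uri.startswith(SCHEMA_NS):
        return "schema"
    if uri.startswith(RESOURCE_NS):
        return "resource"
    return "external"


pregnancy_class = schema_uri("Pregnancy")        # assumed expansion of bto:Pregnancy
patient_instance = resource_uri("patient_001")   # hypothetical local name
skos_concept = "http://www.w3.org/2004/02/skos/core#Concept"
```

Under these assumptions, a loader only ever mints URIs with `resource_uri`, while everything produced by `schema_uri` ships with the ontology itself.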

Classes definition and annotation properties

All components of BTO have additional information in the form of annotation properties. We defined a list of essential metadata to add when a new class is introduced. Firstly, all classes must have a label denoting the name and a comment providing a brief explanation, together with its source (e.g., other thesauri, websites, or textbooks). If the class has an equivalent in NCIT, the name and definition are inherited from the thesaurus. In this case, the class carries another annotation property, rdfs:isDefinedBy, expressing the Internationalized Resource Identifier (IRI) of the NCIT term of reference. Most biomedical vocabularies are mapped in the UMLS with a unique identifier called CUI [34]. For each class that has a UMLS reference, the annotation property dcterms:conformsTo is instantiated with the URL of the corresponding concept. For the sake of clarity, Table 5 reports all the required annotation properties and their values for the example class bto:Pregnancy.
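The annotation rules above lend themselves to a simple completeness check. The following is a minimal sketch, assuming annotations are held in a plain dictionary keyed by property name; the function `missing_annotations` and the placeholder values are hypothetical and do not reproduce the actual contents of Table 5.

```python
from typing import Dict, List


def missing_annotations(annotations: Dict[str, str],
                        has_ncit_equivalent: bool,
                        has_umls_reference: bool) -> List[str]:
    """Return the required annotation properties that are absent or empty."""
    required = ["rdfs:label", "rdfs:comment"]    # mandatory for every class
    if has_ncit_equivalent:
        required.append("rdfs:isDefinedBy")      # IRI of the NCIT term
    if has_umls_reference:
        required.append("dcterms:conformsTo")    # URL of the UMLS concept
    return [p for p in required if not annotations.get(p, "").strip()]


# Placeholder values only; the real ones are listed in Table 5.
pregnancy = {
    "rdfs:label": "Pregnancy",
    "rdfs:comment": "PLACEHOLDER definition text and source",
    "rdfs:isDefinedBy": "PLACEHOLDER-NCIT-IRI",
}
gaps = missing_annotations(pregnancy, has_ncit_equivalent=True,
                           has_umls_reference=True)
```

Here `gaps` lists `dcterms:conformsTo`, because the placeholder dictionary deliberately omits the UMLS reference.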

Naming conventions

All components must have a label and a comment. For object properties, BTO uses explanatory labels that include the property range. In this case, the comment explains the relationship between the two classes involved. Table 6 reports an example of all required information for object properties. For data properties, the label usually includes the name of the domain class so that its meaning is intuitive. A comment with the attribute description and, when available, the definition source are also included. Table 7 reports an example of the required information for data properties, using the property bto:deathCause. Note that all BTO components can include the note annotation property for additional remarks or business logic rules.

Usage of the Simple Knowledge Organization System (SKOS)

In BTO, external resources have been employed to model the diseases affecting a patient, anatomical sites of traumas, and pharmacological substances. Often we are interested in the abstract concept behind the medical term. When an ontology imports external resources, one possible modelling pattern is classism [63]. Classism is a design pattern where an external hierarchy is modelled as a hierarchy of ontological classes. In this way, data is stored by instantiating multiple named individuals, all with different URIs, for each class, one for each piece of information of interest. In BTO we avoid classism. Avoiding this approach has two important advantages: i) it dramatically reduces the number of required URIs, by not defining multiple named individuals; ii) it reduces the complexity of the queries. For instance, if we employ classism to model the anatomical location of patients’ traumas, the query that returns patients who suffered from a head trauma needs to match three triples: one for patients suffering a trauma, one for traumas located in an anatomical location, and one for keeping only anatomical locations of type “Head”. On the other hand, if we avoid classism by defining a unique concept for each anatomical location as a named individual, the above query needs to match only two triples: one for patients suffering a trauma and one for traumas located in the head (modelled as the same named individual for all head traumas). Therefore, in BTO, classification schemes that refer to abstract concepts already defined in other semantic resources are modelled using the SKOS data model. In detail, concepts of hierarchical schemes are modelled as named individuals of type skos:Concept, and the relationships among concepts are represented by the object property skos:broaderTransitive. This property is transitive and asserts that one concept is broader in meaning, i.e., more general, than another. Unlike the rdfs:subClassOf property, skos:broaderTransitive links two named individuals rather than two classes.
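The difference in query complexity can be made concrete with a small, self-contained sketch. It uses a naive in-memory triple matcher rather than a real triple store or SPARQL engine; the `ex:` identifiers, including `ex:hasTrauma`, are hypothetical, while `bto:anatomicalLocation` and the NCIT code C12419 for “head” are the ones used in this paper.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(triples: Set[Triple], patterns: List[Triple]) -> List[Binding]:
    """Naive conjunctive matching; terms starting with '?' are variables."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        bindings = extended
    return bindings


# Classism: every trauma points to its own individual, typed by a class.
classism_graph: Set[Triple] = {
    ("ex:p1", "ex:hasTrauma", "ex:t1"),
    ("ex:t1", "bto:anatomicalLocation", "ex:head_p1"),
    ("ex:head_p1", "rdf:type", "ex:Head"),
    ("ex:p2", "ex:hasTrauma", "ex:t2"),
    ("ex:t2", "bto:anatomicalLocation", "ex:head_p2"),
    ("ex:head_p2", "rdf:type", "ex:Head"),
    ("ex:p3", "ex:hasTrauma", "ex:t3"),
    ("ex:t3", "bto:anatomicalLocation", "ex:arm_p3"),
    ("ex:arm_p3", "rdf:type", "ex:Arm"),
}
classism_query = [
    ("?p", "ex:hasTrauma", "?t"),
    ("?t", "bto:anatomicalLocation", "?site"),
    ("?site", "rdf:type", "ex:Head"),
]

# SKOS style: all head traumas share one named individual.
skos_graph: Set[Triple] = {
    ("ex:p1", "ex:hasTrauma", "ex:t1"),
    ("ex:t1", "bto:anatomicalLocation", "ncit:C12419"),
    ("ex:p2", "ex:hasTrauma", "ex:t2"),
    ("ex:t2", "bto:anatomicalLocation", "ncit:C12419"),
    ("ex:p3", "ex:hasTrauma", "ex:t3"),
    ("ex:t3", "bto:anatomicalLocation", "ex:arm"),
}
skos_query = [
    ("?p", "ex:hasTrauma", "?t"),
    ("?t", "bto:anatomicalLocation", "ncit:C12419"),
]

classism_patients = {b["?p"] for b in match(classism_graph, classism_query)}
skos_patients = {b["?p"] for b in match(skos_graph, skos_query)}
```

Both queries select the same patients in this toy data, but the SKOS-style graph needs one triple pattern fewer and three fewer location URIs.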

Table 5 List of required annotation properties. For each class in the Brainteaser Ontology, we define label, comment, isDefinedBy and conformsTo. The table reports the values for the example class “Pregnancy”

Table 6 Example of required information for the bto:hasDisease object property. For each object property in BTO, we define a label, a comment describing it, the domain, and the range

Table 7 Example of required information for the bto:deathCause data property. For each data property in BTO, we define the label, comment, domain, and range

Fig. 1

An example of the SKOS data model. Each medical term is modelled as a named individual and the hierarchical scheme is asserted using the object property skos:broaderTransitive. See Table 9 for the legend of the symbols

Figure 1 illustrates this design schema considering the class bto:AnatomicSite as an example. As reported, each body region is modelled as a named individual of type skos:Concept and bto:AnatomicSite, and the terms’ hierarchical structure is asserted using the object property skos:broaderTransitive. For instance, given that limb is a more general concept than upper extremity, the individual representing the abstract concept limb is connected by the above-mentioned property to the one for upper extremity. As a result, the SKOS data model allows for the storage of the location information without instantiating one individual for each patient, simply by referring to the individual already instantiated as a concept. Note that this approach prevents us from describing the peculiarities of the specific entity. However, this design principle is employed only for components that do not have this requirement, i.e., for each class referring to a set of abstract terms without any associated data or object property. Table 8 reports all classes modelled using the SKOS standard and the corresponding semantic resource of reference. NCIT has been employed as the main reference thesaurus whenever it contained the required concepts. We resorted to other well-known resources otherwise. Within the BTO namespace, new concepts are defined only if they refer to terms specific to the domain of interest and the corresponding concept is not available in the considered resources.
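Because skos:broaderTransitive is transitive, a concept hierarchy such as the one in Fig. 1 can be traversed to find every concept that is more general than a given one. The sketch below assumes the usual SKOS direction, with the narrower concept as subject and the broader one as object; the `ex:` identifiers are hypothetical, and `ex:hand` is added only to show a two-step chain.

```python
from typing import Dict, Set

# narrower concept -> directly broader concepts (hypothetical identifiers)
BROADER: Dict[str, Set[str]] = {
    "ex:hand": {"ex:upper_extremity"},
    "ex:upper_extremity": {"ex:limb"},
    "ex:lower_extremity": {"ex:limb"},
}


def broader_closure(concept: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """All concepts reachable by following 'broader' links transitively."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_within(concept: str, ancestor: str,
              broader: Dict[str, Set[str]]) -> bool:
    """True if `concept` equals `ancestor` or is narrower than it."""
    return concept == ancestor or ancestor in broader_closure(concept, broader)


hand_ancestors = broader_closure("ex:hand", BROADER)
```

With such a helper, a search for traumas located in a limb could also return traumas recorded against the upper extremity or the hand, without any class hierarchy being involved.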

Table 8 List of classes in the Brainteaser Ontology modeled using the SKOS data model. For each class we specify its name and the reference semantic resource we use to define the concepts

To provide a practical example, assume the clinician needs to model the fact that a patient suffered from head trauma. We do not need to refer to the head of the specific patient, and thus define a URI for it; we only need to associate the individual referring to the specific patient’s trauma with a generic individual representing the entity head. Note that we instantiate the specific patient’s trauma and assign a URI to it since we are interested in storing specific information related to each trauma. Indeed, the patient’s trauma has some attributes (such as a date) and might have affected other sites besides the head. Therefore, for all patients affected by head trauma, we create a URI for the specific patient’s trauma, and we link it with the object property bto:anatomicalLocation to the URI of the generic concept of head. The same applies to all head traumas. This example is illustrated in Fig. 2.
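The same example can be written as a few triples built in Python. This is only a sketch: `bto:anatomicalLocation` and the NCIT code C12419 come from the paper, whereas `ex:hasTrauma`, `ex:date`, `ex:neck`, the `res:` identifiers, and the dates are invented for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
HEAD = "ncit:C12419"  # generic concept "head", shared by all head traumas


def record_trauma(graph: Set[Triple], patient: str, trauma: str,
                  when: str, sites: List[str]) -> None:
    """Add one trauma individual with its own URI, date, and locations."""
    graph.add((patient, "ex:hasTrauma", trauma))   # hypothetical property
    graph.add((trauma, "ex:date", when))           # hypothetical property
    for site in sites:
        graph.add((trauma, "bto:anatomicalLocation", site))


graph: Set[Triple] = set()
record_trauma(graph, "res:patient1", "res:trauma1", "2015-04-02", [HEAD])
record_trauma(graph, "res:patient2", "res:trauma2", "2018-11-20",
              [HEAD, "ex:neck"])

# Each trauma has its own URI, but both point to the same head concept.
trauma_uris = {s for s, p, _ in graph if p == "bto:anatomicalLocation"}
head_uris = {o for _, p, o in graph
             if p == "bto:anatomicalLocation" and o == HEAD}
```

Two trauma URIs are created, one per patient, while a single URI stands for the head in both cases.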

Fig. 2

An example of how we model information about a head trauma patient. We show the schema and the individuals involved, with a triple table where we report the most important relations. We display each triple using the CURIE notation; in particular, “bto:” stands for elements defined in BTO, “ncit:” refers to the NCI Thesaurus, and “skos:” refers to the SKOS namespace. For the sake of readability, we denote the individual “head” (NCIT:C12419) as ncit:head. See Table 9 for the legend of the symbols
