Harness it for novel insights about biology or medicine
Suggest hypotheses for further research
Amounts of data ever-increasing
PubMed contains today over 35 million citations for biomedical literature
Around 20 million in 2011
High-throughput technologies such as microarrays
Massively parallel DNA sequencing
Motivation
Automated Search
Necessary, given the sheer amount of data and information
Searching for strings - sequences of characters - works well
Text documents (.pdf, .docx, etc., also programming code)
Web search engines
Search for terms corresponding to a given string difficult
Synonyms, abbreviations, and acronyms
Motivation
Example
TRAF2 - gene TNF receptor-associated factor 2
Alternative names: TRAP, TRAP3, MGC:45012
Search (e.g. in PubMed) returns differing results
TRAF2 - 1914 results
TNF receptor-associated factor 2 - 1201 results
TRAP - 51719 results (most not related to the gene)
TRAP3 - 7 results
MGC:45012 - 0 results
Motivation
Objective - Semantic 'Dictionary'
Determine vocabulary of terms
Establish alternative names and synonyms
Provide relations between these terms
Bio-Ontologies
Semantic Web
Semantic Web
Semantic Web Vision
Proposed by Tim Berners Lee in 2001
Web of human-readable content (text and pictures - early 2000s)
Intelligent work (selecting, combining, aggregating) delegated to human reader
Make the web more accessible to computers
Semantic Web
Design principles
Make structured and semi-structured data available in standardized formats on the Web
Make not just the datasets, but also the individual data elements and their relations accessible on the Web
Describe the intended semantics of such data in a formalism, so that this intended semantics can be processed by machines
Semantic Web
Semantic Web Initiative
World Wide Web Consortium (W3C)
Development of language standards (such as RDF, RDFS and OWL)
Semantic Web
Basic technologies
Labeled graphs as data model for objects and their relations (using RDF - Ressource Description Framework)
Web identifiers to identify data items and their relations (using URI - Uniform Ressource Identifiers)
Ontologies as data model for the intended semantics of data (using RDF Schema and OWL - Web Ontology Language)
Semantic Web
Linked Open Data (LOD)
Publication of interlinked data for a wide variety of topics and from different stakeholders
Cross-Domain, Geography, Government, Life Sciences, Linguistics, Media, Publications Social Networking
DBpedia, BBC, New York Times Company, Facebook etc.
Linked Open Data Cloud https://lod-cloud.net/
Semantic Web
Linked Open Data Cloud
Semantic Web
Knowledge Graphs
Appeared with Google Knowledge Graph in 2012
Represented through Google Search Engine Results Pages
Over 500 million objects
Data from Freebase/Wikidata, Wikipedia, CIA World Factbook, etc.
Arguably a rebrand of Semantic Web technologies
Semantic network/graph that represents real-world objects and relations between them
Ontologies for formal representation of the entities in the graph
Bio-Ontologies
Ontologies
Ontologies
Origins
Dates back (at least) to Greek Philosophy
Study of the nature of existence
Aristotle developed conceptual taxonomies
The term is due to Jean le Clerc in 1692 - essence of being (On/Ontos) + suffix -logy (study of)
In Computer Science adopted in Knowledge Representation for describing specific domains of reality
Bio-Ontologies
'An ontology is an explicit and formal specification of a conceptualization.' (T.R. Gruber and R. Studer)
Ontologies
Definition
Formal description of a domain of discourse
Finite list of terms and relationships between these terms
Terms represent important concepts (classes of objects) of the domain
Example: university - students, teachers, courses, disciplines, lecture halls, etc.
Ontologies
Relationships in Ontologies
Hierarchies of classes
e.g., undergraduates are students
Ontologies
Relationships in Ontologies
Relations between classes - properties (e.g., faculty members teach courses)
Domain and range of relations (e.g., only faculty members may teach courses)
Disjointness of classes (e.g., faculty and staff are disjoint)
Specifications of relationships between objects (e.g., every department must include at least ten faculty members)
Ontologies
Usage
Shared understanding of a domain for disambiguation
Zip code vs. postal code
Course in university domain - degree or subject?
Mappings to a common ontology or between terminologies
Improving Web Search
Search for concepts in the ontology instead of ambiguous keywords
Search for more general/special information
Decision support systems based on logical deduction
Derive implicit information from explicit one
Explanations for inferences
Bio-Ontologies
Bio-Ontologies
Bio-Ontologies
Characteristics
Describe biomedical research and medical data
Use cases of ontologies in general applicable as well
Additional specific usage for bio-ontologies
Overall success story of ontology usage with a wide variety of developed ontologies
Bio-Ontologies
Data Integration and Search
Bio-ontologies as controlled terminologies for terms and alternative names (recall TRAF2)
Search for a term or synonyms
Example - synonyms in the Gene Ontology
Name
erythrocyte development
Accession
GO:0048821
Synonyms
RBC development
red blood cell development
Definition
The process aimed at ...
Bio-Ontologies
Data Integration and Search
Standard sets of terms and synonyms
Medical Subject Headings (MeSH) for indexing references to medical literature in PubMed
SNOMED CT - systematic nomenclature for medicine
Over 350,000 concepts over clinical findings, procedures, body structure, pharmaceuticals etc.
42 member countries (including Portugal), and many more affiliated
National Institute of Cancer Thesaurus for unified terminology for molecular and clinical cancer research
GoPubMed - search PubMed by indexing PubMed abstracts with Gene Ontology (GO) terms
Retrieve abstracts according to hierarchical GO categories
Bio-Ontologies
Reasoning
Apply inference rules such as transitivity
OBO and OWL provide constructs for specifying inference rules
For quality control of integration/aggregation of data/information
Detect mistakes/inconsistencies
Find matching terms
Discover new facts about data or new relationships
New research directions or new explanations for observations
Bio-Ontologies
Analysis of microarray data
Transcriptional Profiling
Measuring the activity of all genes in the genome under multiple conditions and comparing lists of genes showing differential response (to better understand the biology of the system)
Use the Gene Ontology to find the ontology terms that best characterize the differential genes
Known as overrepresentation analysis/gene category analysis
Bio-Ontologies
Semantic similarity analysis
Measure similarity between terms in an ontology or items annotated by these terms
Meaning of terms
Semantic structure of the ontology
Patterns of annotation
Used for comparing proteins and for clinical diagnostics in medical genetics
Bio-Ontologies
Summary
Challenges in data integration
Semantic Web
Ontologies
Bio-ontologies
Further reading:
Robinson and Bauer, Introduction to Bio-Ontologies, Chapter 1