MPEG-7-Aligned Spatiotemporal Video Annotation and Scene Interpretation via an Ontological Framework for Intelligent Applications in Medical Image Sequence and Video Analysis

Author: Leslie F. Sikos

Sikos, Leslie F., 2018. MPEG-7-Aligned Spatiotemporal Video Annotation and Scene Interpretation via an Ontological Framework for Intelligent Applications in Medical Image Sequence and Video Analysis. Flinders University, College of Science and Engineering.

This electronic version is made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material and/or believe that any material has been made available without the permission of the copyright owner, please contact the University with the details.


In the past decades, the rapid growth of medical technologies has dramatically increased the amount of multimedia data generated in medical imaging, which necessitates the development of efficient automated mechanisms for processing video content. This is a major challenge due to the huge gap between what software agents can obtain from signal processing and what practitioners can comprehend based on cognition, expert knowledge, and experience. Automatically extracted low-level video features typically do not correspond to high-level concepts, such as human organs and medical devices, or to the procedural events depicted in medical videos. To narrow this Semantic Gap, the depicted concepts and their spatial relations can be described in a machine-interpretable form using formal definitions from structured data resources. Rule-based mechanisms are efficient in describing the temporal information of actions and video events. The fusion of these structured descriptions with textual and, where available, audio descriptors is suitable for the machine-interpretable spatiotemporal annotation of complex medical video scenes. The resulting structured video annotations can be queried efficiently, both manually and programmatically, and can be used in scene interpretation, video understanding, and content-based video retrieval.

Despite the advantages and potential of Semantic Web technologies in this field, previous approaches have failed to exploit rich semantics due to their reliance on proprietary solutions, incorrect and incomplete knowledge-based abstraction of medical domains, and limited temporal and spatial segmentation capacity. Given the successful semantic implementations reported in the literature for a variety of knowledge domains and applications, there is no doubt that it is worthwhile to apply formal knowledge representation and automated reasoning to medical multimedia resources. Interestingly, while medical image semantics have been researched extensively, medical video semantics have long been neglected, and research efforts on video semantics have focused almost exclusively on news, sports, and surveillance videos.

In this work, novel formalisms and ontologies are proposed to set new directions in standards-aligned semantic video annotation and spatiotemporal reasoning. These can be applied to knowledge domains of practical importance, of which the medical domain was selected as the primary focus of this thesis; applications to other domains are also discussed and demonstrated.

Keywords: Knowledge representation, automated reasoning, ontology engineering, artificial intelligence, image semantics, video semantics, multimedia semantics, MPEG-7, X3D, medical image processing, medical video processing, scene interpretation, content-based video retrieval, medical video analysis

Subject: Computer Science thesis

Thesis type: Doctor of Philosophy
Completed: 2018
School: College of Science and Engineering
Supervisor: Professor David MW Powers