Author: Susanna Grigson
Grigson, Susanna, 2025 Advancing Microbial Protein Function Prediction, Flinders University, College of Science and Engineering
Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.
The microbial world forms the foundation of life on Earth, with microorganisms present in every environment and shaping the biosphere at every scale. Their activities are underpinned by proteins, the molecular machines encoded within microbial genomes. Bacteria and their associated viruses, known as bacteriophages (phages), contain vast collections of proteins that represent an enormous reservoir of biological diversity and innovation. Despite the generation of petabytes of genomic data through high-throughput DNA sequencing and metagenomics, more than half of all microbial proteins still lack a known biological function. This critical gap between sequence data and protein function represents one of the most significant challenges in modern biology, requiring the development of powerful computational approaches to reveal the hidden functional potential of the microbial world.
This doctoral thesis develops and applies novel computational approaches to predict the functions of bacterial and viral proteins. The work begins with a critical review of existing protein function annotation methods, highlighting the limitations of current homology-based techniques and the potential of language-model and structure-driven approaches. Building on this foundation, the thesis introduces multiple methodological contributions. First, protein language model embeddings are used to evaluate and refine functional ontologies, revealing hidden structure within protein classification schemes and enabling data-driven functional inference of uncharacterised proteins. Second, a deep learning framework named Phynteny is developed, integrating synteny (conserved gene order) with protein language models to improve functional annotation of phage genomes. Phynteny increases functional annotation rates by 14% and shows strong concordance with protein structure-based predictions, demonstrating that genomic context is a powerful signal for functional inference. Third, a structure-based approach is developed to predict the multimeric assembly states of phage proteins. By integrating AlphaFold-Multimer predictions with interface scoring and structure-based clustering, this framework reveals the prevalence and diversity of oligomeric assemblies across thousands of phage proteins. Fourth, structure-guided analysis expands understanding of a decameric coiled-coil barrel architecture previously noted in ϕX174, a well-characterised model bacteriophage. Through systematic analysis of this structure, which I term the "viral churro", this work demonstrates that structural homologues of the churro are distributed across diverse viral families and display properties consistent with genome delivery, revealing an infection mechanism whose broader significance had not been appreciated from sequence-based methods alone.
Together, these contributions expand the microbial protein annotation toolkit by uniting machine learning and structural bioinformatics. The resulting methods improve our capacity to infer the functions of uncharacterised genes and illuminate the functional diversity of microbial life. This work lays the groundwork for deeper insights into microbial ecology, viral evolution, and the design of microbial systems for biotechnology and medicine.
Keywords: protein function prediction, microbial genomics, functional annotation, bacteriophage
Subject: Biological Sciences thesis
Thesis type: Doctor of Philosophy
Completed: 2025
School: College of Science and Engineering
Supervisor: Robert Edwards