Metagenomics And The Role Of Prophages In The Human Microbiome

Author: Laura Inglis

Inglis, Laura, 2025 Metagenomics And The Role Of Prophages In The Human Microbiome, Flinders University, College of Science and Engineering

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.

Abstract

There is still much to be discovered about the virome, the ecosystem of viruses that make up the viral component of the microbiome, especially the lysogenic phages that have integrated themselves into their bacterial hosts. Online databases are a huge boon to scientific discovery, allowing researchers to work with more data from more places than a whole lab could gather; however, freeform metadata fields mean that extensive manual curation can be required, depending on the questions being asked. I took advantage of these large repositories of bacterial DNA sequences to collect bacterial genomes from across the world to examine the prophages found within the human microbiome and what genes they may be providing to their hosts while they are sequestered away within their genomes.

This thesis aims to answer three main questions:

First, how do the lysogenic prophages of the human microbiome vary across different areas of the human body? Second, which antimicrobial resistance (AMR) genes are found in the prophages of the human microbiome, and are AMR genes common in prophages? Lastly, can functional or taxonomic profiles be used to train machine learning algorithms to sort metagenomes by their isolation environments and alleviate some of the metadata holes in online databases?

To achieve these aims, I gathered genomes and whole-genome metagenomes from two online genomic databases, GenBank and MGnify, and manually curated tens of thousands of genomes. I analysed them with PhiSPy and AMRfinder+, ran statistical analyses in SPSS, wrote code in R and Python, used the university’s high-performance computing resources, trained and tested several machine learning models on multiple sets of features, and extracted meaning from terabytes of data. I described the analysis in four papers, two of which have been published and two of which are soon to be submitted.

I found that the functional profile was more informative than the taxonomic profile for training a machine learning algorithm and that phage genes were some of the most important functional genes for differentiating isolation environments. Phage genes were found in every environment at different abundances. In the human microbiome, these phages varied on a smaller scale than the metagenomic dataset could show, with even areas of the body that were physically linked having very different amounts of prophage DNA in the bacterial genomes I analysed. The average amount of prophage DNA in a bacterial genome was also affected by more specific aspects of the environment, such as the health of the human host or the geographical region of the world where the sample was isolated. These factors also impacted the kinds of genes that the prophages were providing to their hosts. While I found that AMR genes were rarer in the human microbiome than some other studies claim, those I did find were of a large variety and occurred in patterns that suggest that phages are transferring these genes between species.

Keywords: bacteriophage, microbiome, metagenomics, prophage

Subject: Biology thesis

Thesis type: Doctor of Philosophy
Completed: 2025
School: College of Science and Engineering
Supervisor: Robert Edwards