Improving the statistical evaluation of forensic DNA evidence

Author: Duncan Taylor

Taylor, Duncan, 2019 Improving the statistical evaluation of forensic DNA evidence, Flinders University, College of Science and Engineering

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact with the details.


The end of the 20th century saw, in Australia, the beginning of forensic DNA profiling for use in criminal investigations and court proceedings. Compared to modern capabilities, DNA profiling, when first introduced, had low sensitivity and low power of discrimination. The types of forensic samples that could be targeted were typically body fluids (such as semen, saliva or blood) that had abundant (at least by today’s standards) amounts of DNA available. Laboratory hardware and profiling systems improved with time: they became more sensitive and more discriminating, and produced informative results faster and at lower cost. These improvements encouraged the forensic community to branch out from the standard body fluid samples, and by the late 1990s forensic samples were being taken from what was termed ‘touch DNA’: tiny amounts of DNA left behind on a surface, not from a body fluid, but simply transferred when the item was touched. These new samples, coupled with the continually increasing sensitivity of DNA profiling, meant that DNA profiles became more complex in terms of the number of contributors and the quality and amount of DNA template.

While substantial resources had been expended on improving the ability to generate a DNA profile, disproportionately little effort had been put into how best to interpret the results. Starting at the turn of the 21st century, a series of methods were developed that could be used to interpret DNA profiles. Two main branches of interpretation methods formed: the Likelihood Ratio (LR) method (also called the Bayesian method), which dominated in Europe and Australia, and the inclusion probability method (also called the Random Man Not Excluded, or the frequentist method), which dominated in the USA. In the forensic field today the LR method is generally accepted as the superior method and so is the focus of this thesis.

All LR methods have the same foundation: they seek the ratio of the probabilities of the observed DNA profile (or multiple profiles) given two competing propositions, which typically align with a prosecution stance and a defence stance. The probability under each proposition is assigned by taking a weighted sum of all genotype probabilities that apply under that proposition. The simplest form of the LR method is known as the binary approach, which weights the genotype probabilities with either a 1 or a 0, i.e. they are either included in the sum or they are not. The binary method typically relies on a subjective assessment of the DNA profile by an analyst, who would be utilising a system of rules and thresholds for interpretation. There are many shortcomings of such a system, such as the very restricted pool of profiles to which it can be applied, the inconsistent application between analysts and the waste of much of the information within the DNA profile (i.e. the intensity of each piece of information and its molecular size).
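
The weighted-sum structure described above can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the abstract: O denotes the observed profile, H_p and H_d the prosecution and defence propositions, S_j the j-th candidate genotype set, and w_j the weight given to that set.

```latex
\mathrm{LR}
  = \frac{\Pr(O \mid H_p)}{\Pr(O \mid H_d)}
  = \frac{\sum_j w_j \,\Pr(S_j \mid H_p)}{\sum_j w_j \,\Pr(S_j \mid H_d)},
\qquad
w_j = \Pr(O \mid S_j).
```

Under the binary approach each weight is restricted to w_j ∈ {0, 1}, so a genotype set is either fully included in both sums or excluded from them. Methods that assign graded weights let w_j take intermediate values.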

A more elegant approach to weighting the genotypes within the LR approach was termed the ‘semi-continuous method’. This method weights the genotypes using probabilities associated with events that occur during the process of generating the DNA profile. The semi-continuous method expanded the types of profiles to which a statistical weighting could be applied and could also be applied in a more consistent manner. Semi-continuous systems still did not utilise much information from the DNA profile other than the presence or absence of information, and so in that regard still had limited discrimination power.

This thesis is a compilation of publications that extend the semi-continuous methods of developing a LR to what has been termed ‘fully-continuous’. This is achieved by the use of a much greater level of information from DNA profiles. In order to utilise peak heights, models have been developed that describe aspects of DNA profile behaviour that ultimately lead to the patterns of peak intensity seen in a profile. These include models and parameters for stutter, degradation, saturation, peak height variability within and between regions, drop-out and drop-in. For complex DNA profile data, the number of combinations that these different parameters can take exceeds the computational ability that would allow an exact solution based on Maximum Likelihood, and so a stochastic process using Markov Chain Monte Carlo is developed. The fully continuous DNA profile interpretation model and its stochastic implementation were trialled on a range of DNA profiles that vary in number of contributors, DNA amounts and degradation levels, as might typically be encountered in a forensic laboratory.
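
To give a feel for what a Markov Chain Monte Carlo approach looks like in practice, the following is a minimal sketch of a random-walk Metropolis sampler. It is a toy and not the model developed in the thesis: it assumes a two-person mixture with known genotypes, a single unknown mixture proportion `m`, log-normally distributed peak heights, a flat prior on `m`, and made-up values for `total`, `sigma` and the peak heights. All function and parameter names are hypothetical.

```python
import math
import random


def log_likelihood(m, heights_1, heights_2, total=1000.0, sigma=0.3):
    """Toy log-likelihood of peak heights given mixture proportion m.

    Assumption: peaks from contributor 1 have expected height total * m,
    peaks from contributor 2 have expected height total * (1 - m), and
    log-heights are normally distributed with standard deviation sigma.
    Terms that do not depend on m are dropped.
    """
    if not 0.0 < m < 1.0:
        return -math.inf
    ll = 0.0
    for h in heights_1:
        ll -= 0.5 * ((math.log(h) - math.log(total * m)) / sigma) ** 2
    for h in heights_2:
        ll -= 0.5 * ((math.log(h) - math.log(total * (1.0 - m))) / sigma) ** 2
    return ll


def metropolis(heights_1, heights_2, n_iter=20000, burn_in=2000,
               step=0.05, seed=1):
    """Random-walk Metropolis sampler for m under a flat prior on (0, 1)."""
    rng = random.Random(seed)
    m = 0.5
    current = log_likelihood(m, heights_1, heights_2)
    samples = []
    for i in range(n_iter):
        proposal = m + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, heights_1, heights_2)
        # Symmetric proposal and flat prior: accept with probability
        # min(1, likelihood ratio). Proposals outside (0, 1) are rejected.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            m, current = proposal, candidate
        if i >= burn_in:
            samples.append(m)
    return samples


# Made-up peak heights (arbitrary units) for illustration only.
major_peaks = [720.0, 680.0, 750.0, 690.0]
minor_peaks = [310.0, 280.0, 260.0, 330.0]
posterior = metropolis(major_peaks, minor_peaks)
print("posterior mean of m:", sum(posterior) / len(posterior))
```

With these illustrative heights the posterior mean of `m` should sit at roughly 0.7. A fully continuous system must explore many more parameters at once (for example stutter, degradation, drop-out and drop-in), which is why sampling is used rather than exhaustive evaluation.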

In addition to the models and systems that allow the deconvolution of complex, mixed DNA profiles, this thesis describes extensions to LR theory that allow a statistical weighting to be provided for the comparison of any reference to virtually any DNA profile. The behaviour of the LR was examined in depth by observing trends in its magnitude across constructed problems in which important factors were varied over a range of plausible values. These trends were compared with theoretical expectations to judge the performance of the fully continuous system. The system was also extended so that a LR-based method could be used to search a database of DNA profiles for either a potential contributor, or a potential relative of a contributor, to an unresolvable DNA profile (something that had not previously been possible in the forensic community in Australia).

Methods were developed for calibrating the system to a specific laboratory’s performance so that it provided evidential strengths appropriate for the type of data produced by that laboratory. As the concept of expert system calibration, and the concept of a fully continuous system based on a stochastic process, were relatively new in the field of forensic biology, some time was spent on validating the system’s performance and instructing others on how they could validate it. Validation of the developed fully continuous system was aligned with published guidelines on validation produced by international advisory bodies on DNA profile interpretation.

A discussion on how the models for deconvolution and LR development could be extended to apply to new situations is provided. Specifically, the deconvolution of DNA profiling data derived from the Y-chromosome (called Y-STR profiling) is shown and the extension of both deconvolution and LR development to consider a range of contributors within the one analysis is given.

To conclude the thesis, the work on DNA profile evaluation is placed into a wider case context. This includes a study into the interpretation of the raw electrophoretic data that makes up the DNA profile (and precedes the DNA profile evaluation) and a study into how the support for an individual’s presence in or absence from a DNA sample can be considered in conjunction with other case and sample information in order to help address questions about disputed activities.

Keywords: Forensics, DNA profiling, probabilistic genotyping, evidence evaluation

Subject: Forensic & Analytical Chemistry thesis

Thesis type: Doctor of Philosophy
Completed: 2019
School: College of Science and Engineering
Supervisor: Murk Bottema