Local item independence in large scale international assessments: the Programme for International Student Assessment (PISA) case

Author: Pawel Piotr Skuza

Skuza, Pawel Piotr, 2018 Local item independence in large scale international assessments: the Programme for International Student Assessment (PISA) case, Flinders University, College of Education, Psychology and Social Work

Terms of Use: This electronic version is (or will be) made publicly available by Flinders University in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. You may use this material for uses permitted under the Copyright Act 1968. If you are the owner of any included third party copyright material and/or you believe that any material has been made available without permission of the copyright owner please contact copyright@flinders.edu.au with the details.


International large-scale comparative educational assessments have a 50-year history, and currently, two major international, multidisciplinary, longitudinal, large-scale assessments are implemented: the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA). PISA, the study of interest in this thesis, focuses on students’ ability to apply their knowledge, and it makes greater use of testlets, i.e. groups of items that share a common stimulus, than TIMSS does. PISA uses a generalised form of the Rasch model for scaling cognitive data, and this measurement model makes various underlying assumptions. One of these is “local item independence” (LII), meaning that after controlling for students’ ability, items in the assessment should be independent of each other. In practice, independence should be revealed by low residual correlations between items after modelling student ability. All measures are subject to error, and a challenge for educational testing is to minimise that error. Violating the assumption of item independence may increase measurement error, and in this thesis, violations of item independence in PISA are examined.
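The idea that local independence shows up as low residual correlations can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the procedure used in the thesis: it generates responses from a simple Rasch model, adds a hypothetical shared "testlet effect" to the first two items, and correlates residuals computed from the known simulated abilities (a Q3-style shortcut) rather than using the "Residual Correlation from Factor Analysis" index. All function names, item difficulties and the testlet variance are invented for the example.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_residuals(n_students=5000, testlet_sd=1.0, seed=1):
    """Simulate four items; items 0 and 1 share a hypothetical testlet effect.

    Returns one list of residuals per item, where each residual is the
    observed score minus the Rasch expectation that ignores the testlet.
    """
    rng = random.Random(seed)
    difficulties = [-1.0, -0.5, 0.0, 0.5]  # assumed values, illustration only
    residuals = [[] for _ in difficulties]
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)         # student ability
        gamma = rng.gauss(0.0, testlet_sd)  # student-specific testlet effect
        for i, b in enumerate(difficulties):
            # Only the first two items are affected by the shared stimulus.
            effective = theta + (gamma if i < 2 else 0.0)
            score = 1 if rng.random() < rasch_prob(effective, b) else 0
            # Residual after controlling for ability alone.
            residuals[i].append(score - rasch_prob(theta, b))
    return residuals


if __name__ == "__main__":
    res = simulate_residuals()
    print("same testlet (items 0, 1):", round(pearson(res[0], res[1]), 3))
    print("independent  (items 2, 3):", round(pearson(res[2], res[3]), 3))
```

In this setup the pair sharing the simulated testlet effect should show a clearly positive residual correlation, while the independent pair should stay close to zero. With real data the abilities are not known and must be estimated, so residual correlations behave somewhat differently from this idealised case.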

Three main research aims are investigated in this thesis. The first aim is to describe the testlets used in PISA by providing an overview of items and testlets used across multiple PISA waves. The second aim is to examine data from PISA’s international calibration samples for the existence of local item dependence (LID) using a non-IRT-based LID index, namely “Residual Correlation from Factor Analysis”. Meta-analysis is used to combine estimates of LID prevalence across multiple PISA waves, and multilevel logistic regression is used to investigate which item characteristics predict the presence of item dependency. An in-depth investigation of the LID drivers for released items is offered, and the cross-wave consistency in LID presence is reported. The third aim is to examine LID in national-level calibration data, with the aim of identifying countries with a higher level of local item dependency. The cross-national consistency of LID is investigated, along with the possibility of differential testlet functioning. A greater incidence of LID in some national samples may reveal country-specific causes of LID, which could arise from differences in curriculum and pedagogies or in the administration of the tests.

Results reveal that the reading assessment in PISA makes greater use of multi-item testlets than the mathematics or science assessments do. Single-item testlets are more common in the mathematics assessments than in reading or science. In the investigation of LID, both positive and negative residual correlations were found. Positive residual correlations are expected for items that, for example, share a common prompt, but negative residual correlations are also found, and possible causes for both are suggested. Analyses of international calibration data reveal that positive item dependence is as prevalent in mathematics as it is in reading, with science showing less item dependence. Although within-testlet positive LID is present, pairs of items from different testlets also show positive LID, and utilising publicly released items allows this between-testlet dependency to be examined and explained. Some testlets exhibit positive LID among the majority of their items, yet other testlets do not indicate within-testlet LID despite having a shared stimulus. Item dependency is shown to be consistently present for some testlets across all PISA waves in which they were used for the purpose of cross-wave linking. Negative dependence is more prevalent than positive LID. Plausible drivers of positive LID are offered for item pairs which come from released items. While it is often assumed that the use of a common item prompt is responsible for LID, multilevel logistic models point to other drivers of positive LID. For example, in mathematics, the difference in items’ difficulty and the mathematical strand of the item pair are each associated with positive LID. The specific skill of being able to offer a scientific judgement drives some of the dependency apparent in science literacy.
While the study offered some signs that selective time and effort allocation could drive negative LID, as suggested by Yen (1993), negative dependency was more likely a mathematical artefact of positive within-testlet LID, as proposed by Habing and Roussos (2003). Differences in the prevalence of LID among the 24 investigated OECD countries indicate that Greece more frequently showed high levels of positive within-testlet dependency, while between-testlet positive LID was greater for high-performing countries such as Finland, Japan and Korea. LID investigations using international and national calibration datasets reveal a consistency in dependency between countries for some testlets but also suggest the possibility of differential testlet functioning for others.

Findings from this research are applicable to all educational assessments that use testlets as a part of their cognitive skills testing, and to PISA test development teams in particular. Closer consideration should be given to possible within-testlet as well as between-testlet dependency at the stage of preparing and field testing cognitive items. More research on the effects of LID on cross-wave linking is warranted. Practical implications and suggestions for future research are given.

In conclusion, this research provided evidence that local item independence is violated in PISA and identified a range of plausible causes. The research has extended the limited literature about LID in PISA to provide a broader perspective utilising data from three cognitive domains and five waves of PISA. The generalisability of the findings was strengthened by showing cross-wave and cross-national consistency in LID presence.

Keywords: Local item independence, Local item dependence, Programme for International Student Assessment, PISA

Subject: Education thesis

Thesis type: Doctor of Philosophy
Completed: 2018
School: College of Education, Psychology and Social Work
Supervisor: Associate Professor David Curtis