Professor Chen is the head of the Digital Cancer Screening Research Group at the University of Nottingham. She has led the development of PERFORMS, an assessment scheme for breast screening readers that is fully embedded within the UK’s National Breast Screening Programme. Following its success, her research has been extended to the UK lung cancer screening programme through PERFECTS, the first assessment and training platform that aims to ensure radiologists interpret lung CT scans appropriately, so as to benefit patient outcomes and streamline clinician workload. Her research interests are in medical imaging, covering cancer screening, early cancer detection, and diagnostic accuracy in CT, breast mammography and tomosynthesis. She is specifically interested in quality assurance of health professionals and Artificial Intelligence (AI) programmes that interpret medical images, as well as in using eye tracking technology and developing AI applications to aid health professionals’ training. Professor Chen is also working on AI evaluation and benchmarking to ensure that AI can be safely implemented in the clinical setting to aid cancer detection, particularly in screening.
Title: Quality assurance of AI: why and how?
Retrospective studies have suggested that AI can match or even exceed the cancer detection performance of human readers. Prospective evaluation of AI is time-consuming and requires large sample sizes so that the more difficult cases are included in sufficient numbers. An alternative approach offers rapid evaluation against verified cases with established ground truths, which include greater proportions of the more challenging cases, including cancers. This enrichment provides sufficient data for reliable sensitivity, specificity, and area under the curve analyses. AI developers typically use a large dataset of cases to train their algorithms and subsequently evaluate them using another, smaller dataset of cases from the same population. It is assumed that an AI system which performs well on the evaluation dataset will then perform adequately in real life, an assumption of ‘generalisability’. However, this may not hold, for example when the model has overfitted to the population it was developed on, and overoptimism about the AI’s ability to perform safely in a new population may put patients at risk of harm.
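As a rough illustration of the sensitivity, specificity, and area under the curve analyses mentioned above, the following minimal Python sketch scores a small enriched test set. The case labels, AI scores, recall threshold, and function names are all made up for illustration; this is not the PERFORMS methodology or any vendor's evaluation code.

```python
def sensitivity_specificity(truth, scores, threshold):
    """Sensitivity and specificity when cases scoring >= threshold are recalled."""
    tp = fn = tn = fp = 0
    for is_cancer, score in zip(truth, scores):
        recalled = score >= threshold
        if is_cancer:
            if recalled:
                tp += 1
            else:
                fn += 1
        else:
            if recalled:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(truth, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    cancer case scores higher than a randomly chosen normal case
    (ties count as half)."""
    cancers = [s for t, s in zip(truth, scores) if t == 1]
    normals = [s for t, s in zip(truth, scores) if t == 0]
    wins = 0.0
    for c in cancers:
        for n in normals:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancers) * len(normals))


# Hypothetical enriched test set: 4 cancers among 10 cases, a far higher
# proportion than would be seen in routine screening.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

sens, spec = sensitivity_specificity(truth, scores, threshold=0.5)
area = auc(truth, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Because the test set is enriched, these figures describe performance on the test set itself; measures that depend on prevalence, such as positive predictive value, would not carry over directly to a screening population.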
The NHS Breast Screening Programme routinely uses a test-set-based external quality assurance (EQA) scheme called Personal Performance in Mammographic Screening (PERFORMS) to assess reader performance. Part of the PERFORMS dataset has already been used to compare the performance of human readers with that of a commercially available AI algorithm interpreting the same test sets. The results of this study, and an innovative quality assurance approach to the safety and effectiveness of AI applications, will be presented at the conference.