MA Thesis: Investigating Bias in AI Algorithms for Breast Cancer Detection from Mammography Imaging: A Focus on Generalization to Unseen Populations

© Shadi Albarqouni

Abstract. Breast density is a critical factor in breast cancer risk and detection, influencing the effectiveness of mammography. Higher breast density, characterized by a greater proportion of fibroglandular tissue relative to fatty tissue, is associated with a four- to sixfold increase in breast cancer risk. This risk is compounded by the fact that dense tissue can mask tumors on mammograms, reducing diagnostic sensitivity. Breast density typically decreases with age, particularly after menopause. However, this relationship is not uniform across all populations.

For instance, the study by Checka et al. (2012) [1] revealed that while 74% of women aged 40-49 had dense breasts, this percentage declined to 57% in women aged 50-59, and further to 36% in women aged 70-79. Notably, a significant portion of older women still possessed dense breasts, challenging the assumption that breast density diminishes uniformly with age. Similarly, Advani et al. (2021) [2] explored breast density among women aged 65 years and older, revealing that even in this older demographic, breast density remains a significant risk factor. The study found that 31.5% of women aged 65-74 had heterogeneously or extremely dense breasts, compared to 30.5% of those aged 75 and older. This persistence of high breast density in older women suggests that age alone is not a sufficient predictor of breast density.

Breast density and its impact on cancer detection vary significantly across demographic groups. Prior studies indicate that certain racial and ethnic groups, such as Asian women, are more likely to have dense breasts, which may necessitate different screening protocols. Body mass index (BMI) also influences breast density, with lower BMI being associated with higher breast density, particularly in older women. For example, Advani et al. (2021) [2] found that 53.5% of women with heterogeneously or extremely dense breasts had a normal BMI, compared to 39.0% of women with scattered fibroglandular densities.

This project aims to investigate potential biases in deep learning models trained predominantly on data from Western populations, primarily from Europe and the USA, and how well these models generalize to data from other demographic groups. We hypothesize that these models may not adequately account for the variability in breast density across different demographic groups, leading to less accurate risk assessments and detection in non-Western populations.

The integration of deep learning (DL) models into mammography for breast cancer risk prediction has shown significant potential, yet there remains a notable research gap in understanding how demographic variability affects model performance. Established risk models, such as the Tyrer-Cuzick model, incorporate risk factors like breast density, but they have limitations, particularly when applied to diverse populations. Many DL models, in turn, are trained on data from predominantly Western populations, which raises concerns about their generalizability and accuracy in non-Western or minority groups.

Moreover, the phenomenon of automation bias, as highlighted by Dratsch et al. (2023) [3], poses an additional challenge, particularly when AI systems are integrated into the radiology workflow. The risk of radiologists overly relying on AI suggestions without critical engagement is a significant concern, especially when the AI systems themselves may be biased due to the data they were trained on. This bias, coupled with the underrepresentation of diverse demographics in training datasets, underscores a critical gap in the current research landscape.

Given the complexities associated with breast density and its variation across demographics, it is crucial to develop and validate deep learning models that are sensitive to these differences. Such models should be trained on diverse datasets that reflect the global population, ensuring that they can accurately assess breast cancer risk and detect lesions in women from various demographic backgrounds. This approach will help mitigate the biases inherent in current models and improve the effectiveness of breast cancer screening worldwide.

Research Questions:

  • Q1) How does the performance of deep learning models for breast cancer risk prediction vary across different demographic groups, particularly those underrepresented in the training data?
  • Q2) How do breast density and age affect the accuracy of DL-based breast cancer risk prediction models in non-Western populations?
  • Q3) How can AI-induced automation bias be mitigated in mammography to ensure equitable and accurate breast cancer screening across diverse populations?
  • Q4) What strategies can be employed to improve the generalizability of DL models in breast cancer risk prediction across various demographic groups?
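As a rough illustration of what Q1 and Q2 would measure, the sketch below computes the area under the ROC curve (AUC) separately for each group, where a group could be a data source, an age band, or a breast density category. It is only a minimal pure-Python sketch: the function names, the group labels, and the toy scores are hypothetical and do not come from any of the datasets or models discussed here.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic; tied scores count as half a win.

    Returns None when a group lacks positives or negatives, since the AUC
    is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (group, label, score) with label in {0, 1}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in grouped.items()}


# Made-up model scores for two hypothetical groups (not real data).
records = [
    ("group_A", 1, 0.9), ("group_A", 0, 0.2),
    ("group_A", 1, 0.8), ("group_A", 0, 0.4),
    ("group_B", 1, 0.6), ("group_B", 0, 0.7),
    ("group_B", 1, 0.8), ("group_B", 0, 0.3),
]
per_group = auc_by_group(records)
# Difference between the best and worst group as a simple disparity summary.
defined = [a for a in per_group.values() if a is not None]
auc_gap = max(defined) - min(defined)
print(per_group, auc_gap)
```

In practice one would likely report confidence intervals per group as well, because small groups (for example, around 100 cases) give noisy AUC estimates.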

Dataset: Several publicly available mammography datasets support breast cancer research, particularly the development of deep learning models. The Digital Database for Screening Mammography (DDSM), from the USA, provides over 10,000 images with detailed lesion annotations and BI-RADS categorizations, though it lacks explicit demographic data. INbreast, from Portugal, offers 410 images with comprehensive annotations, including lesion contours and breast density. The Curated Breast Imaging Subset of DDSM (CBIS-DDSM) refines DDSM data into 3,000 high-quality images with improved relevance for machine learning. The Mammographic Image Analysis Society (MIAS) Database contains 322 images from the UK, categorized by breast density and abnormality type. Lastly, the OPTIMAM Mammography Image Database (OMI-DB) is a large repository with over 100,000 mammograms from the UK, rich in BI-RADS data and lesion annotations, serving as a vital resource for AI model training.

In addition, we have access to three databases from other populations: VinDr-Mammo, from Vietnam, which consists of 5,000 four-view exams with breast-level assessments and finding annotations; the Chinese Mammography Database (CMMD), which covers 1,775 patients from China with benign or malignant breast disease; and KAUMD, from Saudi Arabia, which provides around 1,500 cases with a total of 5,600 mammogram images with BI-RADS scores. Through our clinical partners, we also have access to 1,500 cases from Lebanon and around 100 cases from the United Arab Emirates. All cases include at least one mammography scan of both the left and right breasts, with multiple views.

The BI-RADS scores are reported, and malignant cases are biopsy-proven. These datasets are indispensable for developing robust, generalizable AI models, though they vary in the level of demographic detail provided.
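One way to organize these sources for a generalization study is a small registry that records each dataset's country of origin and separates training sources from held-out ones. The sketch below is only an assumption about how such a split could look: the `MammoDataset` class, the `western` flag, and the decision to train on all Western sources and hold out the rest are illustrative choices, and CBIS-DDSM is listed under the USA because it is derived from DDSM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MammoDataset:
    name: str
    country: str
    western: bool  # illustrative grouping, assumed for this sketch


# Names and countries follow the dataset description above.
DATASETS = [
    MammoDataset("DDSM", "USA", True),
    MammoDataset("CBIS-DDSM", "USA", True),  # country inherited from DDSM
    MammoDataset("INbreast", "Portugal", True),
    MammoDataset("MIAS", "UK", True),
    MammoDataset("OMI-DB", "UK", True),
    MammoDataset("VinDr-Mammo", "Vietnam", False),
    MammoDataset("CMMD", "China", False),
    MammoDataset("KAUMD", "Saudi Arabia", False),
    MammoDataset("Clinical partner cases (Lebanon)", "Lebanon", False),
    MammoDataset("Clinical partner cases (UAE)", "United Arab Emirates", False),
]


def split_by_origin(datasets):
    """Return (training_names, held_out_names) based on the western flag."""
    train = [d.name for d in datasets if d.western]
    held_out = [d.name for d in datasets if not d.western]
    return train, held_out


def held_out_countries(datasets):
    """Sorted list of countries that appear only in the held-out pool."""
    return sorted({d.country for d in datasets if not d.western})


train_names, test_names = split_by_origin(DATASETS)
print("train on:", train_names)
print("evaluate on:", test_names)
print("unseen countries:", held_out_countries(DATASETS))
```

Keeping the held-out sources entirely out of training and model selection is what makes them a test of generalization to unseen populations; other designs, such as leaving out one country at a time, would fit the same registry.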

Roadmap:

  • Familiarize yourself with the current literature [3-6].
  • Build the baseline supervised model.
  • Run the necessary comparisons.
  • Run extensive experiments and analysis.
  • Write up your thesis.

Requirements:

  • Solid background in Machine/Deep Learning
  • Familiar with deep learning models and SOTA architectures
  • Sufficient knowledge of the Python programming language and its libraries (e.g., scikit-learn)
  • Experience with a mainstream deep learning framework such as PyTorch
  • Machine/Deep learning hands-on experience

References:

  1. Checka, Cristina M., et al. “The relationship of mammographic density and age: implications for breast cancer screening.” American Journal of Roentgenology 198.3 (2012): W292-W295.
  2. Advani, Shailesh M., et al. “Association of breast density with breast cancer risk among women aged 65 years or older by age group and body mass index.” JAMA Network Open 4.8 (2021): e2122810.
  3. Dratsch, Thomas, et al. “Automation bias in mammography: the impact of artificial intelligence BI-RADS suggestions on reader performance.” Radiology 307.4 (2023): e222176.
  4. Yala, Adam, et al. “A deep learning mammography-based model for improved breast cancer risk prediction.” Radiology 292.1 (2019): 60-66.
  5. Lotter, William, et al. “Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach.” Nature Medicine 27.2 (2021): 244-249.
  6. Zufiria, Blanca, et al. “Analysis of potential biases on mammography datasets for deep learning model development.” International Workshop on Applications of Medical AI. Cham: Springer Nature Switzerland, 2022.
  7. Hamidinekoo, Azam, et al. “Deep learning in mammography and breast histology, an overview and future trends.” Medical Image Analysis 47 (2018): 45-67.

If you are interested, please contact Prof. Dr. Shadi Albarqouni.

Shadi Albarqouni
Professor of Computational Medical Imaging Research at University of Bonn | AI Young Investigator Group Leader at Helmholtz AI
