EquityNet: Eliminating Racial Bias in AI Skin Cancer Detection
Created by SANGEETHA M


Level: College/University · Subject: Computer Science · Duration: 1 day
Students engineer a fairness-aware computer vision framework to audit and mitigate latent racial bias in dermatological diagnostic models. By analyzing the Fitzpatrick Scale representation in large-scale datasets, participants implement mathematical fairness constraints into CNN loss functions and utilize GANs for synthetic data generation to improve diagnostic reliability for darker skin tones. The project concludes with the development of "Model Cards" and ethical deployment charters designed to ensure equitable healthcare outcomes and transparency in clinical settings.
Algorithmic Fairness · Computer Vision · Medical Imaging · Fitzpatrick Scale · Deep Learning · Generative Adversarial Networks · Healthcare Equity
📝

Inquiry Framework

Question Framework

Driving Question

The overarching question that guides the entire project.
How can we engineer a fairness-aware computer vision framework that audits and mitigates latent bias in dermatological CNNs to ensure equitable diagnostic reliability across the full Fitzpatrick Scale?

Essential Questions

Supporting questions that break down major concepts.
  • To what extent does the demographic composition of large-scale dermatological datasets (e.g., ISIC) dictate the diagnostic reliability of CNNs for patients with darker skin tones?
  • How can mathematical frameworks for algorithmic fairness, such as equalized odds or demographic parity, be integrated into the loss functions of skin lesion classifiers?
  • What are the clinical and morphological differences in how skin cancers (like acral lentiginous melanoma) present across the Fitzpatrick Scale, and how can these features be better captured by computer vision architectures?
  • How do specific data augmentation techniques, such as GAN-based synthetic data generation or color-space transformations, mitigate the 'hidden stratification' problem in medical AI?
  • How can developers implement robust auditing pipelines to identify 'latent bias' in pre-trained models before they are deployed in clinical settings?
  • Beyond model accuracy, what ethical and systemic frameworks should guide the deployment of AI to ensure it reduces rather than exacerbates existing racial disparities in healthcare?

Standards & Learning Goals

Learning Goals

By the end of this project, students will be able to:
  • Quantify and analyze the 'hidden stratification' and demographic bias within large-scale dermatological datasets like the ISIC Archive.
  • Develop and integrate mathematical fairness constraints (e.g., equalized odds, demographic parity) into the training loss functions of deep learning models.
  • Implement advanced data augmentation strategies, including GAN-based synthetic generation, to balance representation across the Fitzpatrick Skin Type scale.
  • Design and execute a comprehensive model auditing pipeline to detect latent bias in pre-trained convolutional neural networks (CNNs).
  • Synthesize clinical dermatological knowledge with computer vision architecture to improve the detection of skin cancer variants that present uniquely on darker skin tones.
  • Evaluate the ethical implications of medical AI deployment and propose systemic frameworks for reducing racial disparities in healthcare technology.

ABET Student Outcomes (Engineering)

ABET-SO-1
Primary
An ability to identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics.
Reason: The project requires students to solve the complex problem of algorithmic bias in medical imaging using advanced mathematical and CS principles.
ABET-SO-4
Primary
An ability to recognize ethical and professional responsibilities in engineering situations and make informed judgments, which must consider the impact of engineering solutions in global, economic, environmental, and societal contexts.
Reason: The core of this project is addressing racial bias and ensuring equitable healthcare outcomes, directly engaging ethical responsibility.
ABET-SO-6
Primary
An ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions.
Reason: Students must audit datasets, train models, and interpret performance metrics across diverse demographic subsets.

ACM/IEEE CS2023 Curricula (Artificial Intelligence)

ACM-AI-ML-08
Secondary
Explain the importance of fairness, accountability, transparency, and ethics (FATE) in AI system design and deployment.
Reason: The project focuses specifically on fairness and accountability in AI-driven skin cancer detection.

ACM/IEEE CS2023 Curricula (Society, Ethics, and Professionalism)

ACM-SOCI-03
Supporting
Evaluate the social impact of algorithms and the potential for bias in automated decision-making systems.
Reason: The project examines how biased algorithms can exacerbate existing racial disparities in medical diagnosis.

ACM/IEEE CS2023 Curricula (Computer Vision)

ACM-AI-CV-02
Primary
Deep Learning and CNNs: Understanding and applying convolutional architectures for image classification tasks.
Reason: The technical foundation of the project relies on auditing and retraining CNNs for dermatological image classification.

Entry Events

Events that will be used to introduce the project to students

The Malpractice Mock Trial: AI on the Stand

In this immersive scenario, students act as expert technical witnesses in a mock malpractice lawsuit where a CNN-based diagnostic tool missed a melanoma on a patient with Fitzpatrick Type VI skin. They must examine the tool's 'black box' logic and the training data to explain to a 'jury' how systemic bias in code translates to life-threatening clinical errors.

The 'Blind Spot' Live Audit

Students are presented with a 'live' demo of a popular open-source skin lesion classifier and asked to test it using images of their own skin or a diverse provided gallery. They will quickly discover a startling discrepancy in accuracy rates between light and dark skin tones, sparking an immediate technical and ethical debate on why the 'math' is failing specific populations.
📚

Portfolio Activities

Portfolio Activities

These activities progressively build towards your learning goals, with each submission contributing to the student's final portfolio.
Activity 1

The Bias Forensic: Auditing the Black Box

In this foundational activity, students act as 'Forensic Data Scientists' to uncover the hidden stratification within the ISIC (International Skin Imaging Collaboration) dataset. They will perform a quantitative audit of a pre-trained ResNet or Inception model to identify performance disparities across the Fitzpatrick Skin Type scale. This sets the stage by proving that a high 'overall accuracy' can mask catastrophic failures for minority subgroups.

Steps

Here is some basic scaffolding to help students complete the activity.
1. Load the ISIC Archive dataset and a pre-trained CNN skin lesion classifier using PyTorch or TensorFlow.
2. Categorize the dataset images into Fitzpatrick Skin Type groups (I-VI) using metadata or a secondary classification proxy.
3. Run inference on the test set and calculate the 'Equalized Odds' and 'Disparate Impact' ratios for each skin type group.
4. Generate a 'Saliency Map' (using Grad-CAM) for a correctly classified image of light skin versus a misclassified image of dark skin to visualize what features the model is actually 'looking' at.
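The disaggregated audit in steps 2-3 can be sketched as a small NumPy routine. This is a minimal illustration, not a full pipeline: `audit_by_group` and its return values are hypothetical names, and in practice students would feed it the predictions from their PyTorch or TensorFlow inference run.

```python
import numpy as np

def audit_by_group(y_true, y_pred, groups):
    """Disaggregate binary classification metrics per Fitzpatrick group.

    y_true, y_pred: arrays of 0/1 labels (1 = malignant).
    groups: array of Fitzpatrick group labels, one per sample.
    Returns per-group metrics, the equalized-odds TPR gap, and the
    disparate-impact ratio (min selection rate / max selection rate).
    """
    metrics = {}
    for g in sorted(set(groups)):
        m = groups == g
        tp = np.sum((y_pred[m] == 1) & (y_true[m] == 1))
        fn = np.sum((y_pred[m] == 0) & (y_true[m] == 1))
        fp = np.sum((y_pred[m] == 1) & (y_true[m] == 0))
        tn = np.sum((y_pred[m] == 0) & (y_true[m] == 0))
        metrics[g] = {
            "tpr": tp / max(tp + fn, 1),               # sensitivity per group
            "fpr": fp / max(fp + tn, 1),
            "selection_rate": (tp + fp) / max(m.sum(), 1),
        }
    tprs = [v["tpr"] for v in metrics.values()]
    rates = [v["selection_rate"] for v in metrics.values()]
    eo_gap = max(tprs) - min(tprs)                     # worst-case equalized-odds gap
    di_ratio = min(rates) / max(max(rates), 1e-9)      # disparate-impact ratio
    return metrics, eo_gap, di_ratio
```

A toy run makes the "masked failure" point concrete: a model that catches every melanoma on Type I skin but only half on Type VI still posts 75% overall accuracy, while the equalized-odds gap of 0.5 exposes the disparity immediately.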

Final Product

What students will submit as the final product of the activity
A 'Bias Forensic Report' containing demographic heatmaps of the dataset, disaggregated performance metrics (Precision/Recall per Fitzpatrick group), and a visualization of 'latent bias' using t-SNE or UMAP to show how the model clusters skin types.

Alignment

How this activity aligns with the learning objectives & standards
Aligns with ABET-SO-6 (Experimentation and data interpretation) and ACM-SOCI-03 (Evaluating potential for bias). This activity forces students to use engineering judgment to uncover why a model fails on specific subgroups.
Activity 2

The Fairness Architect: Engineering Equitable Loss

Moving from identification to mitigation, students will redesign the model's optimization strategy. They will modify the standard Cross-Entropy loss function to include a 'Fairness Penalty' (such as a Lagrangian multiplier for demographic parity). This activity teaches students that 'fairness' is not just a policy but a mathematical constraint that can be engineered into the core of an algorithm.

Steps

Here is some basic scaffolding to help students complete the activity.
1. Define a mathematical fairness constraint (e.g., Difference in Equalized Odds) to be added to the loss function.
2. Modify the training loop to calculate the fairness metric at each epoch and backpropagate the weighted loss.
3. Implement a 'FairBatch' sampling strategy that prioritizes underrepresented samples in each training batch to stabilize gradient updates.
4. Compare the results of the 'Fair-CNN' against the baseline model from Activity 1.
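The penalized objective in steps 1-2 can be sketched framework-agnostically. The function below is a NumPy illustration, assuming a demographic-parity penalty (squared gap between each group's mean prediction and the overall mean) weighted by a Lagrangian-style multiplier `lam`; in the actual training loop the same expression would be built from PyTorch or TensorFlow tensors so it backpropagates.

```python
import numpy as np

def fair_penalized_loss(probs, y_true, groups, lam=1.0):
    """Binary cross-entropy plus a demographic-parity penalty (sketch).

    probs: predicted P(malignant) per sample, in (0, 1).
    groups: group label per sample (e.g. Fitzpatrick type).
    lam: weight on the fairness term; lam=0 recovers plain cross-entropy.
    """
    eps = 1e-9
    ce = -np.mean(
        y_true * np.log(probs + eps) + (1 - y_true) * np.log(1 - probs + eps)
    )
    overall = probs.mean()
    penalty = 0.0
    for g in set(groups):
        m = np.asarray(groups) == g
        # Squared deviation of this group's mean score from the overall mean
        penalty += (probs[m].mean() - overall) ** 2
    return ce + lam * penalty
```

When the model scores all groups alike the penalty vanishes and the loss reduces to cross-entropy; when one group's mean prediction drifts from the rest, raising `lam` pushes the optimizer back toward parity, which is exactly the accuracy-fairness trade-off traced by the Pareto Frontier in the final product.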

Final Product

What students will submit as the final product of the activity
A Python-based 'Fairness-Aware Training Module' (source code) and a comparison graph showing the trade-off between overall accuracy and subgroup fairness (the 'Pareto Frontier').

Alignment

How this activity aligns with the learning objectives & standards
Aligns with ABET-SO-1 (Complex engineering problem solving) and ACM-AI-ML-08 (Fairness and accountability in AI). It requires the application of advanced mathematical constraints to traditional ML optimization.
Activity 3

The Synthetic Bridge: GANs for Data Representation

One of the primary drivers of bias is the lack of representative data for Fitzpatrick Types V and VI. Students will use Generative Adversarial Networks (GANs) or StyleGAN-based image-to-image translation to synthesize high-fidelity dermatological images that represent darker skin tones. This 'Data Equity' approach focuses on fixing the data rather than just the math.

Steps

Here is some basic scaffolding to help students complete the activity.
1. Analyze the dataset to identify specific 'representation gaps' (e.g., 'Melanoma on Fitzpatrick VI').
2. Apply a color-space transformation or use a pre-trained GAN (like CycleGAN) to translate light-skinned lesion features onto darker skin textures.
3. Perform a 'Clinical Turing Test' where a peer attempts to distinguish between real and synthetic images to ensure quality.
4. Append the synthetic data to the training set and observe the impact on the model's subgroup sensitivity.
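The color-space transformation in step 2 can be sketched in a few lines. This is a deliberately naive illustration, not a clinically valid augmentation: `darken_skin_tone` and its `keep_luminance` parameter are hypothetical, and a real pipeline would operate in CIE-Lab space (or use a trained CycleGAN) so that lesion morphology and diagnostic color cues are preserved.

```python
import numpy as np

def darken_skin_tone(img, keep_luminance=0.6):
    """Shift an RGB image toward a darker tone while preserving chroma contrast.

    img: float array in [0, 1], shape (H, W, 3).
    keep_luminance: fraction of the original per-pixel luminance retained;
    lower values yield darker synthetic tones.
    """
    luminance = img.mean(axis=2, keepdims=True)   # crude per-pixel brightness
    chroma = img - luminance                      # per-channel deviation from gray
    darker = luminance * keep_luminance + chroma  # scale brightness, keep color contrast
    return np.clip(darker, 0.0, 1.0)
```

Even this toy transform demonstrates the 'Clinical Turing Test' requirement in step 3: because it only rescales brightness, a trained eye can spot that vascular and pigment-network features were not remapped, which is precisely why students must validate synthetic images against real morphology before appending them to the training set.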

Final Product

What students will submit as the final product of the activity
A 'Synthetic Diversity Extension' for the ISIC dataset, including a validation log where students (acting as 'clinical auditors') verify the morphological accuracy of the generated lesions (e.g., ensuring acral lentiginous melanoma features are preserved).

Alignment

How this activity aligns with the learning objectives & standards
Aligns with ACM-AI-CV-02 (Deep Learning and CNNs) and ABET-SO-6 (Conducting appropriate experimentation). It utilizes state-of-the-art computer vision techniques to solve data scarcity issues.
Activity 4

The EquityNet Deployment: From Code to Clinic

In the final phase, students integrate their audited model, fairness loss, and synthetic data into the unified 'EquityNet' framework. They must then defend their system not just on accuracy, but on its ethical readiness for clinical deployment. They will create a 'Model Card'—a standardized document for reporting AI performance and limitations—to ensure transparency for future medical practitioners.

Steps

Here is some basic scaffolding to help students complete the activity.
1. Execute the final training pipeline using the optimized loss and augmented dataset.
2. Perform a final 'Stress Test' using a 'blind' diverse dataset to quantify how far latent bias has been reduced.
3. Draft a 'Model Card' specifying the training data demographics, intended use, and known limitations (as per Mitchell et al., 2019).
4. Develop a 'Clinical Deployment Charter' that proposes how this AI should be used by doctors to augment, rather than replace, human judgment in diverse populations.

Final Product

What students will submit as the final product of the activity
The 'EquityNet Deployment Portfolio,' comprising the finalized retrained model, a 'Model Card for Healthcare Transparency,' and a 2-page 'Systemic Impact Statement' outlining how this tool reduces racial disparities in healthcare.

Alignment

How this activity aligns with the learning objectives & standards
Aligns with ABET-SO-4 (Ethical and professional responsibilities) and ACM-AI-ML-08 (FATE). This activity bridges the gap between technical engineering and societal impact.
🏆

Rubric & Reflection

Portfolio Rubric

Grading criteria for assessing the overall project portfolio

EquityNet: Algorithmic Fairness in Dermatology Rubric

Category 1

Technical Engineering & Auditing

Focuses on the technical execution of auditing and retraining models for algorithmic fairness.
Criterion 1

Bias Forensic & Metric Analysis

Assessment of the student's ability to quantitatively measure performance disparities across the Fitzpatrick Skin Type scale using advanced statistical metrics and visualization techniques.

Exemplary
4 Points

Demonstrates sophisticated analysis using multiple metrics (Equalized Odds, Disparate Impact) and high-dimensional visualizations (t-SNE/UMAP) that clearly isolate latent bias. Analysis identifies specific morphological features contributing to misclassification.

Proficient
3 Points

Provides a thorough audit using standard performance metrics (Precision/Recall) disaggregated by Fitzpatrick type. Includes clear visualizations that identify performance gaps between light and dark skin tones.

Developing
2 Points

Shows emerging ability to calculate subgroup metrics, but visualizations are unclear or interpretation of 'disparate impact' is inconsistent. Identifies basic performance gaps without deep analysis of cause.

Beginning
1 Point

Calculations of subgroup metrics are incomplete or inaccurate. Fails to provide meaningful visualization of how the model clusters or misclassifies different skin types.

Criterion 2

Fairness-Aware Model Engineering

Evaluation of the implementation of fairness constraints within the neural network's architecture and training loop, including loss function modification.

Exemplary
4 Points

Successfully integrates complex mathematical fairness constraints (e.g., Lagrangian multipliers) into the loss function. Training loop is optimized for FairBatch sampling, and the Pareto Frontier analysis shows sophisticated balancing of accuracy and equity.

Proficient
3 Points

Correctly modifies the cross-entropy loss to include a fairness penalty. Implementation of FairBatch sampling is functional and leads to measurable improvements in subgroup fairness. Comparison graph is clear.

Developing
2 Points

Attempts to modify the loss function but shows inconsistent application of fairness constraints. Comparison between the baseline and fair-model is present but lacks deep technical interpretation.

Beginning
1 Point

Fails to implement a functional fairness-aware training loop. Code contains significant logic errors or the fairness constraint does not impact the model's optimization.

Category 2

Data Science & Generative AI

Assesses the application of state-of-the-art computer vision to solve systemic data bias.
Criterion 1

Generative Augmentation & Data Equity

Evaluation of the student's ability to use generative techniques to address data scarcity and ensure the morphological integrity of synthetic medical images.

Exemplary
4 Points

Produces high-fidelity synthetic images using GANs that are indistinguishable from real samples in a Clinical Turing Test. Demonstrates advanced preservation of unique clinical markers (e.g., acral lentiginous features) on dark skin.

Proficient
3 Points

Successfully generates representative synthetic images for Fitzpatrick Types V-VI. Synthetic data effectively reduces the representation gap and shows clear improvement in model sensitivity for those groups.

Developing
2 Points

Generates synthetic images, but they show noticeable artifacts or lack clinical realism. Use of GANs is basic, and the impact on model performance is marginal or inconsistent.

Beginning
1 Point

Synthetic data generation is unsuccessful or produces medically inaccurate representations. Fails to validate the quality of augmented data or its impact on the model.

Category 3

Ethics, Society, & Professionalism

Evaluates the integration of ethical considerations and professional standards in AI deployment.
Criterion 1

Ethical Deployment & Transparency

Assessment of the student's ability to translate technical findings into professional, ethical documentation and clinical deployment frameworks.

Exemplary
4 Points

Produces a comprehensive Model Card and Impact Statement that shows a profound understanding of systemic health disparities. Proposes a robust Clinical Deployment Charter with clear, actionable safeguards for human-in-the-loop oversight.

Proficient
3 Points

Develops a complete Model Card following industry standards (Mitchell et al., 2019). Impact Statement accurately reflects the ethical responsibilities of deploying medical AI and identifies key risks for diverse populations.

Developing
2 Points

Model Card is present but lacks detail on training demographics or limitations. Ethical analysis is superficial and does not fully address how the tool might exacerbate or mitigate racial disparities.

Beginning
1 Point

Documentation is incomplete or fails to address the ethical implications of the AI system. Does not provide clear guidance on the intended use or limitations of the model in a clinical setting.

Reflection Prompts

End-of-project reflection questions to get students to think about their learning
Question 1

In Activity 2, you encountered the 'Pareto Frontier'—the mathematical trade-off between overall accuracy and fairness. How did your perspective on 'mathematical neutrality' change after integrating fairness constraints into the loss function, and what does this imply about the responsibility of a software engineer in a clinical setting?

Text
Required
Question 2

How confident do you feel in your ability to conduct a 'Bias Forensic' audit on a pre-trained model and implement specific algorithmic mitigations (e.g., FairBatch sampling or fairness-aware loss functions) in future engineering projects?

Scale
Required
Question 3

Reflecting on the 'Synthetic Bridge' (Activity 3) and 'The Fairness Architect' (Activity 2), which approach do you believe offers the most robust path toward eliminating racial bias in medical AI, and why?

Multiple choice
Required
Options
Data-Centric: Synthetic data (GANs) is the most sustainable way to fix the 'Hidden Stratification' problem.
Algorithm-Centric: Engineering fairness constraints into loss functions is the most reliable way to ensure equity.
Hybrid Approach: Neither is sufficient alone; both data representation and mathematical constraints are required for clinical safety.
Systemic: Technical fixes are secondary to changing how datasets are collected and how clinical deployment is regulated.
Question 4

The 'Model Card' and 'Systemic Impact Statement' in Activity 4 are designed to bridge the gap between code and clinic. How has the process of documenting 'EquityNet' changed your understanding of what constitutes a 'finished' or 'successful' engineering product in the field of healthcare technology?

Text
Required