
EquityNet: Eliminating Racial Bias in AI Skin Cancer Detection
Inquiry Framework
Question Framework
Driving Question
The overarching question that guides the entire project.
How can we engineer a fairness-aware computer vision framework that audits and mitigates latent bias in dermatological CNNs to ensure equitable diagnostic reliability across the full Fitzpatrick Scale?
Essential Questions
Supporting questions that break down major concepts.
- To what extent does the demographic composition of large-scale dermatological datasets (e.g., ISIC) dictate the diagnostic reliability of CNNs for patients with darker skin tones?
- How can mathematical frameworks for algorithmic fairness, such as equalized odds or demographic parity, be integrated into the loss functions of skin lesion classifiers?
- What are the clinical and morphological differences in how skin cancers (like acral lentiginous melanoma) present across the Fitzpatrick Scale, and how can these features be better captured by computer vision architectures?
- How do specific data augmentation techniques, such as GAN-based synthetic data generation or color-space transformations, mitigate the 'hidden stratification' problem in medical AI?
- How can developers implement robust auditing pipelines to identify 'latent bias' in pre-trained models before they are deployed in clinical settings?
- Beyond model accuracy, what ethical and systemic frameworks should guide the deployment of AI to ensure it reduces rather than exacerbates existing racial disparities in healthcare?
Standards & Learning Goals
Learning Goals
By the end of this project, students will be able to:
- Quantify and analyze the 'hidden stratification' and demographic bias within large-scale dermatological datasets like the ISIC Archive.
- Develop and integrate mathematical fairness constraints (e.g., equalized odds, demographic parity) into the training loss functions of deep learning models.
- Implement advanced data augmentation strategies, including GAN-based synthetic generation, to balance representation across the Fitzpatrick Skin Type scale.
- Design and execute a comprehensive model auditing pipeline to detect latent bias in pre-trained convolutional neural networks (CNNs).
- Synthesize clinical dermatological knowledge with computer vision architecture to improve the detection of skin cancer variants that present uniquely on darker skin tones.
- Evaluate the ethical implications of medical AI deployment and propose systemic frameworks for reducing racial disparities in healthcare technology.
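To make the first goal concrete, a minimal sketch of a dataset-composition audit follows. All counts and the per-image Fitzpatrick labels are illustrative placeholders, not real ISIC metadata; a real audit would read the labels from the archive's metadata files.

```python
from collections import Counter

# Hypothetical metadata: one Fitzpatrick Skin Type label (I-VI) per image.
# These counts are invented for illustration; a real audit would load them
# from the dataset's metadata CSV.
labels = (["I"] * 180 + ["II"] * 310 + ["III"] * 290 +
          ["IV"] * 150 + ["V"] * 50 + ["VI"] * 20)

counts = Counter(labels)
total = sum(counts.values())

# Representation ratio: each group's observed share vs. a uniform 1/6 baseline.
# Ratios far below 1.0 flag the under-represented groups driving hidden bias.
for group in ["I", "II", "III", "IV", "V", "VI"]:
    share = counts[group] / total
    ratio = share / (1 / 6)
    print(f"Type {group}: n={counts[group]:4d}  share={share:.1%}  ratio={ratio:.2f}")
```

Even this toy tally makes the core problem visible: Types V and VI together account for a small fraction of the images, so an 'overall accuracy' figure is dominated by the lighter skin types.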
ABET Student Outcomes (Engineering)
ACM/IEEE CS2023 Curricula (Artificial Intelligence)
ACM/IEEE CS2023 Curricula (Society, Ethics, and Professionalism)
ACM/IEEE CS2023 Curricula (Computer Vision)
Entry Events
Events that will be used to introduce the project to students.
The Malpractice Mock Trial: AI on the Stand
In this immersive scenario, students act as expert technical witnesses in a mock malpractice lawsuit where a CNN-based diagnostic tool missed a melanoma on a patient with Fitzpatrick Type VI skin. They must examine the tool's 'black box' logic and the training data to explain to a 'jury' how systemic bias in code translates to life-threatening clinical errors.
The 'Blind Spot' Live Audit
Students are presented with a 'live' demo of a popular open-source skin lesion classifier and asked to test it using images of their own skin or a diverse provided gallery. They will quickly discover a startling discrepancy in accuracy rates between light and dark skin tones, sparking an immediate technical and ethical debate on why the 'math' is failing specific populations.
Portfolio Activities
These activities progressively build toward the learning goals, with each submission contributing to the student's final portfolio.
The Bias Forensic: Auditing the Black Box
In this foundational activity, students act as 'Forensic Data Scientists' to uncover the hidden stratification within the ISIC (International Skin Imaging Collaboration) dataset. They will perform a quantitative audit of a pre-trained ResNet or Inception model to identify performance disparities across the Fitzpatrick Skin Type scale. This sets the stage by proving that a high 'overall accuracy' can mask catastrophic failures for minority subgroups.
Steps
Here is some basic scaffolding to help students complete the activity.
Final Product
What students will submit as the final product of the activity.
A 'Bias Forensic Report' containing demographic heatmaps of the dataset, disaggregated performance metrics (precision/recall per Fitzpatrick group), and a visualization of 'latent bias' using t-SNE or UMAP to show how the model clusters skin types.
Alignment
How this activity aligns with the learning objectives & standards.
Aligns with ABET-SO-6 (Experimentation and data interpretation) and ACM-SOCI-03 (Evaluating potential for bias). This activity forces students to use engineering judgment to uncover why a model fails on specific subgroups.
The Fairness Architect: Engineering Equitable Loss
Moving from identification to mitigation, students will redesign the model's optimization strategy. They will modify the standard cross-entropy loss function to include a 'Fairness Penalty' (such as a Lagrangian multiplier for demographic parity). This activity teaches students that 'fairness' is not just a policy but a mathematical constraint that can be engineered into the core of an algorithm.
Steps
Here is some basic scaffolding to help students complete the activity.
Final Product
What students will submit as the final product of the activity.
A Python-based 'Fairness-Aware Training Module' (source code) and a comparison graph showing the trade-off between overall accuracy and subgroup fairness (the 'Pareto Frontier').
Alignment
How this activity aligns with the learning objectives & standards.
Aligns with ABET-SO-1 (Complex engineering problem solving) and ACM-AI-ML-08 (Fairness and accountability in AI). It requires the application of advanced mathematical constraints to traditional ML optimization.
The Synthetic Bridge: GANs for Data Representation
One of the primary drivers of bias is the lack of representative data for Fitzpatrick Types V and VI. Students will use Generative Adversarial Networks (GANs) or StyleGAN-based image-to-image translation to synthesize high-fidelity dermatological images that represent darker skin tones. This 'Data Equity' approach focuses on fixing the data rather than just the math.
Steps
Here is some basic scaffolding to help students complete the activity.
Final Product
What students will submit as the final product of the activity.
A 'Synthetic Diversity Extension' for the ISIC dataset, including a validation log where students (acting as 'clinical auditors') verify the morphological accuracy of the generated lesions (e.g., ensuring acral lentiginous melanoma features are preserved).
Alignment
How this activity aligns with the learning objectives & standards.
Aligns with ACM-AI-CV-02 (Deep Learning and CNNs) and ABET-SO-6 (Conducting appropriate experimentation). It utilizes state-of-the-art computer vision techniques to solve data scarcity issues.
The EquityNet Deployment: From Code to Clinic
In the final phase, students integrate their audited model, fairness loss, and synthetic data into an 'EquityNet' framework. They must then defend their system not just on accuracy, but on its ethical readiness for clinical deployment. They will create a 'Model Card'—a standardized document for reporting AI performance and limitations—to ensure transparency for future medical practitioners.
Steps
Here is some basic scaffolding to help students complete the activity.
Final Product
What students will submit as the final product of the activity.
The 'EquityNet Deployment Portfolio,' comprising the finalized retrained model, a 'Model Card for Healthcare Transparency,' and a 2-page 'Systemic Impact Statement' outlining how this tool reduces racial disparities in healthcare.
Alignment
How this activity aligns with the learning objectives & standards.
Aligns with ABET-SO-4 (Ethical and professional responsibilities) and ACM-AI-ML-08 (FATE). This activity bridges the gap between technical engineering and societal impact.
Rubric & Reflection
Portfolio Rubric
Grading criteria for assessing the overall project portfolio.
EquityNet: Algorithmic Fairness in Dermatology Rubric
Technical Engineering & Auditing
Focuses on the technical execution of auditing and retraining models for algorithmic fairness.
Bias Forensic & Metric Analysis
Assessment of the student's ability to quantitatively measure performance disparities across the Fitzpatrick Skin Type scale using advanced statistical metrics and visualization techniques.
Exemplary
4 Points
Demonstrates sophisticated analysis using multiple metrics (Equalized Odds, Disparate Impact) and high-dimensional visualizations (t-SNE/UMAP) that clearly isolate latent bias. Analysis identifies specific morphological features contributing to misclassification.
Proficient
3 Points
Provides a thorough audit using standard performance metrics (precision/recall) disaggregated by Fitzpatrick type. Includes clear visualizations that identify performance gaps between light and dark skin tones.
Developing
2 Points
Shows emerging ability to calculate subgroup metrics, but visualizations are unclear or interpretation of 'disparate impact' is inconsistent. Identifies basic performance gaps without deep analysis of cause.
Beginning
1 Point
Calculations of subgroup metrics are incomplete or inaccurate. Fails to provide meaningful visualization of how the model clusters or misclassifies different skin types.
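The disaggregated audit this criterion rewards can be sketched in a few lines. Everything below is synthetic placeholder data, not real model output; the point is only to show how per-group true/false positive rates yield an equalized-odds gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ground truth, predictions, and subgroup labels (0 = light, 1 = dark).
# A real audit would use model outputs on a held-out, Fitzpatrick-labeled set.
y_true = rng.integers(0, 2, 1000)
group = rng.integers(0, 2, 1000)
# Simulate a model that misses about 40% of true positives in group 1.
flip = (y_true == 1) & (group == 1) & (rng.random(1000) < 0.4)
y_pred = np.where(flip, 0, y_true)

def rates(y_t, y_p):
    """Return (TPR, FPR) for one subgroup."""
    tp = np.sum((y_t == 1) & (y_p == 1))
    fn = np.sum((y_t == 1) & (y_p == 0))
    fp = np.sum((y_t == 0) & (y_p == 1))
    tn = np.sum((y_t == 0) & (y_p == 0))
    return tp / (tp + fn), fp / (fp + tn)

tpr0, fpr0 = rates(y_true[group == 0], y_pred[group == 0])
tpr1, fpr1 = rates(y_true[group == 1], y_pred[group == 1])

# Equalized odds is satisfied when both gaps are (near) zero.
eo_gap = max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))
print(f"TPR gap={abs(tpr0 - tpr1):.3f}  FPR gap={abs(fpr0 - fpr1):.3f}  EO gap={eo_gap:.3f}")
```

Note how the aggregate accuracy of this simulated model stays high while its sensitivity for the darker-skin subgroup collapses, which is exactly the 'hidden stratification' the audit is meant to expose.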
Fairness-Aware Model Engineering
Evaluation of the implementation of fairness constraints within the neural network's architecture and training loop, including loss function modification.
Exemplary
4 Points
Successfully integrates complex mathematical fairness constraints (e.g., Lagrangian multipliers) into the loss function. Training loop is optimized for FairBatch sampling, and the Pareto Frontier analysis shows sophisticated balancing of accuracy and equity.
Proficient
3 Points
Correctly modifies the cross-entropy loss to include a fairness penalty. Implementation of FairBatch sampling is functional and leads to measurable improvements in subgroup fairness. Comparison graph is clear.
Developing
2 Points
Attempts to modify the loss function but shows inconsistent application of fairness constraints. Comparison between the baseline and fair model is present but lacks deep technical interpretation.
Beginning
1 Point
Fails to implement a functional fairness-aware training loop. Code contains significant logic errors or the fairness constraint does not impact the model's optimization.
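A minimal sketch of the loss modification this criterion targets: standard binary cross-entropy plus a demographic parity penalty. The fixed weight `lam` stands in for the Lagrangian multiplier that a constrained optimizer would adapt during training; all batch values are invented for illustration.

```python
import numpy as np

def fairness_aware_loss(p, y, group, lam=1.0):
    """Binary cross-entropy plus a demographic parity penalty.

    p: predicted probabilities, y: binary labels, group: 0/1 subgroup ids.
    lam is a fixed penalty weight, a stand-in for the Lagrangian multiplier
    that a constrained optimizer would adapt.
    """
    eps = 1e-7
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Demographic parity: mean predicted positive rate should match across groups.
    gap = abs(p[group == 0].mean() - p[group == 1].mean())
    return ce + lam * gap

# Toy batch: the model predicts 'positive' far more often for group 0.
p = np.array([0.9, 0.8, 0.85, 0.2, 0.1, 0.15])
y = np.array([1, 1, 1, 1, 1, 1])
g = np.array([0, 0, 0, 1, 1, 1])

print(fairness_aware_loss(p, y, g, lam=0.0))  # plain cross-entropy
print(fairness_aware_loss(p, y, g, lam=1.0))  # penalized: strictly larger here
```

Sweeping `lam` from 0 upward and recording (accuracy, parity gap) at each setting is one straightforward way for students to trace the Pareto Frontier the rubric asks for.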
Data Science & Generative AI
Assesses the application of state-of-the-art computer vision to solve systemic data bias.
Generative Augmentation & Data Equity
Evaluation of the student's ability to use generative techniques to address data scarcity and ensure the morphological integrity of synthetic medical images.
Exemplary
4 Points
Produces high-fidelity synthetic images using GANs that are indistinguishable from real samples in a Clinical Turing Test. Demonstrates advanced preservation of unique clinical markers (e.g., acral lentiginous features) on dark skin.
Proficient
3 Points
Successfully generates representative synthetic images for Fitzpatrick Types V-VI. Synthetic data effectively reduces the representation gap and shows clear improvement in model sensitivity for those groups.
Developing
2 Points
Generates synthetic images, but they show noticeable artifacts or lack clinical realism. Use of GANs is basic, and the impact on model performance is marginal or inconsistent.
Beginning
1 Point
Synthetic data generation is unsuccessful or produces medically inaccurate representations. Fails to validate the quality of augmented data or its impact on the model.
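A full GAN pipeline is too large to reproduce here, but the adversarial objective at its core is compact. The sketch below computes the standard discriminator loss and the non-saturating generator loss from discriminator scores; all score values are illustrative placeholders, not outputs of a trained network.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator binary cross-entropy: push real scores to 1, fake to 0."""
    eps = 1e-7
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1 - d_fake + eps))

def g_loss(d_fake):
    """Non-saturating generator loss: push D's score on fakes toward 1."""
    eps = 1e-7
    return -np.mean(np.log(d_fake + eps))

# Early in training, D easily separates real lesions from synthetic ones...
early_real, early_fake = np.array([0.95, 0.9]), np.array([0.05, 0.1])
# ...while near equilibrium D is unsure (scores near 0.5), the point at which
# synthetic Fitzpatrick V-VI images become hard to distinguish from real ones.
late_real, late_fake = np.array([0.55, 0.5]), np.array([0.45, 0.5])

print("early: D loss", d_loss(early_real, early_fake), "G loss", g_loss(early_fake))
print("late:  D loss", d_loss(late_real, late_fake), "G loss", g_loss(late_fake))
```

The falling generator loss and rising discriminator loss between the two regimes mirror the training dynamics students should expect, and motivate the rubric's separate 'Clinical Turing Test' check: low loss alone does not prove morphological accuracy.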
Ethics, Society, & Professionalism
Evaluates the integration of ethical considerations and professional standards in AI deployment.
Ethical Deployment & Transparency
Assessment of the student's ability to translate technical findings into professional, ethical documentation and clinical deployment frameworks.
Exemplary
4 Points
Produces a comprehensive Model Card and Impact Statement that shows a profound understanding of systemic health disparities. Proposes a robust Clinical Deployment Charter with clear, actionable safeguards for human-in-the-loop oversight.
Proficient
3 Points
Develops a complete Model Card following industry standards (Mitchell et al.). Impact Statement accurately reflects the ethical responsibilities of deploying medical AI and identifies key risks for diverse populations.
Developing
2 Points
Model Card is present but lacks detail on training demographics or limitations. Ethical analysis is superficial and does not fully address how the tool might exacerbate or mitigate racial disparities.
Beginning
1 Point
Documentation is incomplete or fails to address the ethical implications of the AI system. Does not provide clear guidance on the intended use or limitations of the model in a clinical setting.
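As a starting point for the Model Card deliverable, a minimal machine-readable skeleton is sketched below, loosely following the section headings proposed by Mitchell et al. Every value is an illustrative placeholder that students would replace with their own EquityNet results.

```python
import json

# Minimal model card skeleton; section names loosely follow Mitchell et al.
# All values are placeholders for students to fill with real results.
model_card = {
    "model_details": {"name": "EquityNet-v1", "type": "CNN skin lesion classifier"},
    "intended_use": "Decision support for dermatologists; not a standalone diagnostic.",
    "factors": ["Fitzpatrick Skin Type I-VI", "lesion site", "image quality"],
    "metrics": ["sensitivity", "specificity", "equalized-odds gap"],
    "evaluation_data": "Held-out, Fitzpatrick-labeled test split (placeholder).",
    "training_data": "ISIC images plus synthetic Type V-VI augmentation (placeholder).",
    "quantitative_analyses": {
        # Disaggregated results go here, one entry per Fitzpatrick group.
        "sensitivity_by_group": {"I-II": None, "III-IV": None, "V-VI": None},
    },
    "ethical_considerations": "Known under-representation of darker skin tones.",
    "caveats_and_recommendations": "Require human-in-the-loop review of all outputs.",
}

print(json.dumps(model_card, indent=2))
```

Keeping the card as structured data rather than free prose lets students validate it programmatically, for example by checking that every Fitzpatrick group has a reported sensitivity before the model is declared deployment-ready.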