SAFE-T / Safety Algorithm Fairness Evaluation for Transportation

Durham, NC · Census Tract Level · Strava Metro & StreetLight Data

Prediction Error by Census Tract

Simulated: Counter locations are simulated, with bias patterns drawn from the research literature. Real counter data (Strava Metro, StreetLight) requires vendor access.

AI volume prediction error across Durham census tracts. Darker red indicates higher prediction error, which is concentrated in low-income areas.

Accuracy by Income Quintile

Simulated: Real census demographics combined with simulated volume predictions. The bias model is based on documented Strava/StreetLight accuracy disparities.

Prediction accuracy across income quintiles (Q1 = poorest, Q5 = richest), shown as the mean absolute error between predicted and actual pedestrian/cyclist counts.
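
As a rough illustration of the metric in this chart, the sketch below groups counter records by income quintile and averages the absolute difference between predicted and actual counts. The function name `mae_by_quintile`, the record layout, and the sample numbers are all hypothetical; this is not the dashboard's own code.

```python
from collections import defaultdict
from statistics import mean


def mae_by_quintile(records):
    """Return {quintile: mean absolute error} for (quintile, predicted, actual) records."""
    errors = defaultdict(list)
    for quintile, predicted, actual in records:
        errors[quintile].append(abs(predicted - actual))
    # Sort keys so Q1..Q5 come out in order.
    return {q: mean(errs) for q, errs in sorted(errors.items())}


# Hypothetical counter records: (income quintile, predicted count, actual count).
records = [
    ("Q1", 40, 100),
    ("Q1", 70, 110),
    ("Q5", 95, 100),
    ("Q5", 210, 200),
]

result = mae_by_quintile(records)
print(result)
```

With these made-up records, the Q1 error is much larger than the Q5 error, which is the kind of gap the chart is meant to surface.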

Accuracy by Minority Percentage

Simulated: Racial composition from the US Census ACS. Volume predictions are simulated with demographic-correlated bias patterns from the research literature.

Prediction errors grouped by census tract racial composition. Areas with higher minority percentages show systematically worse accuracy.

Predicted vs Actual Volume

Simulated: Counter data are simulated with demographic-correlated bias. The diagonal line represents perfect prediction; deviations indicate systematic error.

Predicted volumes vs actual counts. Perfect predictions follow the diagonal. Systematic deviations reveal where bias occurs.

Prediction Errors by Quintile

Simulated: Prediction errors are simulated across all counter locations. The error model is calibrated to documented demographic accuracy gaps in volume estimation tools.

Each dot is one counter location. Dots left of zero indicate underprediction by the AI model.
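
The sign convention implied by this caption can be sketched as follows, assuming the error is defined as predicted minus actual, so that negative values (left of zero) mean the model predicted fewer trips than were counted. The helper names and sample pairs are hypothetical.

```python
def signed_errors(pairs):
    """Signed error per counter location, assumed here to be predicted - actual."""
    return [predicted - actual for predicted, actual in pairs]


def underprediction_share(pairs):
    """Fraction of locations where the model predicted less than was counted."""
    errs = signed_errors(pairs)
    return sum(1 for e in errs if e < 0) / len(errs)


# Hypothetical (predicted, actual) counts at four counter locations.
pairs = [(80, 100), (120, 100), (50, 90), (100, 100)]

errors = signed_errors(pairs)
share = underprediction_share(pairs)
print(errors, share)
```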

Crash Distribution Map

Real Data: NCDOT non-motorist crash locations (pedestrian/bicycle), geocoded to census tracts via spatial join.

Actual vs predicted crash counts by census tract. Toggle between views to compare.

Model Performance by Quintile

Real Data: Ridge regression trained on real NCDOT non-motorist crash data (2019-2023) and evaluated on 2024, with census demographics as features.

Binary classification (above or below the within-quintile median crash count), evaluated per income group. Lower scores in poorer quintiles indicate that the model struggles to rank tracts within those areas.
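
One way to read this evaluation is sketched below. It assumes each tract is labeled "above median" or not, using the median within its own income quintile, for both actual and predicted crash counts, and that the score is plain accuracy. The source does not say which score is used or how predictions are thresholded, so both choices, along with the function name and sample data, are assumptions.

```python
from collections import defaultdict
from statistics import median


def within_quintile_accuracy(tracts):
    """tracts: iterable of (quintile, actual_crashes, predicted_crashes).

    Returns {quintile: share of tracts whose predicted above/below-median
    label matches the actual above/below-median label}.
    """
    groups = defaultdict(list)
    for quintile, actual, predicted in tracts:
        groups[quintile].append((actual, predicted))

    scores = {}
    for quintile, rows in sorted(groups.items()):
        # Medians are computed separately inside each quintile.
        actual_median = median(a for a, _ in rows)
        predicted_median = median(p for _, p in rows)
        hits = sum(
            (a > actual_median) == (p > predicted_median) for a, p in rows
        )
        scores[quintile] = hits / len(rows)
    return scores


# Hypothetical tracts: (quintile, actual crashes, predicted crashes).
tracts = [
    ("Q1", 2, 5), ("Q1", 4, 3), ("Q1", 6, 4), ("Q1", 8, 6),
    ("Q5", 1, 1), ("Q5", 3, 2), ("Q5", 5, 6), ("Q5", 7, 9),
]

scores = within_quintile_accuracy(tracts)
print(scores)
```

Because the threshold is the median inside each quintile, a model can only score well here if it orders tracts correctly within that income group, not merely by telling rich and poor areas apart.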

Crashes Over Time

Real Data: Non-motorist crash trends from the NCDOT ArcGIS Feature Service, aggregated by census tract.

Actual vs predicted crashes over time, showing persistent over- and underprediction patterns by income level.

Prediction Error by Income Quintile

Real Data: Mean absolute error of Ridge regression predictions on real NCDOT non-motorist crash data, grouped by neighborhood income quintile.

Relative prediction error by income level. Higher error in poorer quintiles indicates systematic bias in model accuracy.

Infrastructure Recommendations Map

Modeled: Project types are selected by actual infrastructure gaps (OpenStreetMap feature density per tract). Allocation priority and danger scores are simulated.

Safety project locations under AI-driven vs need-based allocation. Shading shows danger scores; markers show projects. Toggle to compare.

Budget Allocation by Income

Modeled: A $5M budget is allocated across income quintiles. Project types reflect real infrastructure gaps from OpenStreetMap density data.

AI-driven vs need-based safety budget allocation per income quintile.
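
To make the comparison concrete, here is a minimal sketch of how a fixed budget might be split across quintiles in proportion to a score. The source does not describe the actual allocation rule, so the proportional split, the `allocate_budget` helper, and both sets of scores are illustrative assumptions only.

```python
def allocate_budget(total, scores):
    """Split `total` across groups in proportion to each group's score."""
    total_score = sum(scores.values())
    return {group: total * score / total_score for group, score in scores.items()}


BUDGET = 5_000_000  # the $5M budget named in the chart

# Hypothetical scores per quintile. "Need" stands for an observed
# infrastructure gap; "AI" stands for model-predicted danger, which would
# be understated wherever the model underpredicts.
need_scores = {"Q1": 8, "Q2": 5, "Q3": 4, "Q4": 2, "Q5": 1}
ai_scores = {"Q1": 3, "Q2": 4, "Q3": 4, "Q4": 4, "Q5": 5}

need_based = allocate_budget(BUDGET, need_scores)
ai_driven = allocate_budget(BUDGET, ai_scores)

for q in need_scores:
    print(q, round(need_based[q]), round(ai_driven[q]))
```

Under these made-up scores, Q1 receives $2,000,000 from the need-based split but only $750,000 from the AI-driven one.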

Equity Comparison: AI vs Need-Based

Modeled: Equity metrics comparing AI-driven vs need-based allocation. Infrastructure gaps derived from OpenStreetMap feature density per census tract.

Four normalized equity metrics (0-100) comparing AI-driven and need-based allocation strategies.

Demand Distribution Map

Modeled: Infrastructure quality scores derived from OpenStreetMap feature density per census tract. Demand suppression modeled from these real infrastructure conditions.

Suppressed, potential, and actual cycling/walking demand across Durham. High suppression (red) indicates latent demand that AI tools miss.

Demand Suppression: Q1 vs Q5

Modeled: Funnel comparing potential-to-actual demand conversion by income quintile. Infrastructure quality from OpenStreetMap pedestrian/cyclist feature density.

Demand suppression stages from potential to actual usage. Width represents trip volume at each stage. Q1 areas show severe drop-off.

AI Detection Capability

Modeled: AI detection accuracy evaluated against infrastructure scores from OpenStreetMap. Three approaches: naive, sophisticated, and human expert baseline.

Detection accuracy for suppressed demand. Naive AI fails; sophisticated AI achieves partial detection. Neither matches the human expert baseline.