Transparency Notice — All data sourced from the public NGSIM dataset (FHWA/USDOT). Corridors are in Los Angeles CA, SF Bay Area CA, and Atlanta GA. Risk scores are XGBoost model-predicted probabilities — not live readings.
Risk color: ● High (≥0.70) ● Med (0.50–0.70) ● Low (<0.50)
Model-Predicted Risk Scores
Mean XGBoost probability per corridor — computed on all 9,524 windows
Note: US-101 (n=6,647) and I-80 (n=2,281) are the primary data corridors. Lankershim (n=442) and Peachtree (n=154) have smaller samples — scores are real computed values; corridor-level generalization requires caution.
Dataset Quick Facts
Source: NGSIM · FHWA/USDOT · 2005
Corridors: 4 (CA × 3, GA × 1)
Window size: 5 seconds
Label: Time headway < 1.5s
Features: 10 (no label leakage)
Train / Test: 80 / 20 stratified split
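A minimal sketch of the split step, assuming windows.csv (named later in this report) holds one row per window with a binary label column — the column name is hypothetical:

```python
# Sketch: 80/20 stratified split. The report specifies only the ratio and
# stratification; file layout and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

windows = pd.read_csv("windows.csv")
X = windows.drop(columns=["label"])   # the 10 engineered features
y = windows["label"]                  # 1 = near-miss (time headway < 1.5s)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
# stratify=y preserves the ~65% near-miss rate in both splits,
# yielding the 7,619 / 1,905 window counts reported below.
```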
True Positives — Test Set: 1,100 · near-miss correctly flagged (1,241 actual near-miss in test set)
False Positives — Test Set: 259 · safe windows wrongly flagged (664 actual safe in test set)
Missed Near-Miss (FN) — Test Set: 141 · near-miss not caught by the model (most critical error type)
ℹ️ Test set source — These three values come from the held-out 20% test split of the full NGSIM dataset: 1,905 windows (1,241 near-miss + 664 safe), never seen during model training. The remaining 7,619 windows (80%) were used for 5-fold cross-validation training. All 9,524 windows are from the NGSIM public dataset (FHWA/USDOT) across 4 corridors: US-101, I-80, Lankershim, Peachtree.
📖 How to read these three numbers
The model was trained on 7,619 windows (80%) and then tested on the remaining 1,905 windows (20%) it had never seen. For each of those 1,905 windows, the model predicted either near-miss or safe — and that prediction was compared against the actual label. There are exactly four possible outcomes:
✅ True Positive — 1,100
Window was actually near-miss and model said near-miss. Correct catch. Out of 1,241 real near-misses in the test set.
⚠️ False Positive — 259
Window was actually safe but model raised an alarm. False alarm. Out of 664 actual safe windows in the test set.
❌ False Negative (FN) — 141 ← most critical
Window was actually near-miss but model said safe. A real danger event the model missed. Out of 1,241 actual near-misses.
✅ True Negative — 405
Window was actually safe and model said safe. Correctly cleared. Out of 664 actual safe windows.
Sanity check: 1,100 + 141 + 259 + 405 = 1,905 total test windows ✅
How the metrics are derived from these four numbers:
Recall 88.6% = 1,100 ÷ 1,241 (TP ÷ all actual near-miss — catch rate)
Specificity 61.0% = 405 ÷ 664 (TN ÷ all actual safe — safe correctly cleared)
Accuracy 79.0% = 1,505 ÷ 1,905 ((TP+TN) ÷ total)
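These identities can be checked in a few lines, with the counts taken directly from the confusion matrix above:

```python
# Verify the reported metrics from the four confusion-matrix counts.
TP, FN, FP, TN = 1100, 141, 259, 405

assert TP + FN + FP + TN == 1905           # total test windows

recall      = TP / (TP + FN)               # 1100/1241 ≈ 0.886
specificity = TN / (TN + FP)               # 405/664  ≈ 0.610
precision   = TP / (TP + FP)               # 1100/1359 ≈ 0.809
accuracy    = (TP + TN) / 1905             # 1505/1905 ≈ 0.790
```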
Why FN=141 matters most: In a safety detection system, missing a real near-miss (FN) is far more consequential than a false alarm (FP). The model missed 141 out of 1,241 near-misses — an 11.4% miss rate. Recall (88.6%) is therefore the primary performance metric for this system, not accuracy.
📊 Real model outputs — Risk scores = XGBoost predicted probability for each window. A score of 0.90 means the model is 90% confident that window is a near-miss. High-risk threshold = 0.70.
Predicted Probability Distribution
Test set (1,905 windows) — near-miss vs safe predicted probabilities
Key finding: Near-miss windows cluster at 0.90–1.0 (611 out of 1,241). Safe windows spread more broadly, showing the model's uncertainty on borderline cases.
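A minimal sketch of how these probabilities and the 0.70 flag could be produced, continuing the split sketch above (hyperparameters are illustrative, not the report's exact training configuration):

```python
# Sketch: train XGBoost and score each test window. Settings are
# assumptions; the report specifies only the model family.
from xgboost import XGBClassifier

model = XGBClassifier(eval_metric="logloss", random_state=42)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]   # P(near-miss) per 5s window
high_risk = proba >= 0.70                   # report's high-risk threshold
print(f"{high_risk.mean():.1%} of test windows flagged high-risk")
```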
High-Risk Windows by Corridor
Windows with predicted probability ≥ 0.70 (out of all windows per corridor)
Lankershim shows 100% high-risk but has only 73 windows — not statistically robust on its own.
Corridor Risk Score Summary
Mean predicted probability across ALL windows per corridor — computed on full dataset (training + test)
| Corridor | Location | Total Windows | High-Risk (≥0.70) | High-Risk % | Mean Risk Score | Sample Note |
|---|---|---|---|---|---|---|
| US-101 | Hollywood Fwy, Los Angeles CA | 6,647 | 4,236 | 63.7% | 0.724 | Primary corpus |
| Lankershim | N. Hollywood, Los Angeles CA | 442 | 351 | 79.4% | 0.821 | n=442 — urban surface |
| Peachtree | Peachtree St, Atlanta GA | 154 | 107 | 69.5% | 0.737 | n=154 — small sample |
| I-80 | Emeryville, SF Bay Area CA | 2,281 | 664 | 29.1% | 0.411 | n=2,281 |
Important caveat: These scores are computed on the full dataset (model saw 80% of each corridor's windows during training). Scores on unseen corridors would likely differ. The test-set ROC AUC (0.864) is the unbiased performance estimate.
🤖 Real computed metrics — 80/20 stratified split · XGBoost primary model · 5-fold CV for generalization check · No label leakage (time-headway features excluded from inputs).
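A sketch of the 5-fold generalization check (variable names follow the split sketch above; folds are stratified by default for classifiers):

```python
# Sketch: 5-fold CV AUC on the training split.
# Reported per-fold scores: 0.862–0.879.
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

fold_auc = cross_val_score(
    XGBClassifier(eval_metric="logloss", random_state=42),
    X_train, y_train, cv=5, scoring="roc_auc",
)
print(fold_auc.round(3), "mean:", fold_auc.mean().round(3))
```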
Confusion Matrix — XGBoost (Test Set, n=1,905)
Predicted vs Actual labels on the held-out 20% test set
| | Predicted: Near-Miss | Predicted: Safe |
|---|---|---|
| Actual: Near-Miss | 1,100 — ✓ TP (near-miss correctly caught) | 141 — ✗ FN (near-miss missed ← critical) |
| Actual: Safe | 259 — ✗ FP (safe flagged as near-miss) | 405 — ✓ TN (safe correctly identified) |

Precision: 80.9% · Recall: 88.6% · Specificity: 61.0% · Accuracy: 79.0%
Specificity (61.0%) — the model flags 259 out of 664 safe windows as near-miss. Given 65% near-miss class imbalance, some false positives are expected. For safety systems, low FN (141 missed out of 1,241) is the priority; false alarms are more acceptable than missed detections.
ROC Curve AUC = 0.864
True Positive Rate vs False Positive Rate — test set
Precision–Recall Curve AP = 0.924
High AP score reflects strong near-miss detection ability
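Both curve summaries reduce to one scikit-learn call each on the held-out probabilities (continuing the earlier sketches):

```python
# Sketch: ROC AUC and average precision from the test-set probabilities.
from sklearn.metrics import roc_auc_score, average_precision_score

print("ROC AUC:", round(roc_auc_score(y_test, proba), 3))             # report: 0.864
print("AP:     ", round(average_precision_score(y_test, proba), 3))   # report: 0.924
```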
📏 Baseline Comparison — Does the Model Actually Add Value?
XGBoost vs naive baselines on the same 1,905-window test set
Before trusting any model result, you must ask: could a dumb rule do just as well? Two naive baselines are evaluated on the same 1,905 test windows:
| Approach | Accuracy | Recall | Precision | AUC | FN (missed) |
|---|---|---|---|---|---|
| Always predict Near-Miss | 65.2% | 100% | 65.2% | 0.500 | 0 |
| Always predict Safe | 34.8% | 0% | — | 0.500 | 1,241 |
| XGBoost (this model) ✓ | 79.0% | 88.6% | 80.9% | 0.864 | 141 |
“Always Near-Miss” problem: it gets 100% Recall (catches every near-miss), but its Precision is only 65.2% — roughly 1 in 3 alarms is false. Its AUC = 0.500, no better than a coin flip: it cannot discriminate at all.
XGBoost improvement: AUC improves from 0.500 → 0.864 (+72.8% relative gain). Accuracy improves from 65.2% → 79.0%. Most critically, Precision improves from 65.2% → 80.9% — the model raises alarms with real discrimination, not random guessing.
Key takeaway: The model meaningfully outperforms all naive baselines. The AUC of 0.864 represents genuine discriminative ability — the model correctly ranks a near-miss window above a safe window 86.4% of the time across all possible thresholds.
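Both naive baselines can be reproduced with scikit-learn's DummyClassifier (a sketch; variable names follow the earlier split):

```python
# Sketch: constant-prediction baselines on the same 1,905 test windows.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

for constant, name in [(1, "Always Near-Miss"), (0, "Always Safe")]:
    dummy = DummyClassifier(strategy="constant", constant=constant)
    dummy.fit(X_train, y_train)
    pred = dummy.predict(X_test)
    score = dummy.predict_proba(X_test)[:, 1]   # constant scores → AUC = 0.500
    print(f"{name}: acc={accuracy_score(y_test, pred):.3f} "
          f"recall={recall_score(y_test, pred):.3f} "
          f"auc={roc_auc_score(y_test, score):.3f}")
```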
Stable performance: all five cross-validation fold AUCs fall between 0.862 and 0.879 — the model is not overfitting to a specific data split.
Model Comparison
All three classifiers on same 80/20 stratified split
| Model | AUC | F1 | Precision | Recall |
|---|---|---|---|---|
| XGBoost ✓ | 0.864 | 0.846 | 0.809 | 0.886 |
| Random Forest | 0.863 | 0.841 | 0.829 | 0.853 |
| Logistic Reg. | 0.813 | 0.790 | 0.841 | 0.744 |
XGBoost selected as primary: highest Recall (0.886) means fewest missed near-miss events — the most critical metric for a safety detection system.
Leakage-Excluded Features
min_th · mean_th · th_frac_critical
Direct time-headway derivatives — including them caused AUC = 1.0 (trivial leakage)
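In code, the exclusion amounts to a column drop before training (a sketch, continuing the earlier naming assumptions):

```python
# Sketch: drop the direct time-headway derivatives so the label's source
# signal never enters the model inputs (prevents the trivial AUC = 1.0).
LEAKY = ["min_th", "mean_th", "th_frac_critical"]
X = windows.drop(columns=["label", *LEAKY])
```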
🔬 Real computed values — SHAP values from shap_values.csv. Feature distributions computed from windows.csv. CMV analysis from cmv_flag column.
SHAP Feature Importance
Mean |SHAP| value — XGBoost model · Computed from shap_values.csv
| Feature | Mean \|SHAP\| |
|---|---|
| mean_speed | 1.379 |
| mean_headway | 0.983 |
| min_headway | 0.328 |
| std_speed | 0.282 |
| lat_std | 0.257 |
| std_acc | 0.226 |
| max_delta_v | 0.165 |
| mean_acc | 0.112 |
| mean_delta_v | 0.103 |
| cmv_flag | 0.005 |
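A sketch of how such a ranking is typically computed with the shap library (the report states the values were read from shap_values.csv; this is not the exact script):

```python
# Sketch: mean-|SHAP| feature ranking for a fitted XGBoost model.
import numpy as np
import shap

explainer = shap.TreeExplainer(model)
shap_vals = explainer.shap_values(X_test)        # (n_windows, n_features)

ranking = sorted(zip(X_test.columns, np.abs(shap_vals).mean(axis=0)),
                 key=lambda t: -t[1])
for name, value in ranking:
    print(f"{name:15s} {value:.3f}")             # e.g. mean_speed  1.379
```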
Feature Means: Near-Miss vs Safe
Mean value per feature grouped by label — computed from windows.csv
Feature Distribution Chart
Near-miss vs safe mean values — top 6 most discriminative features
CMV Analysis — Commercial Vehicle Impact
Near-miss rate: CMV-involved vs non-CMV windows
| Group | Total Windows | Near-Miss | Near-Miss Rate |
|---|---|---|---|
| CMV Involved | 537 | 317 | 59.0% |
| No CMV | 8,987 | 5,888 | 65.5% |
Key finding: CMV-involved windows have a lower near-miss rate (59.0%) than non-CMV windows (65.5%). This may reflect CMV drivers maintaining larger following distances due to training and regulation. This is a real computed result, though with only 537 CMV windows it is worth investigating further on a larger dataset.
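The comparison itself is a single groupby over the cmv_flag column (a sketch; the label column name is assumed as before):

```python
# Sketch: near-miss rate by CMV involvement.
cmv_rates = windows.groupby("cmv_flag")["label"].agg(
    total="size", near_miss="sum", rate="mean"
)
print(cmv_rates)   # rate ≈ 0.590 (CMV) vs 0.655 (no CMV)
```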
🛣️ 4 real NGSIM corridors — All stats computed from windows.csv and XGBoost predicted probabilities. Risk scores are model outputs, not manual assignments.
US-101 · Hollywood Freeway
Los Angeles, CA · Southbound · 640m
69.8% of data
Near-Miss: 72.4% · Risk Score: 0.724 · CMV: 5.6% · n=6,647
Avg speed: 12.7 mph · Avg headway: 75.7 ft · Min headway: 27.8 ft · Lat std: 12.9 ft · Avg ΔV: 23.2 ft/s · CMV windows: 374
I-80 · Eastbound Freeway
Emeryville, SF Bay Area, CA
24.0% of data
Near-Miss: 39.6% · Risk Score: 0.411 · CMV: 5.1% · n=2,281
Avg speed: 7.4 mph · Avg headway: 51.3 ft · Min headway: 24.5 ft · Lat std: 7.4 ft · Avg ΔV: 8.4 ft/s · CMV windows: 117
Lowest risk corridor: Lower speed (7.4 mph) and lower lat_std (7.4 ft) vs US-101 — consistent with the model's lower predicted risk score (0.411).
Lankershim Boulevard
North Hollywood, Los Angeles, CA · Urban surface street
n=442
Near-Miss: 84.2% · Risk Score: 0.821 · CMV: 8.6% · n=442
Avg speed: 13.1 mph · Avg headway: 98.8 ft · Min headway: 24.3 ft · Lat std: 20.3 ft · Avg ΔV: 81.6 ft/s · CMV windows: 38
High ΔV (81.6 ft/s): urban stop-and-go on a surface street produces sharp speed changes. This is a real computed value, not an error, and with 442 windows the estimate is reasonably robust — though the sample is still far smaller than the freeway corridors.
Peachtree Street
Atlanta, Georgia · Urban arterial
n=154 · Small Sample
Near-Miss: 74.0% · Risk Score: 0.737 · CMV: 5.2% · n=154
Avg speed: 11.2 mph · Avg headway: 109.9 ft · Min headway: 21.9 ft · Lat std: 15.1 ft · Avg ΔV: 69.7 ft/s · CMV windows: 8
Limited sample (n=154): stats are computed from actual NGSIM trajectories in Atlanta, but corridor-level generalization requires caution given the sample size.
Corridor Comparison Chart
Near-miss rate and model risk score side by side
What is NGSIM?
The NGSIM Vehicle Trajectories dataset was collected by FHWA in 2005 using overhead cameras recording at 10 Hz. It provides precise vehicle position, speed, and lane data — the gold standard public benchmark for traffic microsimulation research, cited in hundreds of peer-reviewed studies.
1. Publicly available and fully reproducible. NGSIM is freely distributed by FHWA/USDOT and has been used in 1,000+ peer-reviewed traffic safety studies. Every result in this project can be independently verified and re-run from the same source files.
2. High temporal resolution (10 Hz, vehicle-level). NGSIM records each vehicle's position, speed, and lane every 0.1 seconds. This granularity is what makes precise feature engineering possible — computing speed variance, ΔV, and lateral trajectory spread across a 5-second window requires sub-second sampling that aggregated sensor logs cannot provide.
3. Multi-site geographic diversity. Three California corridors (US-101 Hollywood Fwy, I-80 Emeryville, Lankershim Blvd) plus one Georgia corridor (Peachtree St Atlanta) let the model be tested across different road types, traffic densities, and regional driving patterns — providing a basic check on cross-site generalizability.
4. Establishes a replicable end-to-end baseline. The full pipeline — feature engineering, labeling, XGBoost training, SHAP explainability — is validated here on real data. This baseline is the reference point for future work using other trajectory datasets.
Corridor Context & Data Scope
Transparency statement — what these corridors are and what they are not
⚠ These 4 corridors are NOT active work zones. The NGSIM dataset was collected in 2005 for general traffic flow research — no construction zones, flaggers, or work zone lane closures are present in the data. This is a critical context note for interpreting all results in this study.
What these corridors actually are
| Corridor | Road Type | Area |
|---|---|---|
| US-101 | Urban freeway · 6–8 lanes | Hollywood Fwy, Los Angeles |
| I-80 | Urban freeway · interchange | Emeryville, SF Bay Area |
| Lankershim | Urban surface arterial | North Hollywood, LA |
| Peachtree St | Urban arterial | Downtown Atlanta, GA |
All four are high-density, congested urban corridors. None contain active work zone events, construction equipment, or temporary lane configurations during data collection.
Why NGSIM is used as a methodology benchmark
No public work zone trajectory dataset exists at 10 Hz vehicle-level resolution. NGSIM is the only freely distributed benchmark of this type, cited in 1,000+ peer-reviewed traffic safety studies and released by FHWA/USDOT as public domain data.
Congested urban conditions share the same physics as work zones. Reduced speeds, tight headways, high ΔV, and lane-change events — the exact dynamics that generate near-miss risk in work zones — are present at high frequency in all 4 NGSIM corridors, making them a valid stand-in for methodology development.
The pipeline is what transfers, not the corridors. The feature engineering → XGBoost → SHAP framework validated here is designed to be applied directly to work zone-specific trajectory data when that data becomes available through future field collection or connected vehicle programs.
Scope of this study: This is a methodology baseline — validating that near-miss risk can be reliably detected from vehicle trajectory features on real, publicly verifiable data. It does not claim these corridors are work zones. Future work will apply this pipeline to trajectory data recorded during active work zone events, which is the intended operational target.
⚙️ Methodology Decisions — Why These Exact Choices?
Two critical parameters that determine what the model sees and learns
1. Why 1.5 seconds as the near-miss threshold?
The 1.5-second time headway threshold is not an arbitrary choice. It is the internationally recognized Surrogate Safety Measure (SSM) established by Hydén (1987) at Lund University, and is formally referenced by FHWA as the standard near-miss criterion in traffic safety research.
< 1.5s — Near-miss: dangerously close following
1.5s – 2.0s — Caution zone: below the recommended following distance
> 2.0s — Safe: within the recommended following distance
Why not 1.0s or 2.0s? At <1.0s a driver has essentially no reaction time — that is a crash, not a near-miss. At 2.0s the label would capture normal congested-traffic following that is not genuinely dangerous. The 1.5s point is the peer-reviewed consensus for imminent collision risk without an actual collision. This threshold directly produced the 65.2% near-miss rate observed in this dataset — consistent with stop-and-go urban corridor conditions.
ℹ️ Source: Hydén, C. (1987). The development of a method for traffic safety evaluation. Lund University. Adopted by FHWA as standard SSM criterion.
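One plausible implementation of this labeling rule, assuming each window's minimum time headway is stored as min_th (the leakage-excluded feature listed earlier):

```python
# Sketch: Hydén's 1.5s SSM threshold as a binary window label.
NEAR_MISS_S = 1.5

windows["label"] = (windows["min_th"] < NEAR_MISS_S).astype(int)
# min_th (and the other headway derivatives) are then dropped from the
# model inputs to avoid label leakage, as described above.
```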
2. Why 50 frames (5 seconds) per window?
The window size determines how much trajectory context the model sees for each prediction. Too short and the model misses the behavioral build-up before a near-miss. Too long and the window dilutes dangerous moments with safe frames, washing out the signal.
< 2s — too short: misses the speed/headway build-up that precedes the near-miss moment; too noisy.
5s (50 frames) — ✓ selected: captures the full dangerous interaction sequence; aligns with standard traffic conflict study duration (4–6s).
> 10s — too long: a 1.5s near-miss moment gets averaged across 100+ frames of safe driving; label signal diluted.
Step size = 25 frames (50% overlap): Each window advances by 2.5 seconds, meaning consecutive windows share half their frames. This overlap ensures a near-miss event that spans a window boundary is captured in at least one window — avoiding missed detections due to arbitrary windowing cutpoints. The 50% overlap is the standard choice in sliding-window trajectory analysis.
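A sketch of this windowing scheme (a per-vehicle trajectory table at 10 Hz is assumed; file and column names are hypothetical):

```python
# Sketch: 50-frame windows advanced 25 frames at a time (50% overlap).
import pandas as pd

WINDOW, STEP = 50, 25   # frames: 5.0s windows, 2.5s stride at 10 Hz

traj = pd.read_csv("trajectories.csv")   # hypothetical: vehicle_id, frame, ...

for vehicle_id, frames in traj.sort_values("frame").groupby("vehicle_id"):
    for start in range(0, len(frames) - WINDOW + 1, STEP):
        window = frames.iloc[start:start + WINDOW]
        ...  # compute the 10 features (mean_speed, lat_std, max_delta_v, ...)
```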
Research Applications & Impact
Practical use cases enabled by this methodology — applicable to any corridor with trajectory-level sensor data
🚧 Urban Work Zone Monitoring
Flags high-risk 5-second windows near active construction zones using speed, headway, and lateral variance — no crash record required, purely proactive.
🚚 CMV Fleet Safety Programs
CMV-flagged windows can feed fleet safety dashboards, alerting dispatchers when commercial vehicles are consistently present in near-miss conditions on a corridor.
📊 Near-Miss as Crash Surrogate
Crash events are rare and often under-reported. This study uses time headway < 1.5s as a surrogate — near-miss windows are observable in trajectory data before any collision occurs, enabling proactive safety analysis without relying on historical crash records.
⚡ Real-Time Inference Capability
XGBoost predicts each 5-second window in <10ms. This latency is compatible with edge deployment on roadside units or connected vehicle infrastructure for near-real-time risk scoring.
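The latency claim is straightforward to sanity-check with a timing loop (a sketch, not the report's benchmark code; model and features follow the earlier sketches):

```python
# Sketch: per-window inference latency for the fitted model.
import time

n_calls = 1_000
row = X_test.iloc[[0]]                    # a single 5-second window

t0 = time.perf_counter()
for _ in range(n_calls):
    model.predict_proba(row)
elapsed_ms = (time.perf_counter() - t0) / n_calls * 1_000
print(f"{elapsed_ms:.2f} ms per window")  # report claims < 10 ms
```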
📋 Policy & Corridor Investment Prioritization
Model-predicted risk scores (e.g., US-101 at 0.724 vs I-80 at 0.411) give transportation agencies a ranked, data-driven basis for allocating safety infrastructure investment — speed cameras, dynamic message signs, or increased enforcement — to the corridors where near-miss probability is highest.