00. Overview
Car Turn Signal Labeling.
Developed a large-scale computer vision pipeline to detect and classify vehicle turn signals
(left, right, hazard, none) from dashcam footage collected during autonomous vehicle mapping
runs in Tartu, Estonia. The system processes 8.8M+ vehicle crops extracted
via YOLO and converts them into temporally tracked sequences for analysis.
Two approaches were evaluated:
- Zero-shot foundation models (GPT-4o / GPT-4o-mini) on static images and temporal grids
- A heuristic model leveraging turn-signal blinking patterns and color
Results demonstrate that turn signal detection is inherently temporal, and
classical signal-based methods currently outperform large vision-language models on this
task.
01. Methodology
Methodology.
Data Filtering:
- Filtered 8,790,531 YOLO car crops down to 4.7M usable images
- Trained a ResNet-18 to identify rear-facing vehicles (95.2% validation accuracy), reducing the dataset to 1.38M rear-car crops
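The retention at each filtering stage follows directly from the counts above. A quick check of the arithmetic is sketched here; the 4.7M and 1.38M figures are rounded in the source, so the percentages are approximate, and the variable names are illustrative only.

```python
# Crop counts from the filtering stages (4.7M and 1.38M are rounded figures).
RAW_CROPS = 8_790_531
USABLE_CROPS = 4_700_000
REAR_CROPS = 1_380_000


def retention(kept, total):
    """Percentage of `total` items that survive a filtering stage."""
    return 100.0 * kept / total


stages = {
    "usable / raw": retention(USABLE_CROPS, RAW_CROPS),
    "rear / usable": retention(REAR_CROPS, USABLE_CROPS),
    "rear / raw": retention(REAR_CROPS, RAW_CROPS),
}

for name, pct in stages.items():
    print(f"{name}: {pct:.1f}%")
```

Roughly half of the raw crops are usable, and under a third of those show a rear-facing vehicle.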
Temporal Tracking:
- YOLO detections converted for DeepSORT tracking
- Tracking produced 40,872 tracks (mean length: 26 frames)
- Sampled every 4th frame of each track (10 FPS → 2.5 FPS) to capture blinking patterns
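The subsampling step can be sketched as below. This is a minimal illustration, not the project's code: the function names are hypothetical, and a track is assumed to be an ordered list of frames. The `nyquist_hz` helper is included because the sampling rate bounds which blink frequencies a later spectral analysis can resolve.

```python
def subsample_track(frames, step=4):
    """Keep every `step`-th frame of an ordered track (assumed to be a list)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]


def effective_fps(source_fps, step=4):
    """Frame rate after keeping every `step`-th frame."""
    return source_fps / step


def nyquist_hz(fps):
    """Highest frequency that can be resolved without aliasing at `fps`."""
    return fps / 2.0


track = list(range(26))          # stand-in for a mean-length track of 26 frames
sampled = subsample_track(track)
print(len(sampled), effective_fps(10.0), nyquist_hz(effective_fps(10.0)))
```

At 2.5 FPS the Nyquist limit is 1.25 Hz, so a spectral test over a band extending above that limit would need denser sampling, such as the original 10 FPS. The source does not state which rate the frequency analysis used.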
Model Approaches:
A. Foundation Models
- GPT-4o / GPT-4o-mini evaluated on single images and temporal grids
- Prompted for JSON output: left, right, hazard, none
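A minimal sketch of the two mechanical pieces around the foundation-model calls: laying out a sequence of crops as a grid, and validating the JSON reply. The near-square layout, the function names, and the `"signal"` key are assumptions made for illustration; the source does not specify the grid shape or the reply schema. No model API call is shown.

```python
import json
import math

LABELS = {"left", "right", "hazard", "none"}


def grid_layout(n_frames, tile_w, tile_h, cols=None):
    """Return (rows, cols, offsets) for tiling frames row by row.

    offsets[i] is the (x, y) top-left pixel position of frame i in the grid.
    A near-square grid is assumed when `cols` is not given.
    """
    if n_frames <= 0:
        raise ValueError("need at least one frame")
    cols = cols or math.ceil(math.sqrt(n_frames))
    rows = math.ceil(n_frames / cols)
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_frames)]
    return rows, cols, offsets


def parse_signal_reply(text):
    """Extract a valid label from a JSON reply, or None if it is unusable.

    Assumes a reply shaped like {"signal": "left"}; the key is hypothetical.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label = str(data.get("signal", "")).strip().lower()
    return label if label in LABELS else None


rows, cols, offsets = grid_layout(6, tile_w=64, tile_h=64)
print(rows, cols, offsets[4])
print(parse_signal_reply('{"signal": "Hazard"}'))
```

Rejecting replies outside the four labels keeps malformed model output from being counted as a prediction.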
B. Heuristic Temporal Model
- Color filtering in HSV space for yellow-orange turn signals
- Time-series brightness analysis with FFT to detect 1-2.5 Hz blinking
- Spatial ROI analysis to distinguish left vs right signals
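The three heuristic steps can be sketched end to end as follows. This is an illustrative reconstruction under stated assumptions, not the project's implementation: the HSV thresholds, the 10 FPS rate, the 0.5 power-ratio cutoff, and all function names are hypothetical. The left and right halves of a rear-facing crop are assumed to correspond to the vehicle's left and right signals. To stay dependency-free, the spectrum is computed with a direct DFT rather than an FFT routine such as `numpy.fft.rfft`.

```python
import cmath
import colorsys
import math


def amber_score(pixels):
    """Fraction of (r, g, b) pixels (0-255) that are bright yellow-orange."""
    hits = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Assumed thresholds: hue 20-60 degrees, well saturated, bright.
        if 20 / 360 <= h <= 60 / 360 and s >= 0.5 and v >= 0.6:
            hits += 1
    return hits / len(pixels) if pixels else 0.0


def side_scores(frame):
    """Split a frame (rows of RGB tuples) into halves; return (left, right) scores."""
    mid = len(frame[0]) // 2
    left = [px for row in frame for px in row[:mid]]
    right = [px for row in frame for px in row[mid:]]
    return amber_score(left), amber_score(right)


def band_power_ratio(series, fps, lo=1.0, hi=2.5):
    """Share of non-DC spectral power that falls in the [lo, hi] Hz band."""
    n = len(series)
    if n < 2:
        return 0.0
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = band = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        total += power
        if lo <= k * fps / n <= hi:
            band += power
    return band / total if total > 1e-12 else 0.0


def classify(left_series, right_series, fps=10.0, ratio_thresh=0.5, min_swing=0.05):
    """Label a sequence from per-frame amber scores of each half."""
    def blinking(s):
        return (len(s) >= 2 and max(s) - min(s) >= min_swing
                and band_power_ratio(s, fps) >= ratio_thresh)

    left_on, right_on = blinking(left_series), blinking(right_series)
    if left_on and right_on:
        return "hazard"
    if left_on:
        return "left"
    if right_on:
        return "right"
    return "none"


# Synthetic check: a 1.5 Hz blink on the left, steady right, 4 s at 10 FPS.
blink = [0.5 + 0.3 * math.sin(2 * math.pi * 1.5 * t / 10.0) for t in range(40)]
steady = [0.5] * 40
print(classify(blink, steady))
```

Requiring both a minimum brightness swing and a dominant in-band frequency is one way to keep steady lamps, such as brake lights, from being read as signals.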
02. Results
Results.
Single-Image Foundation Models
| Model | Accuracy |
| GPT-4o | ~66% |
| GPT-4o-mini | ~40% |
Static images alone are insufficient for reliable turn-signal detection.
Temporal Foundation Models (Image Grids)
| Model | Accuracy | None-class Accuracy |
| GPT-4o | ~70% | 0% |
| GPT-4o-mini | ~40% | 0% |
Both models over-predict signals (0% accuracy on the none class) due to weak temporal reasoning.
Heuristic Temporal Model (1,010 sequences; 212,919 frames)
| Class | Accuracy |
| Left | 57% |
| Right | 61% |
| Hazard | 76% |
| None | 83% |
| Overall | 80% |
This heuristic, signal-based method outperformed all foundation model approaches.
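For reference, per-class and overall accuracy of this kind can be computed as sketched below. It assumes the per-class figures are per-class recall (correct predictions divided by the number of true examples of that class), which the source does not state explicitly. The function name and the toy labels are illustrative, not project data.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return ({class: recall}, overall accuracy) for paired label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_class = {c: correct[c] / totals[c] for c in totals}
    overall = sum(correct.values()) / len(y_true)
    return per_class, overall


# Toy example: the majority class dominates the overall figure.
y_true = ["none"] * 8 + ["left"] * 2
y_pred = ["none"] * 7 + ["left"] + ["left", "none"]
per_class, overall = per_class_accuracy(y_true, y_pred)
print(per_class, overall)
```

Because overall accuracy weights each class by its frequency, an overall figure close to the none-class figure is what one would expect when most sequences carry no signal.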
Key Takeaway
Turn signals are time-varying signals. Models without explicit
frequency or temporal reasoning fail even when given multiple frames. Signal-processing methods
remain more reliable and computationally efficient than foundation models on this task.
Future work will use this data to build a dedicated turn-signal detection model.
03. Report
Report PDF.