At Arya Omnitalk, I worked on a system that processes 30 frames per second from highway cameras, identifies every vehicle, tracks its speed across frames, reads its license plate, and generates an automated fine — all within 200 milliseconds. A car moving at 120 km/h covers 33 meters per second. If our pipeline takes even half a second too long, the vehicle has already left the camera's field of view.
This is the engineering story behind VIDES (Video Incident Detection and Enforcement System), the module I built for the Highway Traffic Management System. It's now live on major expressways, automating over 5,000 challans per month and alerting highway patrol to accidents within 15 seconds.
1. The Need for Speed (and Accuracy)
Detecting a car on a sunny day in clear traffic is easy. Detecting a car in heavy rain, at night, while it's partially hidden behind a truck going 140 km/h? That's an entirely different engineering problem.
Core Engineering Challenges
- Occlusion: Large trucks hide smaller cars for 2–3 seconds. DeepSORT must re-identify vehicles when they reappear.
- Motion Blur: At 120 km/h, standard 1/30s shutter speeds blur license plates beyond recognition. We use 1/500s high-speed shutters.
- Ghosting: Reflections on wet roads create phantom vehicles in the detector output. We use temporal filtering to suppress false positives.
- Night Vision: IR-illuminated cameras introduce color distortion that breaks standard ANPR models trained on daytime data.
2. The VIDES Pipeline
We architected a split pipeline: heavy compute (detection + tracking) runs on edge devices mounted on the highway gantry, while OCR and database operations run in the cloud. This minimizes bandwidth — only violation crops are uploaded, not full video streams.
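As a rough sketch of the edge-to-cloud contract (field names and framing here are illustrative, not the production schema), only a small JPEG crop plus a few bytes of metadata ever cross the network:

```python
import json

def build_violation_message(crop_jpeg: bytes, camera_id: str,
                            track_id: int, speed_kmh: float) -> bytes:
    """Package a confirmed violation for upload; full frames never leave the edge."""
    meta = {
        "camera_id": camera_id,
        "track_id": track_id,
        "speed_kmh": speed_kmh,
        "crop_bytes": len(crop_jpeg),  # a few KB vs. a 50+ Mbps raw stream
    }
    # JSON metadata header (length-prefixed), then the crop as binary payload
    header = json.dumps(meta).encode()
    return len(header).to_bytes(4, "big") + header + crop_jpeg

msg = build_violation_message(b"\xff\xd8\xff", "GANTRY-042", 17, 134.2)
```

The length-prefix framing is one simple way to keep metadata and image bytes in a single upload; the real system's wire format may differ.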
3. YOLO + DeepSORT: The Golden Pair
We evaluated three detection architectures before settling on YOLOv8. The key metric wasn't just accuracy (mAP) — it was inference latency on the Jetson Xavier's GPU.
Model Comparison (Jetson Xavier NX)
| Model | mAP@0.5 | FPS (Xavier) | Model Size | Verdict |
|---|---|---|---|---|
| YOLOv5s | 0.88 | 45 | 14 MB | Fast but misses small vehicles |
| YOLOv8m | 0.92 | 30 | 52 MB | ✅ Our choice — best balance |
| Faster R-CNN | 0.94 | 8 | 167 MB | Too slow for real-time |
| YOLOv8l | 0.93 | 18 | 83 MB | Marginal accuracy gain, halved FPS |
Detection alone can't calculate speed — you need to track the same vehicle across consecutive frames. Enter DeepSORT (Deep Simple Online and Realtime Tracking). It assigns a unique ID to each vehicle using appearance embeddings, surviving brief occlusions.
```python
import numpy as np
from collections import defaultdict

# Vehicle position history: {track_id: [((x, y), timestamp), ...]}
vehicle_history = defaultdict(list)
# Consecutive over-limit readings per vehicle: {track_id: count}
violation_streak = defaultdict(int)

PIXEL_METER_RATIO = 0.045  # Calibrated per camera installation

def calculate_speed(track_id, bbox_center, timestamp, fps=30):
    """Calculate vehicle speed using positional displacement over time."""
    vehicle_history[track_id].append((bbox_center, timestamp))

    # Need at least 5 frames for a stable speed estimate
    if len(vehicle_history[track_id]) < 5:
        return None

    # Use the position from 5 frames ago to reduce jitter
    prev_pos, prev_time = vehicle_history[track_id][-5]
    curr_pos, curr_time = bbox_center, timestamp

    # Euclidean pixel displacement → meters
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    pixel_distance = np.sqrt(dx**2 + dy**2)
    meter_distance = pixel_distance * PIXEL_METER_RATIO

    # Time delta in seconds (fall back to the frame count if timestamps collide)
    dt = (curr_time - prev_time) or (5 / fps)

    # Convert m/s → km/h
    speed_kmh = (meter_distance / dt) * 3.6

    # Sanity check: reject impossible speeds (sensor noise)
    if speed_kmh > 250:  # No civilian car goes 250+ km/h
        return None
    return round(speed_kmh, 1)

def check_violation(track_id, speed, speed_limit=120):
    """Flag a vehicle if its speed exceeds the limit for 3+ consecutive frames."""
    if speed is not None and speed > speed_limit:
        violation_streak[track_id] += 1
    else:
        violation_streak[track_id] = 0  # Any compliant reading resets the streak

    # Require 3 consecutive readings to prevent false positives
    if violation_streak[track_id] >= 3:
        return True  # Confirmed violation → trigger ANPR crop
    return False
```
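The calibration arithmetic is easy to sanity-check by hand. For a synthetic vehicle drifting 20 pixels per frame at 30 FPS with the ratio above, this standalone recomputation mirrors the conversion inside `calculate_speed`:

```python
PIXEL_METER_RATIO = 0.045    # same per-camera calibration constant as above
FPS = 30
PIXELS_PER_FRAME = 20        # synthetic displacement, not real track data

pixel_distance = PIXELS_PER_FRAME * 5                 # displacement over 5 frames
meter_distance = pixel_distance * PIXEL_METER_RATIO   # 4.5 m
dt = 5 / FPS                                          # ~0.167 s between samples
speed_kmh = (meter_distance / dt) * 3.6
print(round(speed_kmh, 1))  # 97.2
```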
4. ANPR: Reading License Plates at 120 km/h
Automatic Number Plate Recognition (ANPR) is the most critical — and error-prone — stage. A wrong plate reading means an innocent person gets fined. We use a multi-stage preprocessing pipeline before feeding the crop to Tesseract OCR.
```python
import cv2
import numpy as np
import pytesseract
import re

def preprocess_plate(crop):
    """Multi-step image enhancement for OCR reliability."""
    # 1. Resize to normalize plate size variations
    crop = cv2.resize(crop, (300, 80), interpolation=cv2.INTER_CUBIC)

    # 2. Convert to grayscale
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)

    # 3. Adaptive thresholding (handles uneven lighting)
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2
    )

    # 4. Morphological opening (removes noise dots)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    clean = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

    # 5. Deskew: correct rotated plates
    #    (NOTE: follows the classic pre-4.5 OpenCV minAreaRect angle convention)
    coords = np.column_stack(np.where(clean > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    M = cv2.getRotationMatrix2D((150, 40), angle, 1.0)  # rotate about plate center
    deskewed = cv2.warpAffine(clean, M, (300, 80))
    return deskewed

def read_plate(crop, confidence_threshold=85):
    """OCR with validation against the Indian plate format."""
    processed = preprocess_plate(crop)

    # Tesseract with a character whitelist (uppercase alphanumerics only)
    config = ('--oem 3 --psm 7 '
              '-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    result = pytesseract.image_to_data(
        processed, config=config, output_type=pytesseract.Output.DICT
    )

    # Keep only words Tesseract is confident about
    text = ' '.join(
        result['text'][i] for i in range(len(result['text']))
        if int(result['conf'][i]) > confidence_threshold
    ).strip()

    # Validate: Indian plates follow the pattern XX 00 XX 0000
    compact = text.replace(' ', '')
    if re.match(r'^[A-Z]{2}\d{2}[A-Z]{1,3}\d{4}$', compact):
        return text, True   # Valid plate
    return text, False      # Needs manual review
```
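The format gate can be exercised on its own (the sample plate numbers below are made up for illustration):

```python
import re

# Indian plate format: state code, district code, series letters, 4-digit number
PLATE_RE = re.compile(r'^[A-Z]{2}\d{2}[A-Z]{1,3}\d{4}$')

def is_valid_plate(text: str) -> bool:
    """Check an OCR result against the XX 00 XX 0000 format, ignoring spaces."""
    return bool(PLATE_RE.match(text.replace(' ', '')))

print(is_valid_plate('MH 12 AB 1234'))   # True: standard format
print(is_valid_plate('DL 1C 0001'))      # False: no two-digit district code
```

An OCR string that fails this check is routed to manual review rather than fined, which is the cheap insurance against misreads.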
5. Edge Optimization (TensorRT)
Running YOLOv8 as a native PyTorch model on the Jetson gives us ~12 FPS. That's too slow. We need 30 FPS. The solution: TensorRT conversion with FP16 quantization.
- TensorRT Conversion: We exported the PyTorch model to ONNX, then compiled it to a TensorRT engine optimized for the Xavier's Volta GPU architecture. This alone gave a 2.5× speedup.
- FP16 Quantization: Reducing inference precision from 32-bit to 16-bit float — with only 0.3% mAP loss — doubled throughput again.
- Batch Inference: Processing 4 frames simultaneously for better GPU utilization.
- Input Resizing: Resizing from 1080p → 640×640 for inference, keeping the original resolution only for ANPR crops.
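The FP16 step is worth a gut check: for the value ranges a detector emits (confidences and normalized coordinates in [0, 1]), halving precision costs almost nothing. A toy illustration of the quantization error, separate from the actual TensorRT engine build:

```python
import numpy as np

# Simulated detector outputs: confidences / normalized box coords in [0, 1]
vals_fp32 = np.array([0.9174, 0.5031, 0.0625, 0.3333], dtype=np.float32)
vals_fp16 = vals_fp32.astype(np.float16)

# float16 keeps roughly 3 significant decimal digits in this range,
# which is plenty for thresholding detections at 0.5
max_abs_err = np.max(np.abs(vals_fp16.astype(np.float32) - vals_fp32))
print(max_abs_err < 1e-3)  # True
```

INT8, by contrast, needs a calibration dataset and clips the dynamic range much harder, which is where the larger mAP drop in the table below comes from.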
Optimization Results
| Configuration | FPS | mAP@0.5 | Power Draw |
|---|---|---|---|
| PyTorch (FP32) | 12 | 0.920 | 30W |
| TensorRT (FP32) | 22 | 0.920 | 25W |
| TensorRT (FP16) | 30 | 0.917 | 20W |
| TensorRT (INT8) | 42 | 0.891 | 15W |
We chose FP16 over INT8 because INT8's 2.6-point drop in mAP meant missing roughly 1 in 40 vehicles — unacceptable for enforcement.
6. Accident Detection: Saving Lives in 15 Seconds
The most impactful feature isn't speed enforcement — it's accident detection. On high-speed expressways, a stopped vehicle in a live lane is a death trap. Our system detects stopped/abnormally slow vehicles and alerts highway patrol within 15 seconds.
The logic checks three conditions in sequence: (1) speed below 15 km/h, (2) stationary for more than 10 seconds, (3) in a live traffic lane (not the shoulder). This three-stage gate prevents false alerts from toll plaza queues and rest stops.
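The three-stage gate can be sketched as a small state machine per tracked vehicle (thresholds are from the text; the lane numbering, with 0 as the shoulder, is an assumption for illustration):

```python
SPEED_FLOOR_KMH = 15      # condition 1: crawling or stopped
STATIONARY_SECS = 10      # condition 2: minimum dwell time
LIVE_LANES = {1, 2, 3}    # condition 3: assumed lane ids; 0 = shoulder

stopped_since = {}  # track_id -> timestamp when speed first dropped below floor

def accident_alert(track_id, speed_kmh, lane_id, now):
    """Three-stage gate: slow, then stationary long enough, then in a live lane."""
    if speed_kmh is None or speed_kmh >= SPEED_FLOOR_KMH:
        stopped_since.pop(track_id, None)  # moving again: reset dwell timer
        return False
    start = stopped_since.setdefault(track_id, now)
    if now - start < STATIONARY_SECS:
        return False                       # not stationary long enough yet
    return lane_id in LIVE_LANES           # shoulder stops don't page patrol
```

Running the conditions in this order means a toll-plaza queue (slow but moving every few seconds) and a shoulder stop each fail a different stage, which is what keeps the false-alert rate down.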
7. Night Vision & Adverse Weather
Expressways don't sleep. 40% of accidents happen between 10 PM and 6 AM. Standard cameras fail in low-light conditions, so we deployed IR-illuminated cameras with dual-spectrum imaging.
Night Mode
IR illuminators flood the lane with 850nm light (invisible to drivers). Cameras switch to monochrome for maximum sensitivity.
Challenge: ANPR models trained on color images fail on IR greyscale. We retrained the OCR model on 50,000 IR plate images to maintain 93% accuracy at night.
Rain & Fog
Rain creates reflections that generate hundreds of false detections per frame. We apply temporal median filtering across 5 frames to suppress transient artifacts.
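The 5-frame temporal median can be sketched in a few lines, assuming frames arrive as NumPy arrays (a transient reflection survives in only one or two frames, so the per-pixel median discards it):

```python
import numpy as np
from collections import deque

WINDOW = 5
frame_buffer = deque(maxlen=WINDOW)

def median_filtered(frame):
    """Pixel-wise temporal median over the last 5 frames.

    Until the buffer fills, the frame passes through unfiltered.
    """
    frame_buffer.append(frame)
    if len(frame_buffer) < WINDOW:
        return frame
    return np.median(np.stack(frame_buffer), axis=0).astype(frame.dtype)
```

The cost is a few frames of latency on genuinely new objects, which is acceptable at 30 FPS.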
Fog Mode: When visibility drops below 200m, we increase the confidence threshold from 0.5 to 0.7 and disable speed enforcement (speed limits are already reduced).
8. Real-World Impact
The system has been live on major expressways for over a year. Here are the aggregate results:
Deployment Performance (12 Months)
| Metric | Before VIDES | After VIDES | Impact |
|---|---|---|---|
| Avg. Accident Response Time | 8 minutes | 3.2 minutes | 60% faster |
| Monthly Challans Issued | 800 (manual) | 5,200 (automated) | 6.5× increase |
| ANPR Accuracy (Day) | — | 97.2% | ✅ |
| ANPR Accuracy (Night) | — | 93.1% | ✅ |
| False Positive Rate | — | 1.8% | ✅ |
| System Uptime | — | 99.7% | ✅ |
But the number I'm most proud of isn't in a table. In the first 6 months after deployment, the expressway authority reported a 23% reduction in fatal accidents in monitored zones — primarily attributed to the 15-second accident detection alerts and the deterrent effect of automated speed enforcement.
Key Takeaways
- Edge compute saves bandwidth and latency. Processing 30 FPS on-device means only violation crops (a few KB each) are uploaded vs. full video streams (50+ Mbps).
- FP16 is the sweet spot for enforcement. INT8 is faster, but its ~3-point mAP drop means missing roughly 1 in 40 vehicles — unacceptable.
- 3-frame confirmation prevents false positives. Requiring speed violations in 3 consecutive frames eliminates jitter from GPS-less pixel-based estimation.
- Night mode needs its own training data. Colour models fail on IR greyscale. We needed 50k IR-specific plate images to match daytime accuracy.
- The most impactful feature is the simplest. Accident detection (stationary vehicle in live lane) saves more lives than speed enforcement.