At Arya Omnitalk, I worked on a system that processes 30 frames per second from highway cameras, identifies every vehicle, tracks its speed across frames, reads its license plate, and generates an automated fine — all within 200 milliseconds. A car moving at 120 km/h covers 33 meters per second. If our pipeline takes even half a second too long, the vehicle has already left the camera's field of view.
This is the engineering story behind VIDES (Video Incident Detection and Enforcement System), the module I built for the Highway Traffic Management System. It's now live on major expressways, automating over 5,000 challans per month and alerting highway patrol to accidents within 15 seconds.
1. The Need for Speed (and Accuracy)
Detecting a car on a sunny day in clear traffic is easy. Detecting a car in heavy rain, at night, while it's partially hidden behind a truck going 140 km/h? That's an entirely different engineering problem.
Core Engineering Challenges
- Occlusion: Large trucks hide smaller cars for 2–3 seconds. DeepSORT must re-identify vehicles when they reappear.
- Motion Blur: At 120 km/h, standard 1/30s shutter speeds blur license plates beyond recognition. We use 1/500s high-speed shutters.
- Ghosting: Reflections on wet roads create phantom vehicles in the detector output. We use temporal filtering to suppress false positives.
- Night Vision: IR-illuminated cameras introduce color distortion that breaks standard ANPR models trained on daytime data.
2. The VIDES Pipeline
We architected a split pipeline: heavy compute (detection + tracking) runs on edge devices mounted on the highway gantry, while OCR and database operations run in the cloud. This minimizes bandwidth — only violation crops are uploaded, not full video streams.
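As a rough sketch of the edge-to-cloud contract (field names and framing here are illustrative, not the production schema), only a small JPEG crop plus a few bytes of metadata ever cross the network:

```python
import json

def build_violation_message(crop_jpeg: bytes, camera_id: str,
                            track_id: int, speed_kmh: float) -> bytes:
    """Package a confirmed violation for upload; full frames never leave the edge."""
    meta = {
        "camera_id": camera_id,
        "track_id": track_id,
        "speed_kmh": speed_kmh,
        "crop_bytes": len(crop_jpeg),  # a few KB vs. a 50+ Mbps raw stream
    }
    # JSON metadata header (length-prefixed), then the crop as binary payload
    header = json.dumps(meta).encode()
    return len(header).to_bytes(4, "big") + header + crop_jpeg

msg = build_violation_message(b"\xff\xd8\xff", "GANTRY-042", 17, 134.2)
```

The length-prefix framing is one simple way to keep metadata and image bytes in a single upload; the real system's wire format may differ.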
3. YOLO + DeepSORT: The Golden Pair
We evaluated three detection architectures before settling on YOLOv8. The key metric wasn't just accuracy (mAP) — it was inference latency on the Jetson Xavier's GPU.
Model Comparison (Jetson Xavier NX)
| Model | mAP@0.5 | FPS (Xavier) | Model Size | Verdict |
|---|---|---|---|---|
| YOLOv5s | 0.88 | 45 | 14 MB | Fast but misses small vehicles |
| YOLOv8m | 0.92 | 30 | 52 MB | ✅ Our choice — best balance |
| Faster R-CNN | 0.94 | 8 | 167 MB | Too slow for real-time |
| YOLOv8l | 0.93 | 18 | 83 MB | Marginal accuracy gain, halved FPS |
Detection alone can't calculate speed — you need to track the same vehicle across consecutive frames. Enter DeepSORT (Deep Simple Online and Realtime Tracking). It assigns a unique ID to each vehicle using appearance embeddings, surviving brief occlusions.
```python
import numpy as np
from collections import defaultdict

# Vehicle position history: {track_id: [((x, y), timestamp), ...]}
vehicle_history = defaultdict(list)
# Consecutive over-limit readings per vehicle: {track_id: count}
violation_streak = defaultdict(int)

PIXEL_METER_RATIO = 0.045  # Calibrated per camera installation

def calculate_speed(track_id, bbox_center, timestamp, fps=30):
    """Calculate vehicle speed using positional displacement over time."""
    vehicle_history[track_id].append((bbox_center, timestamp))

    # Need at least 5 frames for a stable speed estimate
    if len(vehicle_history[track_id]) < 5:
        return None

    # Use the position from 5 frames ago to reduce jitter
    prev_pos, prev_time = vehicle_history[track_id][-5]
    curr_pos, curr_time = bbox_center, timestamp

    # Euclidean pixel displacement → meters
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    pixel_distance = np.sqrt(dx**2 + dy**2)
    meter_distance = pixel_distance * PIXEL_METER_RATIO

    # Time delta in seconds (fall back to the frame count if timestamps collide)
    dt = (curr_time - prev_time) or (5 / fps)

    # Convert m/s → km/h
    speed_kmh = (meter_distance / dt) * 3.6

    # Sanity check: reject impossible speeds (sensor noise)
    if speed_kmh > 250:  # No civilian car goes 250+ km/h
        return None
    return round(speed_kmh, 1)

def check_violation(track_id, speed, speed_limit=120):
    """Flag a vehicle if its speed exceeds the limit for 3+ consecutive frames."""
    if speed is not None and speed > speed_limit:
        violation_streak[track_id] += 1
    else:
        violation_streak[track_id] = 0  # Any compliant reading resets the streak

    # Require 3 consecutive readings to prevent false positives
    if violation_streak[track_id] >= 3:
        return True  # Confirmed violation → trigger ANPR crop
    return False
```
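The calibration arithmetic is easy to sanity-check by hand. For a synthetic vehicle drifting 20 pixels per frame at 30 FPS with the ratio above, this standalone recomputation mirrors the conversion inside `calculate_speed`:

```python
PIXEL_METER_RATIO = 0.045    # same per-camera calibration constant as above
FPS = 30
PIXELS_PER_FRAME = 20        # synthetic displacement, not real track data

pixel_distance = PIXELS_PER_FRAME * 5                 # displacement over 5 frames
meter_distance = pixel_distance * PIXEL_METER_RATIO   # 4.5 m
dt = 5 / FPS                                          # ~0.167 s between samples
speed_kmh = (meter_distance / dt) * 3.6
print(round(speed_kmh, 1))  # 97.2
```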
4. ANPR: Reading License Plates at 120 km/h
Automatic Number Plate Recognition (ANPR) is the most critical — and error-prone — stage. A wrong plate reading means an innocent person gets fined. We use a multi-stage preprocessing pipeline before feeding the crop to Tesseract OCR.
```python
import cv2
import numpy as np
import pytesseract
import re

def preprocess_plate(crop):
    """Multi-step image enhancement for OCR reliability."""
    # 1. Resize to normalize plate size variations
    crop = cv2.resize(crop, (300, 80), interpolation=cv2.INTER_CUBIC)

    # 2. Convert to grayscale
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)

    # 3. Adaptive thresholding (handles uneven lighting)
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY, 11, 2
    )

    # 4. Morphological opening (removes noise dots)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    clean = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

    # 5. Deskew: correct rotated plates
    #    (NOTE: follows the classic pre-4.5 OpenCV minAreaRect angle convention)
    coords = np.column_stack(np.where(clean > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    M = cv2.getRotationMatrix2D((150, 40), angle, 1.0)  # rotate about plate center
    deskewed = cv2.warpAffine(clean, M, (300, 80))
    return deskewed

def read_plate(crop, confidence_threshold=85):
    """OCR with validation against the Indian plate format."""
    processed = preprocess_plate(crop)

    # Tesseract with a character whitelist (uppercase alphanumerics only)
    config = ('--oem 3 --psm 7 '
              '-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    result = pytesseract.image_to_data(
        processed, config=config, output_type=pytesseract.Output.DICT
    )

    # Keep only words Tesseract is confident about
    text = ' '.join(
        result['text'][i] for i in range(len(result['text']))
        if int(result['conf'][i]) > confidence_threshold
    ).strip()

    # Validate: Indian plates follow the pattern XX 00 XX 0000
    compact = text.replace(' ', '')
    if re.match(r'^[A-Z]{2}\d{2}[A-Z]{1,3}\d{4}$', compact):
        return text, True   # Valid plate
    return text, False      # Needs manual review
```
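The format gate can be exercised on its own (the sample plate numbers below are made up for illustration):

```python
import re

# Indian plate format: state code, district code, series letters, 4-digit number
PLATE_RE = re.compile(r'^[A-Z]{2}\d{2}[A-Z]{1,3}\d{4}$')

def is_valid_plate(text: str) -> bool:
    """Check an OCR result against the XX 00 XX 0000 format, ignoring spaces."""
    return bool(PLATE_RE.match(text.replace(' ', '')))

print(is_valid_plate('MH 12 AB 1234'))   # True: standard format
print(is_valid_plate('DL 1C 0001'))      # False: no two-digit district code
```

An OCR string that fails this check is routed to manual review rather than fined, which is the cheap insurance against misreads.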
5. Edge Optimization (TensorRT)
Running YOLOv8 as a native PyTorch model on the Jetson gives us ~12 FPS. That's too slow. We need 30 FPS. The solution: TensorRT conversion with FP16 quantization.
- TensorRT Conversion: We exported the PyTorch model to ONNX, then compiled it to a TensorRT engine optimized for the Xavier's Volta GPU architecture. This alone gave a 2.5× speedup.
- FP16 Quantization: Reducing inference precision from 32-bit to 16-bit float — with only 0.3% mAP loss — doubled throughput again.
- Batch Inference: Processing 4 frames simultaneously for better GPU utilization.
- Input Resizing: Resizing from 1080p → 640×640 for inference, keeping the original resolution only for ANPR crops.
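The FP16 step is worth a gut check: for the value ranges a detector emits (confidences and normalized coordinates in [0, 1]), halving precision costs almost nothing. A toy illustration of the quantization error, separate from the actual TensorRT engine build:

```python
import numpy as np

# Simulated detector outputs: confidences / normalized box coords in [0, 1]
vals_fp32 = np.array([0.9174, 0.5031, 0.0625, 0.3333], dtype=np.float32)
vals_fp16 = vals_fp32.astype(np.float16)

# float16 keeps roughly 3 significant decimal digits in this range,
# which is plenty for thresholding detections at 0.5
max_abs_err = np.max(np.abs(vals_fp16.astype(np.float32) - vals_fp32))
print(max_abs_err < 1e-3)  # True
```

INT8, by contrast, needs a calibration dataset and clips the dynamic range much harder, which is where the larger mAP drop in the table below comes from.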
Optimization Results
| Configuration | FPS | mAP@0.5 | Power Draw |
|---|---|---|---|
| PyTorch (FP32) | 12 | 0.920 | 30W |
| TensorRT (FP32) | 22 | 0.920 | 25W |
| TensorRT (FP16) | 30 | 0.917 | 20W |
| TensorRT (INT8) | 42 | 0.891 | 15W |
We chose FP16 over INT8 because INT8's 2.6-point drop in mAP meant missing roughly 1 in 40 vehicles — unacceptable for enforcement.
6. Accident Detection: Saving Lives in 15 Seconds
The most impactful feature isn't speed enforcement — it's accident detection. On high-speed expressways, a stopped vehicle in a live lane is a death trap. Our system detects stopped/abnormally slow vehicles and alerts highway patrol within 15 seconds.
The logic checks three conditions in sequence: (1) speed below 15 km/h, (2) stationary for more than 10 seconds, (3) in a live traffic lane (not the shoulder). This three-stage gate prevents false alerts from toll plaza queues and rest stops.
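The three-stage gate can be sketched as a small state machine per tracked vehicle (thresholds are from the text; the lane numbering, with 0 as the shoulder, is an assumption for illustration):

```python
SPEED_FLOOR_KMH = 15      # condition 1: crawling or stopped
STATIONARY_SECS = 10      # condition 2: minimum dwell time
LIVE_LANES = {1, 2, 3}    # condition 3: assumed lane ids; 0 = shoulder

stopped_since = {}  # track_id -> timestamp when speed first dropped below floor

def accident_alert(track_id, speed_kmh, lane_id, now):
    """Three-stage gate: slow, then stationary long enough, then in a live lane."""
    if speed_kmh is None or speed_kmh >= SPEED_FLOOR_KMH:
        stopped_since.pop(track_id, None)  # moving again: reset dwell timer
        return False
    start = stopped_since.setdefault(track_id, now)
    if now - start < STATIONARY_SECS:
        return False                       # not stationary long enough yet
    return lane_id in LIVE_LANES           # shoulder stops don't page patrol
```

Running the conditions in this order means a toll-plaza queue (slow but moving every few seconds) and a shoulder stop each fail a different stage, which is what keeps the false-alert rate down.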
7. Night Vision & Adverse Weather
Expressways don't sleep. 40% of accidents happen between 10 PM and 6 AM. Standard cameras fail in low-light conditions, so we deployed IR-illuminated cameras with dual-spectrum imaging.
Night Mode
IR illuminators flood the lane with 850nm light (invisible to drivers). Cameras switch to monochrome for maximum sensitivity.
Challenge: ANPR models trained on color images fail on IR greyscale. We retrained the OCR model on 50,000 IR plate images to maintain 93% accuracy at night.
Rain & Fog
Rain creates reflections that generate hundreds of false detections per frame. We apply temporal median filtering across 5 frames to suppress transient artifacts.
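The 5-frame temporal median can be sketched in a few lines, assuming frames arrive as NumPy arrays (a transient reflection survives in only one or two frames, so the per-pixel median discards it):

```python
import numpy as np
from collections import deque

WINDOW = 5
frame_buffer = deque(maxlen=WINDOW)

def median_filtered(frame):
    """Pixel-wise temporal median over the last 5 frames.

    Until the buffer fills, the frame passes through unfiltered.
    """
    frame_buffer.append(frame)
    if len(frame_buffer) < WINDOW:
        return frame
    return np.median(np.stack(frame_buffer), axis=0).astype(frame.dtype)
```

The cost is a few frames of latency on genuinely new objects, which is acceptable at 30 FPS.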
Fog Mode: When visibility drops below 200m, we increase the confidence threshold from 0.5 to 0.7 and disable speed enforcement (speed limits are already reduced).
8. Real-World Impact
The system has been live on major expressways for over a year. Here are the aggregate results:
Deployment Performance (12 Months)
| Metric | Before VIDES | After VIDES | Impact |
|---|---|---|---|
| Avg. Accident Response Time | 8 minutes | 3.2 minutes | 60% faster |
| Monthly Challans Issued | 800 (manual) | 5,200 (automated) | 6.5× increase |
| ANPR Accuracy (Day) | — | 97.2% | ✅ |
| ANPR Accuracy (Night) | — | 93.1% | ✅ |
| False Positive Rate | — | 1.8% | ✅ |
| System Uptime | — | 99.7% | ✅ |
But the number I'm most proud of isn't in a table. In the first 6 months after deployment, the expressway authority reported a 23% reduction in fatal accidents in monitored zones — primarily attributed to the 15-second accident detection alerts and the deterrent effect of automated speed enforcement.
Key Takeaways
- Edge compute saves bandwidth and latency. Processing 30 FPS on-device means only violation crops (a few KB each) are uploaded vs. full video streams (50+ Mbps).
- FP16 is the sweet spot for enforcement. INT8 is faster, but its ~3-point mAP drop means missing roughly 1 in 40 vehicles — unacceptable.
- 3-frame confirmation prevents false positives. Requiring speed violations in 3 consecutive frames eliminates jitter from GPS-less pixel-based estimation.
- Night mode needs its own training data. Colour models fail on IR greyscale. We needed 50k IR-specific plate images to match daytime accuracy.
- The most impactful feature is the simplest. Accident detection (stationary vehicle in live lane) saves more lives than speed enforcement.