
Concept

Autonomous Driving

The technology stack that enables vehicles to navigate roads without human input - combining perception (seeing the world), prediction (anticipating others' behaviour), and planning (deciding how to drive) into a real-time, safety-critical system.

Added May 18, 2026

Autonomous driving is one of the most complex and consequential AI deployment challenges: a real-time, safety-critical system that must perceive an unpredictable physical environment, predict the behaviour of other agents, and plan safe navigation, all within an end-to-end latency budget on the order of 100 milliseconds and with essentially zero tolerance for catastrophic failure.

The autonomy stack is conventionally divided into three modules. Perception converts raw sensor data into a structured understanding of the environment: 3D bounding boxes for vehicles, pedestrians, cyclists, and static objects; lane markings and road boundaries; traffic signal states; free-space estimation. Cameras (for high-resolution colour imaging), LiDAR (for accurate 3D range information), and radar (for velocity measurement and operation in rain/fog/darkness) are the standard sensor suite. Sensor fusion combines these complementary modalities into a unified world model.
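As a minimal sketch of the idea behind sensor fusion, the snippet below combines two independent range estimates of the same object by inverse-variance weighting, so the less noisy sensor contributes more. The function name, the variances, and the readings are illustrative assumptions; production stacks fuse far richer data (point clouds, images, tracks), typically with Kalman-style filters or learned models.

```python
def fuse_ranges(lidar_m, lidar_var, radar_m, radar_var):
    """Fuse two independent range estimates (metres) by inverse-variance weighting.

    Returns the fused range and its variance. All inputs are hypothetical
    per-sensor estimates of the distance to the same object.
    """
    w_lidar = 1.0 / lidar_var   # more precise sensor gets the larger weight
    w_radar = 1.0 / radar_var
    fused = (w_lidar * lidar_m + w_radar * radar_m) / (w_lidar + w_radar)
    fused_var = 1.0 / (w_lidar + w_radar)  # always below either input variance
    return fused, fused_var


# Illustrative numbers only: LiDAR reads 20.0 m (variance 0.01),
# radar reads 20.6 m (variance 0.09).
fused_m, fused_var = fuse_ranges(20.0, 0.01, 20.6, 0.09)
print(round(fused_m, 3), round(fused_var, 4))
```

The fused estimate lands close to the LiDAR reading because LiDAR is assumed to be the more precise range sensor here, and the fused variance is smaller than either sensor's own.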

Prediction forecasts how other agents in the scene - pedestrians, vehicles, cyclists - will behave over the next several seconds. A pedestrian standing at a crosswalk who glances at approaching traffic is likely to continue waiting; one already stepping off the curb is likely to cross. Trajectory prediction models combine learned priors about human motion patterns with current observations to generate probabilistic forecasts of future positions. Common approaches include social force models, graph neural networks (which represent the interactions between nearby agents), and Transformer-based sequence models.
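For contrast with the learned models described above, here is a sketch of the simplest common baseline: constant-velocity extrapolation from the last two observations. It produces a single deterministic trajectory rather than a probabilistic forecast, and the function name, track format, and numbers are assumptions made for illustration.

```python
def predict_constant_velocity(track, horizon_s=3.0, dt=0.5):
    """Extrapolate an agent's future positions assuming constant velocity.

    track: list of (t, x, y) observations, oldest first; needs at least two.
    Returns a list of (x, y) positions at dt, 2*dt, ... up to horizon_s.
    """
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    vx = (x1 - x0) / (t1 - t0)
    vy = (y1 - y0) / (t1 - t0)
    steps = int(round(horizon_s / dt))
    return [(x1 + vx * k * dt, y1 + vy * k * dt) for k in range(1, steps + 1)]


# Hypothetical pedestrian walking at 1.4 m/s in the +y direction.
pedestrian = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.7)]
forecast = predict_constant_velocity(pedestrian)
print(forecast[-1])
```

A baseline like this cannot capture intent (the waiting pedestrian versus the one stepping off the curb), which is the gap that learned, multi-modal predictors aim to close.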

Planning computes a trajectory for the ego vehicle that achieves its goal (reach the destination, follow the route) while satisfying hard constraints (stay on the road, avoid collisions) and soft preferences (prefer the lane centre, avoid aggressive manoeuvres, obey traffic laws). Classical planning methods (lattice planners, sampling-based planners) are easier to analyse and verify but struggle with the combinatorial complexity of dense urban traffic. Learning-based planners are trained on human driving demonstrations (imitation learning) and with reinforcement learning; they can handle more complex interactions but offer weaker formal safety guarantees.
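The split between hard constraints and soft preferences can be sketched with a toy sampling-based planner that chooses only a lateral offset within the lane. Candidates that violate a hard constraint are discarded outright; the remainder are ranked by a soft cost. The function name, lane width, clearance, and candidate set are all illustrative assumptions, and a real planner would evaluate full trajectories over time.

```python
import math


def plan_lateral_offset(obstacles, candidates, lane_half_width=1.75, clearance=1.0):
    """Choose a lateral offset (metres from lane centre) for the ego vehicle.

    obstacles: lateral positions (metres) of objects ahead in the lane.
    candidates: lateral offsets to evaluate.
    Returns the feasible offset with the lowest cost, or None if none is feasible.
    """
    best, best_cost = None, math.inf
    for offset in candidates:
        # Hard constraint: stay inside the lane.
        if abs(offset) > lane_half_width:
            continue
        # Hard constraint: keep the required clearance from every obstacle.
        if any(abs(offset - ob) < clearance for ob in obstacles):
            continue
        # Soft preference: stay close to the lane centre.
        cost = offset ** 2
        if cost < best_cost:
            best, best_cost = offset, cost
    return best


candidates = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
chosen = plan_lateral_offset([0.3], candidates)
print(chosen)
```

When no candidate survives the hard constraints the function returns None, which in a real system would trigger a fallback such as braking rather than relaxing the collision constraint.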

SAE J3016 defines six levels of driving automation, from 0 (no automation) to 5 (full automation in all conditions). Level 1 (driver assistance) automates either steering or speed, but not both. Level 2 (partial automation, with simultaneous control of steering and speed while the driver must monitor continuously) is widely deployed (Tesla Autopilot, GM Super Cruise). Level 3 (conditional automation, where the system handles the entire driving task within its operating conditions but the driver must be available to take over on request) is approved in limited geographies. Level 4 (high automation, with no driver takeover required inside a specific operational design domain) is operational in geofenced commercial robotaxi deployments (Waymo, Baidu Apollo). Level 5 (full automation in all conditions) remains a research challenge.
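The levels can be summarised as a small lookup, shown here as a sketch. The enum mirrors the SAE J3016 level names; the `human_role` helper and its wording are a paraphrase introduced for illustration, not text from the standard.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    NO_AUTOMATION = 0
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4
    FULL_AUTOMATION = 5


def human_role(level: SAELevel) -> str:
    """Paraphrase of what is expected of the human at each level (hypothetical helper)."""
    if level <= SAELevel.PARTIAL_AUTOMATION:
        return "drives or continuously supervises"
    if level == SAELevel.CONDITIONAL_AUTOMATION:
        return "must be ready to take over on request"
    if level == SAELevel.HIGH_AUTOMATION:
        return "no takeover needed inside the operational design domain"
    return "no takeover needed in any conditions"


for level in SAELevel:
    print(int(level), level.name, "-", human_role(level))
```

The key boundary sits between Levels 2 and 3: up to Level 2 the human is responsible for monitoring at all times, while from Level 3 upward the system is responsible whenever it is engaged.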

The long tail of rare, unusual scenarios is the central unsolved challenge: autonomous driving systems must handle not just typical highway and city driving but an effectively unbounded variety of edge cases - construction zones, emergency vehicles, unusual road markings, atypical pedestrian behaviour, severe weather. Because these events appear so rarely in training data, they are both the hardest to train for and the most dangerous when mishandled.

Analogy

Imagine hiring a new chauffeur who has never driven before and must learn entirely from sitting in the back seat during your drives, then take over in any traffic condition worldwide. The chauffeur must absorb not just the mechanics of driving but all the unwritten rules, cultural norms, and split-second judgements that experienced drivers make instinctively. This is the task autonomous driving systems face: learning from human demonstrations to handle an environment of infinite variety with safety standards that demand near-zero error rates.

Real-world example

Waymo's autonomous vehicle (operating in Phoenix, San Francisco, and Los Angeles) uses a perception system that processes data from 29 cameras, 5 LiDAR units, and 6 radar units at 100ms intervals. Its prediction module models the intent and likely trajectories of every nearby agent (vehicles, pedestrians, cyclists). Its planning module generates and evaluates thousands of candidate future trajectories, selecting one that minimises risk while progressing toward the destination. All of this runs on custom onboard compute at latency under 100ms - fast enough to react to sudden events at highway speeds.
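To put a latency figure in physical terms, the distance a vehicle covers before the system can respond is simply speed multiplied by latency. The sketch below works this out; the function name and the 108 km/h example speed are assumptions chosen for round numbers.

```python
def distance_during_latency(speed_kmh, latency_ms):
    """Metres travelled while the system is still processing (speed x latency)."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    return speed_ms * (latency_ms / 1000.0)


# At 108 km/h (30 m/s), a 100 ms pipeline corresponds to about 3 m of travel.
blind_distance_m = distance_during_latency(108, 100)
print(round(blind_distance_m, 2))
```

By the same arithmetic, a human reaction time of around one second at that speed would correspond to roughly 30 m, which is why a sub-100 ms pipeline is considered fast enough for highway driving.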

Why it matters

Autonomous driving is among the most demanding real-world AI deployment contexts: a complex open environment, tight real-time latency requirements, safety-critical consequences, and the need to handle rare edge cases reliably. Progress in autonomous driving advances perception, prediction, and planning research across robotics more broadly. Understanding the technology stack - and why full autonomy is harder than early timelines suggested - is essential for contextualising AI progress claims and understanding where the fundamental research challenges remain.
