

III

Generative & Multimodal (17 concepts)

Core

AI that creates images, audio, and video - and systems that reason across multiple types of input at once.

3D Gaussian Splatting

A technique that represents a 3D scene as millions of small, semi-transparent coloured 3D Gaussians (ellipsoids) - enabling real-time, photorealistic rendering of scenes reconstructed from photographs, far faster than NeRF (Neural Radiance Fields).
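Rendering a pixel in Gaussian Splatting comes down to alpha-blending the Gaussians that cover it, nearest first. The sketch below is a simplified illustration, assuming each Gaussian has already been projected to screen space as a 2D mean, a 2x2 covariance, an opacity, and an RGB colour. The dictionary keys and function names are hypothetical, and the 3D-to-2D projection, the tile-based GPU rasteriser, and details such as alpha clamping used by real implementations are omitted.

```python
import math

def gaussian_alpha(pixel, mean, cov, opacity):
    """Opacity of one projected 2D Gaussian at a pixel:
    opacity * exp(-0.5 * d^T cov^-1 d), where d = pixel - mean."""
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    a, b, c = cov[0][0], cov[0][1], cov[1][1]  # symmetric [[a, b], [b, c]]
    det = a * c - b * b
    # Quadratic form via the closed-form inverse [[c, -b], [-b, a]] / det
    mahalanobis_sq = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return opacity * math.exp(-0.5 * mahalanobis_sq)

def render_pixel(pixel, splats, min_transmittance=1e-4):
    """Front-to-back alpha compositing:
    C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda s: s["depth"]):  # nearest first
        alpha = gaussian_alpha(pixel, s["mean"], s["cov"], s["opacity"])
        for k in range(3):
            color[k] += transmittance * alpha * s["color"][k]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:  # pixel is effectively opaque
            break
    return color, transmittance

# Hypothetical scene: two splats centred on the same pixel, listed out of order.
splats = [
    {"depth": 5.0, "mean": (10.0, 10.0), "cov": [[4.0, 0.0], [0.0, 4.0]],
     "opacity": 1.0, "color": (0.0, 0.0, 1.0)},  # blue, behind
    {"depth": 1.0, "mean": (10.0, 10.0), "cov": [[4.0, 0.0], [0.0, 4.0]],
     "opacity": 0.8, "color": (1.0, 0.0, 0.0)},  # red, in front
]
color, remaining = render_pixel((10.0, 10.0), splats)
# color is approximately [0.8, 0.0, 0.2]: mostly red, with blue showing through
```

Sorting by depth is what makes the result order-independent of how the splats are stored; the early exit skips Gaussians hidden behind an already opaque pixel.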

ControlNet

A technique that adds precise structural control to diffusion image generation - letting you specify the composition, pose, or layout of an image through conditioning inputs such as edge maps, sketches, pose skeletons, or depth maps.
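In the original ControlNet design, a trainable copy of the diffusion model's encoder blocks receives the control input, and its output is added back to the frozen model through "zero convolutions": 1x1 convolutions whose weights and bias start at zero, so training begins from exactly the unmodified model. The toy sketch below assumes features at a single pixel are a list of channel values, so a 1x1 convolution reduces to channel mixing; the `frozen_block` and `trainable_block` functions are hypothetical stand-ins, not real network layers.

```python
def zero_conv1x1(channels):
    """1x1 convolution with weights and bias initialised to zero."""
    return {"w": [[0.0] * channels for _ in range(channels)],
            "b": [0.0] * channels}

def apply_conv1x1(conv, feat):
    """At a single pixel, a 1x1 convolution is a matrix-vector product plus bias."""
    return [sum(w * f for w, f in zip(row, feat)) + b
            for row, b in zip(conv["w"], conv["b"])]

def controlnet_block(x, cond, frozen, trainable, z_in, z_out):
    """y = F(x) + Z_out(F_copy(x + Z_in(cond)))."""
    base = frozen(x)  # locked, pretrained block
    h = [xi + ci for xi, ci in zip(x, apply_conv1x1(z_in, cond))]
    delta = apply_conv1x1(z_out, trainable(h))  # contribution of the control branch
    return [b + d for b, d in zip(base, delta)]

# Hypothetical stand-in blocks: the trainable copy starts as a clone of the frozen one.
def frozen_block(v):
    return [2.0 * vi + 1.0 for vi in v]

def trainable_block(v):
    return [2.0 * vi + 1.0 for vi in v]

x = [0.5, -1.0, 3.0]        # features of the noisy image at one pixel
edge_map = [1.0, 0.0, 1.0]  # features of the control input at that pixel
z_in, z_out = zero_conv1x1(3), zero_conv1x1(3)
y_init = controlnet_block(x, edge_map, frozen_block, trainable_block, z_in, z_out)
# At initialisation y_init equals frozen_block(x): the control branch adds nothing yet.
```

Starting from zero means early training cannot damage the pretrained model's output; the control branch only gains influence as gradient updates move the zero-convolution weights away from zero.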

Diffusion Models

The generative AI technique behind Stable Diffusion and DALL-E 3 - which creates images by learning to reverse a process of gradually adding noise, turning pure static back into coherent pictures.
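The "gradually adding noise" half of this description has a convenient closed form in DDPM-style diffusion: a clean sample x_0 can be jumped straight to step t as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_t) and eps is Gaussian noise. The sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1,000 steps (the setting of the original DDPM paper) and uses plain lists in place of image tensors. A trained network would then learn to predict eps from x_t and t, which is what lets sampling run the process in reverse.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """abar_t = prod_{s <= t} (1 - beta_s): how much of the signal survives to step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in one shot; returns (x_t, eps).
    A denoising network is trained to predict eps from (x_t, t)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return xt, eps

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.2, 0.9]                               # a toy three-pixel "image"
x_early, _ = add_noise(x0, 10, alpha_bars, rng)     # still close to x0
x_late, eps = add_noise(x0, 999, alpha_bars, rng)   # almost pure noise
```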

All concepts

F

Flow Matching
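A newer, simpler alternative to diffusion for generative AI - training a model to predict the velocity that carries a noise sample to a real sample, most commonly along a straight-line path, rather than along the curved trajectories typical of diffusion. Straighter paths can be followed accurately in fewer sampling steps.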
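The straight-path version of flow matching can be sketched in a few lines: interpolate between a noise sample and a data sample, regress the model's output onto the constant velocity of that line, and generate by integrating the learned velocity with Euler steps. This sketch assumes t = 0 is noise and t = 1 is data (some papers use the opposite convention) and substitutes a hypothetical `ideal_model` for a trained network: for a one-point "dataset", the velocity pointing straight at that point is known in closed form.

```python
import random

def interpolate(x_noise, x_data, t):
    """Point at time t on the straight path: x_t = (1 - t) * x_noise + t * x_data."""
    return [(1.0 - t) * a + t * b for a, b in zip(x_noise, x_data)]

def target_velocity(x_noise, x_data):
    """d x_t / dt along the straight path is constant: x_data - x_noise."""
    return [b - a for a, b in zip(x_noise, x_data)]

def flow_matching_loss(model, x_noise, x_data, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x_noise, x_data, t)
    pred = model(xt, t)
    target = target_velocity(x_noise, x_data)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)

def euler_sample(model, x_noise, num_steps):
    """Generate by integrating dx/dt = model(x, t) from t = 0 to t = 1."""
    x, dt = list(x_noise), 1.0 / num_steps
    for i in range(num_steps):
        v = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

x_data = [2.0, -1.0]  # hypothetical one-point "dataset"

def ideal_model(x, t):
    """Velocity that reaches x_data at t = 1 in a straight line (stand-in for a network)."""
    return [(b - a) / (1.0 - t) for a, b in zip(x, x_data)]

rng = random.Random(0)
x_noise = [rng.gauss(0.0, 1.0) for _ in x_data]
loss = flow_matching_loss(ideal_model, x_noise, x_data, t=0.3)  # essentially zero
sample = euler_sample(ideal_model, x_noise, num_steps=4)        # lands on x_data
```

Because the path is a straight line, even four Euler steps land on the target; a real model trained over many samples only approximates straight paths, which is why few-step sampling is an advantage rather than a guarantee.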

I

Image Segmentation
Computer vision technology that labels every pixel in an image according to what it belongs to - enabling AI to precisely identify and delineate objects, not just detect them.
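In the usual setup, a segmentation network outputs one score per class for every pixel, and the label map is the per-pixel argmax; agreement with a ground-truth mask is commonly scored with intersection-over-union (IoU) per class. The sketch below uses nested lists of made-up scores in place of a real model's output, and the class indices (0 = background, 1 = object) are illustrative assumptions.

```python
def segment(logits):
    """logits[c][y][x] is the score for class c at pixel (x, y).
    Returns a label map labels[y][x] by taking the per-pixel argmax."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

def class_iou(pred, truth, cls):
    """Intersection-over-union of the pixels labelled `cls` in two label maps."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    return inter / union if union else 0.0

# Made-up scores for a 2x2 image: class 0 = background, class 1 = object.
logits = [
    [[2.0, 0.1], [0.3, 0.0]],  # background scores
    [[0.5, 1.5], [0.9, 3.0]],  # object scores
]
labels = segment(logits)       # [[0, 1], [1, 1]]
ground_truth = [[0, 1], [0, 1]]
object_iou = class_iou(labels, ground_truth, cls=1)
```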

M

Multimodal AI
An AI system that can work with more than just text - handling images, audio, and video alongside written language, and reasoning across all of them together.

N

O

Object Detection
Computer vision technology that identifies which objects are in an image and locates each one with a bounding box - a foundation for many visual AI applications.
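Two small routines sit behind most box-based detectors: intersection-over-union (IoU), which measures how well two boxes overlap, and greedy non-maximum suppression (NMS), which discards lower-scoring boxes that largely duplicate a higher-scoring one. The sketch below assumes boxes given as (x1, y1, x2, y2) corner coordinates and a 0.5 IoU threshold; both are common choices rather than universal ones, and the NMS here is class-agnostic for brevity.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes in descending score order, keeping a box only
    if it overlaps every already-kept box by at most `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two near-duplicate detections of one object, plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # indices of surviving boxes
```

The second box overlaps the first with an IoU of 0.81, so it is suppressed, while the distant third box survives; in practice detectors typically run NMS separately for each class.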

P

Pose Estimation
Computer vision technology that detects the position of a person's body joints - enabling AI to understand human posture, movement, and gesture from video or images.
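Many pose estimators predict one heatmap per joint, where each value is the model's confidence that the joint lies at that pixel; decoding then takes the peak of each map. The sketch below assumes that heatmap approach (other methods regress joint coordinates directly), a hypothetical three-joint skeleton, tiny hand-written heatmaps, and a 0.3 confidence threshold below which a joint is treated as not visible.

```python
def decode_keypoints(heatmaps, joint_names, threshold=0.3):
    """Return {joint: (x, y, confidence)} from per-joint heatmaps, or None
    for joints whose peak confidence is below `threshold`."""
    keypoints = {}
    for name, heatmap in zip(joint_names, heatmaps):
        # Peak of the heatmap: highest value, with its column (x) and row (y)
        score, x, y = max(
            (value, col, row_idx)
            for row_idx, row in enumerate(heatmap)
            for col, value in enumerate(row)
        )
        keypoints[name] = (x, y, score) if score >= threshold else None
    return keypoints

joints = ["nose", "left_wrist", "right_wrist"]  # hypothetical skeleton
heatmaps = [
    [[0.0, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.1, 0.0]],  # nose peaks at the centre
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]],  # left wrist at top-left
    [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.2]],  # right wrist: weak, e.g. occluded
]
pose = decode_keypoints(heatmaps, joints)
```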

S

V