Generative & Multimodal
AI that creates images, audio, and video - and systems that reason across multiple types of input at once.
3D Gaussian Splatting
A technique that represents a 3D scene as millions of tiny coloured, semi-transparent ellipsoids (3D Gaussians) - enabling real-time photorealistic rendering of scenes reconstructed from photographs, at much higher frame rates than NeRF.
ControlNet
A technique that adds precise structural control to diffusion image generation - letting you specify exactly the composition, pose, or layout of an image through maps, sketches, or depth information.
Diffusion Models
The generative AI technique behind Stable Diffusion and DALL-E 3 - it creates images by learning to reverse a process of gradually adding noise, turning pure static back into coherent pictures.
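The "gradually adding noise" half of that process can be sketched in a few lines. This is a minimal illustration, assuming a linear noise schedule and a plain list of numbers standing in for image pixels; the function names and schedule values are hypothetical choices for this example, not taken from any particular model. The learned part (a network that predicts the noise so it can be removed) is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (an assumed, commonly used choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t]: fraction of the original signal variance left after step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t of the forward (noising) process.

    Returns the noisy values and the noise that was mixed in; a diffusion
    model is trained to predict that noise from the noisy values.
    """
    keep = math.sqrt(alpha_bars[t])
    spread = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy = [keep * x + spread * n for x, n in zip(x0, noise)]
    return noisy, noise

alpha_bars = cumulative_signal(linear_beta_schedule(1000))
pixels = [0.2, -0.5, 0.9, 0.0]  # stand-in for image data
slightly_noisy, _ = add_noise(pixels, 10, alpha_bars, random.Random(0))
almost_static, _ = add_noise(pixels, 999, alpha_bars, random.Random(0))
```

At an early step the values stay close to the original; by the last step almost none of the original signal remains, which is the "pure static" the model learns to reverse.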
All concepts
N
Neural Radiance Fields (NeRFs)
A technique that reconstructs full 3D scenes from a set of 2D photographs - letting AI synthesise realistic views of a scene from any angle, including angles never photographed.
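To turn a reconstructed scene into a pixel, NeRF-style renderers blend the colour and density sampled at points along a camera ray. The sketch below shows only that blending step, under the assumption of evenly spaced samples; in a real NeRF the densities and colours would come from a trained neural network, whereas here they are hard-coded, and `render_ray` is a hypothetical name for this example.

```python
import math

def render_ray(densities, colours, step):
    """Blend samples along one ray, front to back.

    densities: non-negative density at each sample point
    colours:   RGB triple at each sample point
    step:      distance between neighbouring samples
    Returns the pixel colour and the fraction of light that passed through.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # how much light still reaches this depth
    for sigma, colour in zip(densities, colours):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this sample
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, colour)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# A faint red haze in front of a dense blue surface.
pixel, leftover = render_ray(
    densities=[0.5, 0.5, 50.0],
    colours=[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    step=0.1,
)
```

Samples nearer the camera dim everything behind them, so a dense surface hides whatever lies beyond it.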
Neural Vocoder
The AI component that converts the abstract numerical output of a speech synthesis model (typically a mel-spectrogram) into actual playable audio waveforms - the piece responsible for making AI voices sound natural.
S
Sentiment Analysis
The automated identification of emotional tone in text - determining whether a piece of writing expresses positive, negative, or neutral sentiment, and often how strongly.
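As a rough illustration of "positive, negative, or neutral, and how strongly", here is a toy lexicon-based scorer. The word list and weights are invented for this example; production systems generally use trained models rather than a hand-written dictionary, and this sketch ignores negation, sarcasm, and context.

```python
import re

# Hypothetical mini-lexicon: word -> signed strength.
LEXICON = {"great": 2, "good": 1, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def sentiment(text):
    """Return (label, score); the sign gives the tone, the size the strength."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score

print(sentiment("I love this, it is great"))  # ('positive', 4)
print(sentiment("The plot was awful"))        # ('negative', -2)
```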
Speaker Diarization
The process of automatically identifying who is speaking at each moment in an audio recording - answering the question 'who spoke when' without knowing the speakers' identities in advance.
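The "who spoke when" output is usually a list of time-stamped segments, one per uninterrupted turn. The sketch below covers only that last step: it assumes some earlier stage (not shown) has already assigned an anonymous speaker label to each fixed-length audio frame, and merges runs of identical labels into segments. The function name and frame length are hypothetical.

```python
def frames_to_segments(frame_labels, frame_seconds):
    """Merge consecutive frames with the same speaker label.

    Returns a list of (speaker_label, start_seconds, end_seconds) tuples.
    Labels are anonymous ids such as 0 and 1, not real identities.
    """
    segments = []
    for index, label in enumerate(frame_labels):
        start = index * frame_seconds
        end = start + frame_seconds
        if segments and segments[-1][0] == label:
            # Same speaker as the previous frame: extend the open segment.
            segments[-1] = (label, segments[-1][1], end)
        else:
            segments.append((label, start, end))
    return segments

for speaker, start, end in frames_to_segments([0, 0, 1, 1, 1, 0], 0.5):
    print(f"speaker {speaker}: {start:.1f}s to {end:.1f}s")
```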
Stable Diffusion
The open-source text-to-image model that made high-quality AI image generation widely accessible - running on consumer hardware and spawning an entire ecosystem of tools and applications.
Style-Prompted TTS
Text-to-speech that lets you control the speaking style through text descriptions or audio references - generating voices that are whispering, excited, formal, or mimicking a specific speaker's cadence.
V
Variational Autoencoder (VAE)
A neural network that learns to compress data into a structured latent space and then reconstruct it - the compression engine that makes latent diffusion models fast enough to run locally.
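Two small formulas are what keep a VAE's latent space "structured": the encoder outputs a mean and log-variance per latent dimension, a latent code is sampled from them, and a penalty pulls that distribution towards a standard normal. The sketch below shows just those two pieces with plain Python lists; the encoder and decoder networks are omitted, and the function names are hypothetical.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterised sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu, log_var = [0.3, -1.2], [-0.5, 0.1]  # stand-in encoder outputs
z = sample_latent(mu, log_var, rng)
penalty = kl_to_standard_normal(mu, log_var)
```

The penalty is zero only when every dimension already matches a standard normal, so training trades it off against reconstruction quality.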
Vision-Language Models (VLMs)
AI systems that understand both images and text together - reading pictures, answering questions about them, describing scenes, and reasoning across visual and linguistic content in a single model.
Voice Cloning
AI technology that can replicate a specific person's voice from a short audio sample - enabling anyone to synthesise speech that sounds like a target speaker.