VI · MLOps & Infrastructure · Advanced

Model Registry

A centralised versioned repository for trained ML models that tracks every artifact, its metadata, training lineage, evaluation metrics, and deployment status - the single source of truth for which models exist and where they are running.

Added May 18, 2026 · 3 min read

As ML teams move from experimentation to production, they quickly accumulate many model versions. A model registry is the infrastructure component that brings order to this proliferation by providing a centralised store for trained model artifacts with version control, metadata tracking, and lifecycle management.

At its core, a model registry stores model files (weights, architecture configs, tokeniser files, preprocessing code) alongside the metadata needed to understand and reproduce them: training dataset version, hyperparameters, training duration, hardware used, evaluation metrics on standard benchmarks, and the experiment run that produced them. This lineage tracking is essential for compliance, debugging, and reproducibility.
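As a rough illustration of the metadata described above, a registry entry can be pictured as one record per model version. This is a minimal sketch, not any particular registry's schema: the `ModelVersionRecord` class, its field names, the `fingerprint` helper, and all values are hypothetical placeholders.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ModelVersionRecord:
    """Hypothetical registry entry: one record per registered model version."""
    name: str
    version: int
    artifact_sha256: str      # content hash of the stored model files
    dataset_version: str      # which training data snapshot was used
    hyperparameters: dict
    metrics: dict             # offline evaluation results
    run_id: str               # experiment run that produced the model
    hardware: str = "unknown"
    training_seconds: float = 0.0


def fingerprint(artifact_bytes: bytes) -> str:
    """Return a SHA-256 hex digest so the artifact can be verified later."""
    return hashlib.sha256(artifact_bytes).hexdigest()


weights = b"fake-model-weights"  # stand-in for a real serialized model
record = ModelVersionRecord(
    name="recsys-ranker",
    version=3,
    artifact_sha256=fingerprint(weights),
    dataset_version="clicks-2026-05-01",   # illustrative value
    hyperparameters={"learning_rate": 0.001, "epochs": 5},
    metrics={"ndcg_at_10": 0.412},          # illustrative value
    run_id="run-0042",
)

# Serialize the metadata so it can be stored next to the artifact.
record_json = json.dumps(asdict(record), sort_keys=True)
```

Storing a content hash alongside the lineage fields is one way to confirm later that the artifact being deployed is the same one that was evaluated.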

Beyond storage, registries manage the model lifecycle with stages such as Staging (model is trained and evaluated but not yet promoted), Production (the version currently serving live traffic), and Archived (retired versions kept for rollback or compliance). Transitioning a model between stages triggers automated gates - integration tests, performance benchmarks, A/B test criteria - ensuring that no model reaches production without meeting defined quality standards.
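The stage-and-gate idea can be sketched as a small state machine. The code below is an assumption-laden illustration rather than how any specific registry implements it: the `ALLOWED` transition table, the `min_metric` gate builder, and the metric names and thresholds are all hypothetical.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


# Allowed moves between stages (an assumption for this sketch).
ALLOWED = {
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: {Stage.PRODUCTION},  # rollback path
}

Gate = Callable[[dict], bool]


def min_metric(name: str, threshold: float) -> Gate:
    """Build a gate that passes when metrics[name] >= threshold."""
    return lambda metrics: metrics.get(name, float("-inf")) >= threshold


def transition(current: Stage, target: Stage, metrics: dict, gates: list[Gate]) -> Stage:
    """Return the new stage, or raise if the move or any gate fails."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    # In this sketch, gates only guard promotion to Production.
    if target is Stage.PRODUCTION and not all(gate(metrics) for gate in gates):
        raise ValueError("quality gate failed; model stays in " + current.value)
    return target


# Hypothetical gate set: a metric threshold plus an integration-test flag.
gates = [min_metric("accuracy", 0.90), min_metric("integration_tests_passed", 1)]
new_stage = transition(
    Stage.STAGING,
    Stage.PRODUCTION,
    {"accuracy": 0.93, "integration_tests_passed": 1},
    gates,
)
```

A missing metric is treated as a failure here, so a model that skipped an evaluation step cannot be promoted by accident.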

The registry also serves as the integration point for the rest of the MLOps pipeline. CI/CD systems query the registry to determine which model version to deploy. Monitoring systems link production performance degradation back to specific model versions. Rollback procedures retrieve a previous production model from the registry when the current version has problems.
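To make the integration role concrete, here is a toy in-memory registry showing the two queries mentioned above: looking up the current Production version and rolling back to the previous one. `InMemoryRegistry` and its methods are hypothetical names for this sketch; a real registry would persist this state and expose it over an API.

```python
class InMemoryRegistry:
    """Toy registry: tracks which version of each model is in Production."""

    def __init__(self) -> None:
        self._production: dict[str, int] = {}     # model name -> live version
        self._history: dict[str, list[int]] = {}  # previously live versions

    def promote(self, name: str, version: int) -> None:
        """Make `version` the Production version, archiving the old one."""
        if name in self._production:
            self._history.setdefault(name, []).append(self._production[name])
        self._production[name] = version

    def production_version(self, name: str) -> int:
        """What a CI/CD job or serving layer would query before deploying."""
        return self._production[name]

    def rollback(self, name: str) -> int:
        """Restore the most recently archived version and return it."""
        previous = self._history.get(name, [])
        if not previous:
            raise LookupError(f"no archived version of {name!r} to roll back to")
        self._production[name] = previous.pop()
        return self._production[name]


registry = InMemoryRegistry()
registry.promote("recsys-ranker", 2)
registry.promote("recsys-ranker", 3)
live = registry.production_version("recsys-ranker")
restored = registry.rollback("recsys-ranker")
```

Because deployment and monitoring both read the same pointer, a rollback is a single registry update rather than a manual hunt for old weight files.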

Popular model registry implementations include MLflow Model Registry (open source, widely adopted; recent releases favour version aliases and tags over fixed stage names), Weights & Biases Model Registry, and managed offerings from cloud providers (AWS SageMaker Model Registry, Google Vertex AI Model Registry, Azure ML Model Registry). For large language models, Hugging Face Hub functions as a community-wide model registry, providing versioned model cards, evaluation results, and download infrastructure.

A well-designed registry enforces the discipline that no model artifact should exist only on a researcher's local machine or in an informal shared directory. Every model that might influence a production decision should be registered, versioned, and auditable.

Analogy

A pharmaceutical drug registry maintained by a regulatory agency: every approved drug has a formal record of its composition, clinical trial results, approved indications, manufacturing standards, and current market status. No drug can be prescribed without being in the registry, and the registry provides a clear audit trail. A model registry does the same for ML models - formal tracking of what exists, how it performed, and where it is running.

Real-world example

An ML team trains a new version of its recommendation model. They log it to MLflow with the training data version, loss curves, and offline metrics. Once automated evaluation passes the quality gate, they promote it to Staging and run a shadow deployment. After A/B testing confirms that it outperforms the current Production model, they transition it to Production, which automatically triggers the serving layer to swap the model weights. The previous Production version moves to Archived but remains available for rollback for 90 days.
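The 90-day rollback window in this example amounts to a simple retention check. The sketch below assumes the registry records an archive timestamp for each version; `rollback_eligible` and the dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ROLLBACK_WINDOW = timedelta(days=90)  # retention period from the example


def rollback_eligible(archived_at: datetime, now: datetime) -> bool:
    """True while an archived version is still inside the rollback window."""
    return timedelta(0) <= now - archived_at <= ROLLBACK_WINDOW


archived_at = datetime(2026, 1, 1, tzinfo=timezone.utc)
within = rollback_eligible(archived_at, datetime(2026, 3, 1, tzinfo=timezone.utc))
expired = rollback_eligible(archived_at, datetime(2026, 6, 1, tzinfo=timezone.utc))
```

Here `within` is true (59 days after archiving) and `expired` is false (151 days after archiving).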

Why it matters

Without a model registry, ML teams lose track of which model is running in production, cannot reproduce results from specific versions, and have no systematic way to roll back when a new model causes problems. The registry is the foundation of reliable ML operations: it brings to model artifacts the same reproducibility and auditability standards that software engineering applies to code.

Related concepts

Continuous Training

The automated process of regularly retraining ML models on fresh data as part of a production ML system - ensuring that models stay current as the world changes rather than degrading on stale distributions.

Experiment Tracking

The practice of systematically recording every ML training run - logging hyperparameters, code versions, datasets, metrics, and artifacts so experiments are reproducible and comparable, turning trial-and-error into a structured search.

Model Serving

The infrastructure layer that takes a trained ML model and makes it available to receive requests, run predictions, and return results at production scale - the bridge between a trained artifact and a live application.
