Concept
Federated Learning
A machine learning approach that trains models across many distributed devices or data silos without centralising the raw data - each participant trains on their local data and shares only model updates, preserving privacy while enabling collective learning.
Added May 18, 2026
Traditional machine learning typically requires all training data to be collected and centralised on a single server or in a data centre. This creates significant privacy risks (centralised data is a single point of breach), regulatory barriers (laws such as GDPR, HIPAA and CCPA restrict how personal data may be transferred across borders and between organisations), and logistical challenges (moving massive datasets across slow or unreliable networks). Federated Learning (FL), introduced by Google researchers in 2016 and publicised by Google in 2017, enables learning from distributed data without centralising it.
The Federated Averaging (FedAvg) algorithm describes the basic FL process. A central server initialises a global model and sends it to participating clients (devices, hospitals, organisations). Each client trains the model on its local data for several gradient steps and computes a model update: the difference between the locally trained model and the starting global model (in the simpler FedSGD variant, a client takes a single step and sends the gradient itself). Clients send their updates (not their data) to the server. The server aggregates these updates - typically by a weighted average in which each client's weight is proportional to its dataset size - to produce an updated global model. The process repeats for many rounds.
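The round described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the model is plain least-squares regression, the clients are simulated in one process, and the names local_train, fedavg_round and make_clients, along with the synthetic data, are assumptions made for this example.

```python
import numpy as np


def local_train(weights, X, y, lr=0.1, steps=5):
    """Run a few full-batch gradient steps of least-squares regression on one client."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w


def fedavg_round(global_w, clients, lr=0.1, steps=5):
    """One round: broadcast the global model, train locally, average the updates."""
    total = sum(len(y) for _, y in clients)
    aggregate = np.zeros_like(global_w)
    for X, y in clients:
        # Only this difference would leave the client, never X or y.
        update = local_train(global_w, X, y, lr, steps) - global_w
        aggregate += (len(y) / total) * update  # weight by local dataset size
    return global_w + aggregate


def make_clients(true_w, sizes, seed=0):
    """Hypothetical synthetic data: each client holds a private (X, y) pair."""
    rng = np.random.default_rng(seed)
    clients = []
    for n in sizes:
        X = rng.normal(size=(n, len(true_w)))
        y = X @ true_w + 0.01 * rng.normal(size=n)
        clients.append((X, y))
    return clients


true_w = np.array([2.0, -1.0, 0.5])
clients = make_clients(true_w, sizes=[20, 50, 200])
global_w = np.zeros(3)
for _ in range(50):  # repeat for many rounds
    global_w = fedavg_round(global_w, clients)
print(np.round(global_w, 2))  # should end up close to true_w
```

In a real deployment the loop body of fedavg_round would run on separate machines, and only a subset of clients would usually take part in each round.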
FL is now deployed at scale by Google in Gboard (the keyboard app): the next-word prediction model is trained on users' typing patterns on-device, with only model updates (not keystrokes) sent to Google's servers. Apple has described using FL for on-device personalisation (Siri improvements, QuickType) without sending raw user data to Apple. FL allows these companies to improve models using the most relevant possible data - actual user behaviour in the real world - while keeping that raw data on the device, a protection that centralised data collection cannot offer.
FL introduces several technical challenges beyond those of centralised learning. Statistical heterogeneity: data across clients is not independent and identically distributed (non-IID). A keyboard FL system trained across many users sees very different text distributions on a teenager's phone, a doctor's phone, and a non-native English speaker's phone. Standard federated averaging may converge slowly, or to a poor global model, when the local distributions differ widely. Communication efficiency: exchanging large models and model updates with millions of devices in every round is expensive; gradient compression, quantisation of updates, and structured update parameterisations reduce communication cost. System heterogeneity: clients have different compute capabilities, battery levels, and network connectivity, so FL must handle straggling or unavailable clients gracefully.
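As one illustration of the communication-efficiency point, an update can be quantised before it is sent. The sketch below assumes a simple uniform 8-bit scheme; the function names quantize_uint8 and dequantize are made up for this example, and deployed systems often use more elaborate compression.

```python
import numpy as np


def quantize_uint8(update):
    """Map a float32 update onto 256 evenly spaced levels between its min and max."""
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / 255 if hi > lo else 1.0  # avoid dividing by zero for constant updates
    codes = np.clip(np.round((update - lo) / scale), 0, 255).astype(np.uint8)
    return codes, lo, scale  # the client would send codes plus two floats


def dequantize(codes, lo, scale):
    """Server side: recover an approximation of the original update."""
    return codes.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
update = rng.normal(scale=0.01, size=10_000).astype(np.float32)
codes, lo, scale = quantize_uint8(update)
restored = dequantize(codes, lo, scale)

print(update.nbytes, codes.nbytes)  # 40000 bytes as float32, 10000 bytes as uint8
print(float(np.max(np.abs(restored - update))) <= scale / 2 + 1e-6)  # error within half a level
```

The trade-off is a four-fold reduction in upload size against a rounding error of at most about half a quantisation level per coordinate.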
Privacy analysis of FL shows that model updates can still leak information about local data: in gradient inversion attacks, an adversarial server can sometimes reconstruct training examples from the gradients or updates it receives. Combining FL with differential privacy (bounding each update's size and adding calibrated noise before sharing) provides provable privacy guarantees at the cost of some model quality. Secure aggregation protocols use cryptographic techniques (secret sharing, homomorphic encryption) so that the server learns only the aggregate and never sees an individual client's update.
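Both defences can be illustrated with a toy NumPy sketch. It assumes L2-norm clipping followed by Gaussian noise for the differential-privacy step, and pairwise additive masks that cancel in the sum for secure aggregation. The function names and parameter values are hypothetical, and the masking is deliberately simplified: a real protocol works over finite fields, derives each pairwise mask from a key exchange between the two clients, and handles clients that drop out.

```python
import numpy as np


def clip_update(update, max_norm):
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, max_norm / (norm + 1e-12))


def add_gaussian_noise(update, max_norm, noise_multiplier, rng):
    """Add Gaussian noise with standard deviation noise_multiplier * max_norm."""
    return update + rng.normal(scale=noise_multiplier * max_norm, size=update.shape)


def pairwise_masks(num_clients, dim, seed=0):
    """Toy masks: for each pair i < j, client i adds a random vector and
    client j subtracts the same vector, so the masks sum to zero overall."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((num_clients, dim))
    for i in range(num_clients):
        for j in range(i + 1, num_clients):
            shared = rng.normal(size=dim)
            masks[i] += shared
            masks[j] -= shared
    return masks


rng = np.random.default_rng(1)
updates = [rng.normal(size=4) for _ in range(3)]
clipped = np.array([clip_update(u, max_norm=1.0) for u in updates])

# Each row of `masked` is what the server would receive from one client.
masked = clipped + pairwise_masks(num_clients=3, dim=4)
print(np.allclose(masked.sum(axis=0), clipped.sum(axis=0)))  # the masks cancel in the sum

# Differential-privacy step for a single client's clipped update.
noisy = add_gaussian_noise(clipped[0], max_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Clipping bounds how much any one client can influence the aggregate, which is what allows the noise scale to be calibrated; the noise_multiplier value here is arbitrary and does not correspond to any particular privacy budget.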
Analogy
Consider a medical research consortium in which hospitals want to study a rare disease collaboratively but cannot share patient records because of privacy regulations. Instead of sending patient files to a central database, each hospital trains a statistical model on its own patients and sends the model parameters (abstract patterns, not patient records) to the coordinating research centre. The centre combines the insights from all hospitals into a shared model that benefits from the collective data without any patient record leaving its originating institution.
Real-world example
The MELLODDY project (a pharma industry FL consortium) connected 10 pharmaceutical companies (including AstraZeneca, Bayer, GSK, Janssen and Novartis) to collaboratively train molecular property prediction models without sharing their proprietary compound libraries. Each company trained on its private data; only encrypted model updates were shared. The consortium reported that the federated model achieved better predictive performance than the models each company trained on its own data alone, demonstrating that FL can unlock collaborative learning in inherently competitive domains.
Why it matters
Federated learning is a principal technique for enabling ML in data-sensitive domains: healthcare (hospital consortia, EHR analysis), finance (fraud detection across banks), mobile devices (improving consumer apps without collecting user data), and cross-organisation collaboration in regulated industries. Understanding FL - its mechanism, its challenges (non-IID data, communication cost, privacy of updates), and its applications - is essential for applying machine learning where data governance and privacy are primary constraints.