The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning. The Fourteenth International Conference on Learning Representations takes place from April 23rd to April 27th 2026 in Rio de Janeiro.
The paper examines how reliable privacy protections are in practice when large language models are adapted using sensitive data. Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, and Adam Dziedzic focus on differential privacy, a method intended to theoretically prevent the reconstruction of individual training data from a model.
They systematically evaluate how much information models may still leak. To do so, they simulate attacks that test whether specific data points were part of the training set or can even be extracted directly. A key factor is the similarity between the adaptation data and the original pretraining data.
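To make the idea of a membership test concrete, here is a minimal sketch of one of the simplest such attacks, a loss threshold: examples on which the model has unusually low loss are guessed to be training members. This is not the authors' attack suite. The function names, the loss values, and the threshold are hypothetical and chosen only for illustration.

```python
def loss_threshold_attack(losses, threshold):
    """Guess 'member' for every example whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_rates(member_losses, nonmember_losses, threshold):
    """Return (true positive rate, false positive rate) of the threshold attack."""
    member_guesses = loss_threshold_attack(member_losses, threshold)
    nonmember_guesses = loss_threshold_attack(nonmember_losses, threshold)
    tpr = sum(member_guesses) / len(member_guesses)
    fpr = sum(nonmember_guesses) / len(nonmember_guesses)
    return tpr, fpr


# Hypothetical per-example losses; in a real audit they would come from the
# adapted model evaluated on known members and known non-members.
member_losses = [0.10, 0.20, 0.15, 0.90]
nonmember_losses = [0.80, 1.20, 0.30, 1.00]

tpr, fpr = attack_rates(member_losses, nonmember_losses, threshold=0.5)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A large gap between the two rates indicates that the model behaves measurably differently on its training data, which is the kind of leakage such evaluations look for.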
Their results show that privacy risks increase significantly when both datasets are similar, even without direct overlap. Data from the same distribution is almost as vulnerable as identical training data, while more distinct data leads to substantially lower risks.
The choice of adaptation method also affects protection. Parameter-efficient approaches such as LoRA often perform better than full fine-tuning, especially for strongly different data. Under strict privacy settings, however, these differences become less pronounced. Some methods are also more vulnerable to direct data extraction.
They demonstrate that theoretical guarantees alone are not sufficient to assess privacy in practice and propose a broader evaluation framework that considers both pretraining and adaptation together.
For society, this means that using language models in sensitive domains can be made safer—but only if privacy is not treated solely as a theoretical promise, but is also empirically verified.
Gowtham Reddy Abbavaram, Rajeev Verma, Celia Rubio Madrigal, Krikamol Muandet, and Rebekka Burkholz investigate why boosting methods often handle new, unseen data more robustly than many specialized approaches. The focus is on situations where real-world data change due to hidden factors that are not directly observed during training.
They explain that many existing methods rely on dividing data into predefined groups (“environments”) to learn stable relationships. However, choosing the correct grouping is inherently uncertain, and poor choices can lead models to rely on spurious patterns.
The authors introduce a new concept called “α-predictive sufficiency.” In simple terms, predictions should remain reliable regardless of the underlying environment, even when hidden influences are present. They show theoretically that boosting achieves this by combining many weak models, effectively accounting for multiple possible data partitions instead of relying on a single one.
Experiments on synthetic and real-world datasets support this explanation. Boosting models implicitly capture structures related to hidden factors and produce more stable predictions under changing conditions.
This work improves understanding of why widely used methods perform well in practice and may guide the development of more robust machine learning systems in fields such as healthcare, economics, and science. At the same time, it remains unclear how broadly the findings extend beyond the studied settings.
Yuan Gao, Anton Rodomanov, Jeremy Rack, and Sebastian Stich study a key challenge in large-scale machine learning: the high communication cost between distributed computing nodes. A common solution is to compress transmitted data, but this introduces errors. A widely used correction technique, known as error feedback, works well for simple optimization problems but breaks down in more complex settings where the objective includes additional constraints or regularization terms.
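The classic error feedback idea for the simple, unconstrained case can be sketched in a few lines: whatever the compressor drops is stored locally and added back before the next compression. The sketch below assumes a top-k compressor and made-up gradients. It illustrates the existing technique described above, not the new algorithm proposed in the paper.

```python
def top_k(vector, k):
    """Keep the k entries with the largest magnitude and zero out the rest."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def error_feedback_step(gradient, error, k):
    """Compress gradient plus carried-over error; return (message, new_error)."""
    corrected = [g + e for g, e in zip(gradient, error)]
    message = top_k(corrected, k)
    # Whatever was not transmitted is remembered for the next round.
    new_error = [c - m for c, m in zip(corrected, message)]
    return message, new_error


# Hypothetical gradients from three consecutive rounds on one node.
gradients = [[1.0, 0.25, -0.5], [0.5, 0.25, 0.0], [-0.25, 0.25, 0.5]]
error = [0.0, 0.0, 0.0]
sent_total = [0.0, 0.0, 0.0]
for g in gradients:
    message, error = error_feedback_step(g, error, k=1)
    sent_total = [s + m for s, m in zip(sent_total, message)]

# Invariant: transmitted mass plus stored error equals the sum of all gradients.
recovered = [s + e for s, e in zip(sent_total, error)]
print(recovered)
```

The invariant in the last lines is the structure that lets accumulated errors be tracked and corrected in the unconstrained case.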
They first explain why existing methods fail in this “composite” setting. The main issue is that once constraints are involved, the structure that allows tracking and correcting accumulated errors no longer holds.
To address this, they propose a new algorithm that combines dual averaging—an approach that accumulates gradient information in the dual space—with a modern error-control mechanism. This restores the structure needed to manage compression errors effectively. They provide the first theoretical analysis showing that, even in the composite case, the method converges at rates comparable to the best-known results from simpler, unconstrained scenarios.
In addition, they introduce a general analytical framework for handling optimization with imperfect or noisy updates. This framework may be useful beyond the specific method studied in the paper. Experimental results support the theory and indicate that the approach performs reliably in practice.
Overall, this research improves the efficiency of distributed machine learning under communication constraints. Its main contribution lies in extending existing techniques to more complicated problem settings, which could help make large-scale AI systems more resource-efficient.
Dariush Wahdany, Matthew Jagielski (Anthropic), Adam Dziedzic, and Franziska Boenisch examine whether data curation truly protects privacy. Modern machine learning increasingly relies on data curation, the careful selection, filtering, and weighting of training data, as a key driver of model quality. Rather than simply scaling up raw data, teams use curation to identify the most useful examples, remove noise, and shape what a model ultimately learns. This has grown into a significant area of industry practice, with dedicated tooling and companies built around curation pipelines for everything from foundation model pretraining to domain-specific fine-tuning. A particularly appealing idea in sensitive domains is to use curation as a privacy shield: rather than training directly on sensitive data such as patient records or financial transactions, the sensitive data is used only to guide the selection of relevant public data. The model itself only ever sees the public data, which intuitively feels much safer.
The authors show that this intuition is wrong. The selection process itself leaks information about the private dataset. They develop a suite of membership inference attacks that can determine whether specific data points were part of the sensitive set, and these attacks succeed at every stage of the pipeline: during data scoring, during subset selection, and even against the final model trained only on public data. A model that has technically never "seen" the private data can still betray its contents through the choices that shaped its training set.
Their experiments span multiple curation methods. Some are more robust than others, but all remain vulnerable, with leakage especially pronounced when the private dataset is small, precisely the regime where curation-based approaches are most attractive. The authors also show that applying differential privacy to the curation step can meaningfully reduce leakage, though with the usual tradeoff against model performance.
The broader takeaway is that privacy analysis needs to extend upstream of model training. Any step that touches sensitive data, including seemingly indirect ones like scoring or filtering, is a potential leakage surface. For applications in healthcare, finance, and other regulated domains, this means that "we didn't train on the sensitive data" is not by itself a sufficient privacy claim. Curation pipelines need their own safeguards.
Mohammad Moshtaghifar (University of British Columbia), Anton Rodomanov, Daniil Vankov (Arizona State University) and Sebastian Stich introduce “DADA,” a new optimization method relevant to machine learning. Such methods are used to efficiently solve mathematical problems, for example when training AI models. A key limitation of existing approaches is the need to manually tune important parameters—especially step sizes—which can be time-consuming and computationally expensive.
DADA addresses this issue by adapting its parameters automatically during the optimization process. It uses information from previous steps and their distance from the starting point to adjust itself dynamically. As a result, it does not require prior knowledge of problem-specific settings. At the same time, it can be applied to a wide range of optimization problems, including both simple and more complex functions with different smoothness properties.
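To give a feel for how a step size can be derived from the distance travelled from the starting point, the sketch below uses a simple "distance over gradients" style rule: the step size is the largest distance from the start seen so far, divided by the square root of the accumulated squared gradient norms. This is a related, simpler rule from the same family of tuning-free methods and is not the DADA algorithm itself. The function names, the initial distance guess `r_eps`, and the test problem are assumptions made for illustration.

```python
import math


def distance_adaptive_descent(grad, x0, steps, r_eps=1e-3):
    """Gradient descent whose step size adapts to the distance from x0."""
    x = list(x0)
    max_dist = r_eps      # small initial guess for the distance to a solution
    grad_sq_sum = 0.0     # running sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += sum(gi * gi for gi in g)
        if grad_sq_sum == 0.0:
            break         # zero gradient at the start: nothing to do
        step_size = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x


def quadratic_grad(x):
    """Gradient of the toy objective 0.5 * (x - 3)^2."""
    return [x[0] - 3.0]


solution = distance_adaptive_descent(quadratic_grad, x0=[0.0], steps=500)
print(solution)
```

No learning rate is supplied by the user here; the only input is a rough, very small initial distance guess, which is the practical appeal of this family of methods.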
They show theoretically that the method achieves efficiency comparable to specialized algorithms while remaining broadly applicable. Experiments indicate that DADA performs consistently across different scenarios and competes well with existing approaches.
This research contributes to making optimization methods more robust and easier to use. For society, this mainly means more efficient development and lower costs in training AI systems. The contribution is not a dramatic leap in performance, but a practical improvement in usability and general applicability.
Bihe Zhao, Louis Kerner, Michel Meintz, Tameem Bakr, Franziska Boenisch and Adam Dziedzic present a method to determine whether an image was generated by a specific AI model. The work focuses on autoregressive image models, which generate images as sequences of tokens, similar to how language models generate words.
The key insight is that AI-generated images leave a characteristic trace in their internal representation: their features lie closer to the model’s “codebook” entries than those of natural images. They use this property to define two signals that, when combined, indicate whether an image originates from a given model.
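The codebook observation can be illustrated with a toy computation: for each feature vector, measure the distance to its nearest codebook entry and average over the image. The sketch below covers only this one kind of signal, with a made-up two-entry codebook and made-up features. It is not the authors' detector, and how the paper defines and combines its two signals is not reproduced here.

```python
import math


def nearest_codebook_distance(feature, codebook):
    """Euclidean distance from one feature vector to its closest codebook entry."""
    return min(math.dist(feature, entry) for entry in codebook)


def mean_quantization_distance(features, codebook):
    """Average nearest-entry distance over all feature vectors of an image."""
    total = sum(nearest_codebook_distance(f, codebook) for f in features)
    return total / len(features)


# Hypothetical 2-D codebook and features; real models use far larger ones.
codebook = [(0.0, 0.0), (1.0, 1.0)]
generated_features = [(0.0, 0.0), (1.0, 1.0)]  # lie exactly on codebook entries
natural_features = [(0.5, 0.0), (1.0, 2.0)]    # lie farther from the codebook

generated_score = mean_quantization_distance(generated_features, codebook)
natural_score = mean_quantization_distance(natural_features, codebook)
print(generated_score, natural_score)
```

A lower score means the features sit closer to the codebook, which under the observation above points toward an image produced by that model.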
The approach works post hoc, meaning it does not require modifying the model or the generation process. Instead, it analyzes a given image and compares its internal structure to patterns typical of a model. This allows not only distinguishing real from generated images, but also identifying which specific model produced them.
In extensive experiments across multiple models and datasets, the method achieves near-perfect accuracy and remains robust even when images are altered through compression or resizing. It is also computationally efficient and requires only a one-time setup.
Overall, the study demonstrates that AI-generated images can be reliably traced without visible watermarks. For society, this offers improved tools to verify the origin and authenticity of images, which may help address misinformation, fraud, and misuse of synthetic media without restricting the technology itself.
Matthias Wilms, Sascha Xu, and Jilles Vreeken present a new approach called “Explainable Mixture Models” (XMM), designed to both model and explain complex data distributions. Traditional mixture models can separate data into subgroups but do not clarify under which conditions these groups arise.
They address this by linking each subgroup to simple, human-interpretable rules based on descriptive features such as age or other attributes. This means each component is not only mathematically defined but also explained through clear conditions, making it easier to understand when specific patterns occur.
A key contribution is a scalable learning method that automatically discovers these rules from data. Unlike many existing approaches, it relies on differentiable optimization, allowing efficient training while avoiding overly complex decision trees or opaque neural networks.
Experiments on synthetic and real-world datasets show that the method achieves strong accuracy while uncovering meaningful subpopulations. Applications range from insurance data to materials science, where it helps reveal relationships between structure and properties.
Overall, the work demonstrates that accuracy and interpretability can be combined. For society, this means more transparent AI systems whose decisions are easier to understand, which can support trust in data-driven applications across fields such as science, medicine, and industry without significantly compromising performance.
Yuki Takezawa (OIST), Anastasia Koloskova (University of Zurich), Xiaowen Jiang and Sebastian Stich study a key challenge in federated learning: how to efficiently train neural networks when data is distributed across many devices and differs significantly between them. They focus on the optimization method “Muon,” which has shown strong performance in centralized training.
First, they demonstrate that a straightforward approach, plugging Muon directly into existing methods such as FedAvg, does not work reliably. The reason is a bias in the underlying optimization step, which prevents the method from converging in realistic settings with heterogeneous data.
To address this, they propose a new method called FEDMUON. It corrects this bias and is proven to converge to a stable solution. They also analyze how approximate computations, used in practice for efficiency, affect performance. Their results show that FEDMUON converges even with inexact calculations and becomes faster as these approximations are improved.
Experiments on standard datasets indicate that FEDMUON achieves higher accuracy than established approaches such as FedAvg and SCAFFOLD across multiple settings. The advantage is especially clear when data differs significantly across devices, which is common in federated learning.
This research contributes to making distributed, privacy-preserving machine learning more efficient and reliable. It is relevant for applications such as smartphones or sensitive domains like healthcare, where data must remain local. At the same time, it represents a methodological advance whose real-world impact will depend on further validation in practical deployments.
Mohamed Ghanem and Bernd Finkbeiner address a core challenge in modern AI: the gap between the true meaning of states in an environment and how neural networks internally represent them. In many reinforcement learning systems, states are encoded as latent embeddings whose evolution over time is not explicitly modeled, which can lead to a mismatch between learned representations and actual environment dynamics.
They propose to make these dynamics explicit by drawing an analogy between Markov decision processes and ordinary differential equations. In both cases, the current state fully determines the next. Based on this idea, they introduce a regularization method that encourages the neural policy to learn latent states that evolve along consistent mathematical “flows,” matching those described by differential equations. This temporal alignment effectively makes the learned representations continuously self-predictive.
The method is incorporated into existing reinforcement learning algorithms, particularly Actor-Critic approaches. Experiments on standard benchmarks, including Atari games and gridworld environments, show that this added structure can substantially improve the agent's performance.
Overall, the findings show that temporally aligning learned representations more closely with underlying system dynamics can enhance the effectiveness of sequential decision-making models.
Tom Jacobs, Advait Gadhikar, Celia Rubio-Madrigal, and Rebekka Burkholz study how optimization methods shape the behavior of neural networks during training. A key issue is that some modern techniques encourage sparse models (with fewer active parameters), but at the cost of slower learning.
They introduce a new method called Hyperbolic Aware Minimization (HAM). It combines a standard optimization step with an additional update based on hyperbolic geometry. The goal is to retain the benefits of existing approaches—especially their tendency to produce simpler, better-generalizing models—while avoiding drawbacks like slow convergence.
Their theoretical analysis shows that HAM accelerates learning in critical regions, particularly when parameters are close to zero. This helps models adjust more effectively, including changing parameter signs, which is important for successful training. At the same time, HAM preserves a useful bias toward simpler (sparser) solutions.
Across experiments in computer vision, graph learning, and large language model fine-tuning, HAM consistently improves performance. The gains are especially notable for methods that aim to reduce the number of parameters. The additional computational cost is minimal, and the method integrates easily with existing optimizers.
Overall, the study demonstrates that refining optimization strategies can improve both the efficiency and performance of AI systems. For society, this mainly means that powerful models could be trained with fewer computational resources, which may reduce energy use and costs. However, this represents an incremental methodological improvement rather than a fundamental breakthrough.
Nhi Pham, Artur Jesslen, Bernt Schiele, Adam Kortylewski, and Jonas Fischer present “CAVE,” a method designed to combine two key requirements of modern AI: robustness and interpretability. Many neural networks achieve high accuracy but remain difficult to understand or fail when faced with unfamiliar inputs, such as occlusions or changing environments.
CAVE addresses this by modeling objects not just as 2D image patterns but as simplified 3D volumes. Within these volumes, the system learns a small set of distinct “concepts” that represent parts or properties of an object. This makes it possible to trace which regions of an image contribute to a prediction. At the same time, the model maintains stable performance under changing conditions, including occlusion or environmental shifts.
They also introduce a new evaluation metric that measures whether such concepts remain consistent across viewpoints, without relying on human-labeled object parts. Experiments show that the approach achieves a strong balance between accuracy, robustness, and interpretability compared to existing methods.
The work suggests that high performance and interpretability can be combined in a single AI system. This could improve transparency and reliability in safety-critical applications such as healthcare or autonomous driving, while further research is needed to assess its scalability to more complex scenarios.
Ruichen Luo (IST Austria), Sebastian Stich (CISPA), and Krishnendu Chatterjee (IST Austria) study a class of two-player games that lies between classical zero-sum games and more general, computationally difficult games.
A key challenge in this field is that zero-sum games can be solved efficiently, while general-sum games are much harder. They introduce an intermediate class called “near-zero-sum games,” which better reflects real-world situations where small additional effects—such as transaction costs—are present.
Their main contribution is a new algorithm (ICL) that breaks such games into a sequence of simpler zero-sum subproblems. This allows existing efficient methods to be reused instead of directly tackling the harder original problem. Theoretical analysis shows that this approach converges faster to a Nash equilibrium when the game is close to zero-sum.
Experimental results support this: the proposed method requires significantly fewer computational steps in near-zero-sum settings, while standard methods do not benefit from this structure.
Limitations include a focus on specific types of games and primarily two-player settings, and some open questions about optimal efficiency remain.
Overall, the work advances methods in game theory and optimization by showing how structural properties can be exploited for faster computation. Its societal relevance lies in improving tools for applications such as economics, artificial intelligence, and networked systems, although the paper does not itself present immediate practical applications.
Lorenzo Rossi, Bartłomiej Marek, Franziska Boenisch, and Adam Dziedzic study how to audit the privacy of large language models after they have already been trained. Their approach relies on so-called “natural identifiers” (NIDs)—structured random strings such as hashes or wallet addresses that commonly appear in real training data.
They show that these identifiers can be used to generate additional, similar data points that serve as a reference set. This makes it possible to test whether specific data was included in training and to assess how much information a model may reveal—without retraining the model.
Experimental results on open-source language models demonstrate that the method works reliably. It can detect training data with high accuracy while avoiding false positives. At the same time, it requires fewer samples and less computation than prior approaches and provides tighter estimates of privacy guarantees.
They also extend existing analysis techniques by moving from simple yes/no decisions to ranking-based evaluations, which improves statistical power and flexibility.
The authors note that the method has dual-use risks, as it could potentially be misused to better reconstruct training data. However, they argue that such tools are necessary to verify privacy claims and detect misuse.
Overall, the study presents a practical way to make the data usage of large AI models more transparent. For society, this primarily enables stronger oversight by researchers, regulators, and the public. At the same time, responsible use remains essential, as the method can both enhance privacy protection and introduce new risks.
Tom Jacobs, Chao Zhou, and Rebekka Burkholz investigate why modern optimization methods such as Adam often outperform classical stochastic gradient descent (SGD), especially during fine-tuning of machine learning models. They focus on how overparameterization (having many model parameters) and different optimization algorithms influence learning outcomes.
The authors analyze a broad class of optimization methods (“steepest descent”) and show that their behavior can be described through a geometric framework called “mirror flow.” This framework determines which types of solutions models tend to favor. A key finding is that different algorithms exhibit distinct implicit biases, for example toward simpler or sparser solutions.
An important result concerns saddle points—flat regions in the optimization landscape where training can stall. The analysis shows that certain methods, especially sign-based approaches related to Adam, can escape these regions more effectively than standard gradient descent with small learning rates. This helps explain their practical advantages.
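A toy example can show why a sign-based update may leave a saddle region faster than gradient descent with a small learning rate. The sketch below uses the hypothetical two-parameter loss 0.5 * (u * v - 1)^2, which has a saddle at the origin, and starts both methods very close to it. The loss, the starting point, and the learning rate are assumptions chosen for illustration and are not taken from the paper's analysis or experiments.

```python
def loss(u, v):
    """Toy overparameterized loss with a saddle point at (0, 0)."""
    return 0.5 * (u * v - 1.0) ** 2


def grad(u, v):
    """Gradient of the toy loss with respect to u and v."""
    residual = u * v - 1.0
    return residual * v, residual * u


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def gd_update(w, g, lr):
    """Plain gradient descent: step proportional to the gradient."""
    return w - lr * g


def sign_update(w, g, lr):
    """Sign descent: fixed-size step in the direction of the gradient sign."""
    return w - lr * sign(g)


def run(update, steps, lr=0.01, start=(1e-3, 1e-3)):
    """Run an update rule from a point near the saddle and return the final loss."""
    u, v = start
    for _ in range(steps):
        gu, gv = grad(u, v)
        u, v = update(u, gu, lr), update(v, gv, lr)
    return loss(u, v)


gd_loss = run(gd_update, steps=150)
sign_loss = run(sign_update, steps=150)
print(f"gradient descent: {gd_loss:.4f}, sign descent: {sign_loss:.6f}")
```

Near the saddle the gradient is tiny, so gradient descent barely moves within the step budget, whereas the sign update moves by the full learning rate in every step regardless of the gradient magnitude.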
The study also finds that model depth plays a crucial role: increasing depth can promote sparsity for some methods, while harming performance for others. Experiments with simplified neural networks and real-world scenarios (e.g., vision and language models) support these theoretical predictions.
Overall, the work shows that the choice of optimization algorithm, and not only data and architecture, significantly shapes the learned solution.
This research improves understanding of how AI systems are trained and may help practitioners select more efficient and robust training methods, though it does not fundamentally change the limits of current machine learning approaches.
Jan Kociszewski, Hubert Jastrzębski, Tymoteusz Stępkowski, Filip Manijak, Krzysztof Rojek, Franziska Boenisch, and Adam Dziedzic introduce “SERUM,” a new method to reliably mark images generated by AI. The goal is to make it possible to distinguish synthetic images from real ones, helping address issues such as deepfakes or the contamination of training data.
The approach is intentionally simple: during image generation, a subtle noise pattern is added to the initial random input. This pattern persists in the final image. A lightweight detector can later identify it directly, without requiring computationally expensive reconstruction of the generation process.
Compared to prior methods, SERUM is reported to be more robust against common modifications like cropping, compression, or deliberate watermark removal. In experiments, it achieved detection rates above 99% at a low false-positive rate.
Another advantage is efficiency: both embedding and detecting the watermark are fast and require relatively little computation. The method can also assign distinct marks to different users, enabling attribution of generated content.
Importantly, image quality remains largely unchanged. The watermark can also persist to some extent even when marked images are later used to train new models.
Overall, the study presents a practical approach for labeling AI-generated images. For society, this could support greater transparency and help people better interpret digital content, without significantly limiting the use of generative technologies.
The paper introduces “Terminal-Bench,” a new evaluation framework designed to test AI agents on realistic computer-based tasks. These tasks span areas such as data processing, software engineering, and system administration, and are intentionally structured to require multiple steps, decision-making, and error correction.
A key finding is that current AI systems often struggle with such complex, long-horizon tasks. The analysis identifies recurring failure patterns: agents ignore instructions, repeat steps unnecessarily, lose context, or terminate tasks prematurely. A particularly common issue is poor self-verification—either checks are missing or they fail to cover essential correctness criteria.
To better understand these shortcomings, they develop a detailed taxonomy of failure modes. This classification distinguishes between execution errors, reasoning inconsistencies, and inadequate verification practices.
The results also suggest that some failures stem from structural issues, such as ambiguous task definitions or insufficient testing procedures. Automated quality checks can help detect and mitigate these issues early in the process.
Overall, the study highlights that improving AI performance is not only about building stronger models but also about designing better evaluation methods and clearer task specifications. For society, this research contributes to making AI systems more reliable and transparent, particularly in domains like software development and data analysis. At the same time, it provides a realistic assessment of current limitations, especially in handling complex, multi-step tasks.
This paper investigates why neural networks are sensitive to small perturbations in input data (so-called adversarial examples) and provides a theoretical explanation for this behavior. The central concept is “relative sharpness,” a measure describing how much a model’s loss changes when its parameters are slightly perturbed.
Nils Walter, Linara Adilova (Ruhr Universität Bochum), Jilles Vreeken and Michael Kamp (Lamarr) first derive a mathematical expression for the curvature of the loss landscape (the Hessian) in classification models. This analysis shows that sharpness depends on three key factors: the confidence of predictions, the scale of internal feature representations, and the magnitude of the model weights. An important result is that this quantity can be effectively analyzed at the network’s final layer.
They then demonstrate that small perturbations in the input can be represented as corresponding perturbations in the model’s weights. Building on this, they derive an upper bound on how much the loss can increase under such perturbations. This bound is directly controlled by relative sharpness: models with flatter loss landscapes are less sensitive, while sharper models are more vulnerable.
These findings provide a theoretical foundation for a well-known empirical observation: flatter models tend to generalize better and are more robust to adversarial inputs. The paper also clarifies under which conditions common loss functions support or violate these properties.
Overall, the work improves the theoretical understanding of reliability in AI systems. It does not offer an immediate practical solution but identifies key mathematical properties linked to robustness. For society, this represents incremental progress toward more dependable AI applications, particularly in high-stakes domains such as healthcare or autonomous systems, without overstating immediate impact.
The paper investigates why machine learning models often perform worse when applied to new data that differs from their training data. It focuses on “hidden confounding,” meaning unobserved factors that influence both inputs and outcomes and may change across environments.
Gowtham Reddy Abbavaram, Celia Rubio Madrigal, Rebekka Burkholz, and Krikamol Muandet show, both theoretically and empirically, that such hidden factors violate key assumptions of many existing methods. Instead of stable, generalizable relationships, models may learn distorted patterns and make unreliable predictions.
A central finding is that relying only on invariant relationships is insufficient. Models must also capture environment-specific relationships to generalize effectively.
Interestingly, the study finds that adding additional variables—even if they are not causally related to the outcome—can improve performance, as long as they carry useful information. These variables can indirectly capture hidden confounders and stabilize predictions.
Experiments on real-world and synthetic datasets support this: including more informative features increases predictive information and can partially mitigate harmful distribution shifts.
The authors also note limitations: the work primarily explains the phenomenon rather than solving it.
Overall, the research improves understanding of why AI systems may fail under changing conditions. For society, it provides a more solid basis for developing more reliable systems in areas such as healthcare, policy, and business, without overstating immediate practical solutions.
Xiaowen Jiang, Anton Rodomanov and Sebastian Stich study how to make distributed learning—especially federated learning—more efficient. In this setting, many devices collaboratively train a model without sharing their raw data. A key challenge is the cost of communication between devices and a central server, which can be slow or unreliable.
They argue that existing ways of comparing optimization methods do not properly account for these communication costs, particularly when different strategies are used to select participating devices. To address this, they introduce a new model that explicitly measures both communication and local computation costs, allowing fairer comparisons between methods.
Building on this model, they propose a new algorithm tailored to non-convex optimization problems. It combines established gradient-based techniques with a new “recursive gradient” approach that reduces estimation error and better exploits similarities across devices’ data. As a result, the method improves both communication efficiency and local computation compared to prior approaches.
They also provide theoretical guarantees and experimental evidence showing that their method often requires less communication while maintaining performance.
This work contributes to making federated learning systems more practical and scalable, for example in mobile or healthcare applications where data privacy is important. Its societal value lies in enabling efficient, privacy-preserving machine learning at scale, although its real-world impact will depend on how these ideas are implemented in practice.
The summaries were created with the help of artificial intelligence.