
Towards Faster Decentralized Stochastic Optimization with Communication Compression

Summary

Communication efficiency has garnered significant attention as it is considered the main bottleneck for large-scale decentralized Machine Learning applications in distributed and federated settings. In this regime, clients are restricted to transmitting small amounts of compressed information to their neighbors over a communication graph. Numerous endeavors have been made to address this challenging problem by developing algorithms with compressed communication for decentralized non-convex optimization. Despite considerable effort, the current theoretical understanding of the problem is still very limited, and existing algorithms suffer from various limitations. In particular, they typically rely on strong, and often infeasible, assumptions such as bounded data heterogeneity, or require large-batch access, while failing to achieve linear speedup with the number of clients. In this paper, we introduce MoTEF, a novel approach that integrates communication compression with $\textbf{Mo}$mentum $\textbf{T}$racking and $\textbf{E}$rror $\textbf{F}$eedback. MoTEF is the first algorithm to achieve an asymptotic rate matching that of distributed SGD under arbitrary data heterogeneity, hence resolving a long-standing theoretical obstacle in decentralized optimization with compressed communication. We provide numerical experiments to validate our theoretical findings and confirm the practical superiority of MoTEF.
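To make the ingredients named in the abstract concrete, below is a minimal, hypothetical sketch of one synchronized round of decentralized optimization combining a contractive compressor (top-k), error feedback, momentum, and gossip averaging over a mixing matrix. It is an illustration of the general technique, not the MoTEF algorithm itself; all names (`top_k`, `decentralized_step`, the parameters `gamma`, `beta`, `k`) and the specific update order are assumptions for exposition.

```python
# Hypothetical sketch: one round of decentralized SGD with momentum,
# error feedback, and compressed gossip. Not the authors' MoTEF updates.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v (a standard contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def decentralized_step(x, m, h, W, grads, lr=0.05, beta=0.9, gamma=0.5, k=2):
    """One synchronized round for n clients.
    x: (n, d) local models; m: (n, d) momentum buffers;
    h: (n, d) error-feedback states (each client's publicly known model estimate);
    W: (n, n) doubly stochastic mixing matrix of the communication graph;
    grads: (n, d) local stochastic gradients.
    """
    n, _ = x.shape
    # Momentum on each client's local stochastic gradient.
    m = beta * m + (1 - beta) * grads
    # Each client compresses the gap between its model and its public state,
    # transmits only that compressed correction, and the public states are updated
    # (error feedback: the uncompressed residual is retried in later rounds).
    q = np.stack([top_k(x[i] - h[i], k) for i in range(n)])
    h = h + q
    # Gossip-style consensus on the compressed public states, then a local step.
    x = x + gamma * (W - np.eye(n)) @ h - lr * m
    return x, m, h
```

In this kind of scheme, only the compressed corrections `q` cross the network each round, while the error-feedback states `h` ensure that information discarded by the compressor is not lost but re-sent later.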

Conference Paper

International Conference on Learning Representations (ICLR)

Date published

2025-02-11

Date last modified

2025-02-19