We study gradient methods for solving an optimization problem with an (L0,L1)-smooth objective function. This problem class generalizes that of Lipschitz-smooth problems and has gained interest recently, as it captures a broader range of machine learning applications. We provide novel insights into the properties of this function class and develop a general framework for analyzing optimization methods for (L0,L1)-smooth functions in a principled manner. While our convergence rate estimates recover existing results for minimizing the gradient norm in nonconvex problems, our approach allows us to significantly improve the current state-of-the-art complexity results in the convex case. We show that both the gradient method with Polyak stepsizes and the normalized gradient method, without any knowledge of the parameters L0 and L1, achieve the same complexity bounds as the method that knows these constants. In addition, we show that a carefully chosen accelerated gradient method can be applied to (L0,L1)-smooth functions, further improving previously known results. In all cases, the efficiency bounds we establish have no exponential dependence on L0 or L1 and do not depend on the initial gradient norm.
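For context, the (L0,L1)-smoothness condition referred to above is commonly stated in its second-order form as
\[
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \qquad \text{for all } x,
\]
which recovers standard Lipschitz smoothness when $L_1 = 0$. The parameter-free updates mentioned in the abstract are, in their textbook forms (the exact stepsize schedules analyzed in the paper may differ),
\[
x_{k+1} = x_k - \frac{f(x_k) - f^*}{\|\nabla f(x_k)\|^2}\,\nabla f(x_k)
\quad \text{(Polyak stepsize)}, \qquad
x_{k+1} = x_k - \frac{\gamma_k}{\|\nabla f(x_k)\|}\,\nabla f(x_k)
\quad \text{(normalized gradient)}.
\]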
International Conference on Learning Representations (ICLR)