Bregman Proximal Gradient for Nonconvex Optimization: When SGD Does Not Have a Valid Proof
Optimization Theory · Journal of Machine Learning Research 26 (2025) 1–44 · 18 min read

A team from the National University of Singapore built stochastic Bregman proximal gradient methods that drop the Lipschitz continuity requirement, match the optimal O(ε⁻⁴) sample complexity, and resist gradient explosion on architectures where standard optimizers collapse under large stepsizes or […]