International Conference on Machine Learning Workshop (ICML-W)
HORST: Composing Optimizer Geometries for Sparse Transformer Training
International Conference on Machine Learning (ICML)
SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training