International Conference on Machine Learning Workshop (ICML-W)
HORST: Composing Optimizer Geometries for Sparse Transformer Training