2024-12-10

Training GNNs in Balance by Dynamic Rescaling

Summary

Graph neural networks exhibiting a rescale invariance, like GATs, obey a conservation law of its parameters, which has been exploited to derive a balanced state that induces good initial trainability. Yet, finite learning rates as used in practice topple the network out of balance during training. This effect is even more pronounced with larger learning rates that tend to induce improved generalization but make the training dynamics less robust. To support even larger learning rates, we propose to dynamically balance the network according to a different criterion, based on relative gradients, that promotes faster and better. In combination with large learning rates and gradient clipping, dynamic rebalancing significantly improves generalization on real-world data. We observe that rescaling provides us with the flexibility to control the order in which network layers are trained. This leads to novel insights into similar phenomena as grokking, which can further boost generalization performance.