## Hongkai Zhao : Mathematical and numerical understanding of neural networks: from representation to learning dynamics

- Applied Math and Analysis ( 0 Views )In this talk I will present both mathematical and numerical analysis as well as experiments to study a few basic computational issues in using neural network to approximate functions: (1) the numerical error that can be achieved given a finite machine precision, (2) the learning dynamics and computation cost to achieve certain accuracy, and (3) structured and balanced approximation. These issues are investigated for both approximation and optimization in asymptotic and non-asymptotic regimes.

## Xiang Cheng : Transformers learn in-context by (functional) gradient descent

- Applied Math and Analysis ( 0 Views )Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.