## Xiang Cheng : Transformers learn in-context by (functional) gradient descent

- Applied Math and Analysis: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
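
The link between linear attention and gradient descent mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not taken from the talk: it assumes a least-squares loss, a zero initial weight vector, and hand-picked attention weights (keys `x_i`, values `y_i`, query `x_query`), and all function names are hypothetical. Under those assumptions, one gradient step and one softmax-free attention readout give the same prediction.

```python
import numpy as np

def gd_one_step_prediction(X, y, x_query, lr):
    """Predict at x_query after one gradient step on the mean squared error, starting from w = 0."""
    n = X.shape[0]
    w0 = np.zeros(X.shape[1])
    grad = X.T @ (X @ w0 - y) / n      # gradient of (1/2n) * sum_i (w.x_i - y_i)^2
    w1 = w0 - lr * grad
    return float(w1 @ x_query)

def linear_attention_prediction(X, y, x_query, lr):
    """Softmax-free attention readout: values y_i weighted by key-query inner products."""
    n = X.shape[0]
    scores = X @ x_query               # <x_i, x_query> for every context example
    return float(lr / n * (y @ scores))

# Hypothetical in-context task: noiseless linear regression in 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
w_true = rng.normal(size=4)
y = X @ w_true
x_query = rng.normal(size=4)

pred_gd = gd_one_step_prediction(X, y, x_query, lr=0.5)
pred_attn = linear_attention_prediction(X, y, x_query, lr=0.5)
print(pred_gd, pred_attn)
```

Replacing the inner product `<x_i, x_query>` with a kernel `k(x_i, x_query)` turns the same readout into one functional gradient step from `f = 0` in the RKHS of `k`, which is one way to read the abstract's statement that the attention activation determines the RKHS metric.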

## Hongkai Zhao : Mathematical and numerical understanding of neural networks: from representation to learning dynamics

- Applied Math and Analysis: In this talk I will present mathematical and numerical analysis, together with experiments, to study a few basic computational issues in using neural networks to approximate functions: (1) the numerical error that can be achieved given a finite machine precision, (2) the learning dynamics and computational cost needed to achieve a given accuracy, and (3) structured and balanced approximation. These issues are investigated for both approximation and optimization in asymptotic and non-asymptotic regimes.

## Sanchit Chaturvedi : Phase mixing in astrophysical plasmas with an external Kepler potential

- Applied Math and Analysis: In Newtonian gravity, a self-gravitating gas around a massive object such as a star or a planet is modeled via the Vlasov-Poisson equation with an external Kepler potential. The presence of this attractive potential allows for bounded trajectories along which the gas neither falls in towards the object nor escapes to infinity. We focus on this regime and first prove a linear phase mixing result in 3D outside symmetry with the exact Kepler potential. We then prove a long-time nonlinear phase mixing result in spherical symmetry. The mechanism is phenomenologically similar to Landau damping on a torus, but mathematically the situation is considerably more complex. This is based on upcoming joint work with Jonathan Luk at Stanford.
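
For orientation, the model described above can be written out as follows. This is the standard form of the gravitational Vlasov-Poisson system with a point-mass potential, not a formula quoted from the talk; the normalization (gravitational constant and particle mass set to 1) and the symbols $f$, $\phi$, $\Phi_{\mathrm{ext}}$ and $M$ are assumptions made here.

```latex
\begin{aligned}
&\partial_t f + v\cdot\nabla_x f - \nabla_x\bigl(\Phi_{\mathrm{ext}} + \phi\bigr)\cdot\nabla_v f = 0,
\qquad f = f(t,x,v),\quad x, v \in \mathbb{R}^3,\\
&\Phi_{\mathrm{ext}}(x) = -\frac{M}{|x|},
\qquad
\Delta_x \phi(t,x) = 4\pi \int_{\mathbb{R}^3} f(t,x,v)\,dv .
\end{aligned}
```

If the self-interaction $\phi$ is neglected, particles with negative energy $\tfrac12 |v|^2 - M/|x| < 0$ and nonzero angular momentum move on Kepler ellipses, which are the bounded trajectories the abstract refers to.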

## Vakhtang Poutkaradze : Lie-Poisson Neural Networks (LPNets): Data-Based Computing of Hamiltonian Systems with Symmetries

- Applied Math and Analysis: Physics-Informed Neural Networks (PINNs) have received much attention recently due to their potential for high-performance computations for complex physical systems, including data-based computing, systems with unknown parameters, and others. The idea of PINNs is to approximate the governing equations and the boundary and initial conditions through a loss function for a neural network. PINNs combine the efficiency of data-based prediction with the accuracy and insights provided by the physical models. However, applying these methods to predict the long-term evolution of systems with little friction, such as many systems encountered in space exploration, oceanography/climate, and many other fields, needs extra care, as the errors tend to accumulate and the results may quickly become unreliable. We provide a solution to the problem of data-based computation of Hamiltonian systems utilizing symmetry methods. Many Hamiltonian systems with symmetry can be written as Lie-Poisson systems, where the underlying symmetry defines the Poisson bracket. For data-based computing of such systems, we design the Lie-Poisson neural networks (LPNets). We consider the Poisson bracket structure primary and require it to be satisfied exactly, whereas the Hamiltonian, only known from physics, can be satisfied approximately. By design, the method preserves all special integrals of the bracket (Casimirs) to machine precision. LPNets yield an efficient and promising computational method for many particular cases, such as rigid body or satellite motion (the case of the SO(3) group), Kirchhoff's equations for an underwater vehicle (the SE(3) group), and others. Joint work with Chris Eldred (Sandia National Lab), Francois Gay-Balmaz (CNRS and ENS, France), and Sophia Huraka (U Alberta). The work was partially supported by an NSERC Discovery grant.
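
The role of Casimir preservation can be seen in the rigid body (SO(3)) case with a small numerical sketch. This is not the LPNets method and contains no neural network: it only contrasts an explicit Euler step with a step that moves the angular momentum by a rotation, on the free rigid body equations `dPi/dt = Pi x (Pi / inertia)`. The inertia values, step size, and function names are illustrative assumptions. Because a rotation cannot change the length of `Pi`, the Casimir `|Pi|^2` is kept to rounding error, whereas the Euler update lets it drift.

```python
import numpy as np

def hat(a):
    """Skew-symmetric matrix with hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def rotation_exp(a):
    """Rodrigues formula for the rotation matrix exp(hat(a))."""
    theta = np.linalg.norm(a)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(a / theta)                      # generator for the unit rotation axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def euler_step(Pi, inertia, h):
    """Explicit Euler step for dPi/dt = Pi x Omega, with Omega = Pi / inertia."""
    omega = Pi / inertia
    return Pi + h * np.cross(Pi, omega)

def rotation_step(Pi, inertia, h):
    """First-order step that acts on Pi by a rotation, so |Pi| cannot change."""
    omega = Pi / inertia
    return rotation_exp(-h * omega) @ Pi

# Illustrative principal moments of inertia and initial angular momentum.
inertia = np.array([1.0, 2.0, 3.0])
Pi0 = np.array([1.0, 0.5, -0.3])
Pi_euler, Pi_rot = Pi0.copy(), Pi0.copy()
for _ in range(1000):
    Pi_euler = euler_step(Pi_euler, inertia, 0.01)
    Pi_rot = rotation_step(Pi_rot, inertia, 0.01)

casimir0 = Pi0 @ Pi0
drift_euler = abs(Pi_euler @ Pi_euler - casimir0)
drift_rot = abs(Pi_rot @ Pi_rot - casimir0)
print(drift_euler, drift_rot)
```

Both steps are first-order accurate; the difference is structural. The abstract describes building this kind of exact bracket preservation into the network itself, while the Hamiltonian is only matched approximately.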