Building models for biological, chemical, and physical systems has traditionally relied on domain-specific intuition about which interactions and features most strongly influence a system. Statistical methods based on information criteria provide a framework for balancing likelihood against model complexity. Sparse optimization strategies, recently developed for and applied to dynamical systems, can select the subset of terms from a library that best describes the data, automatically inferring model structure. I will discuss my group's application and development of data-driven methods for model selection to 1) find simple statistical models for tracking the COVID pandemic through wastewater surveillance and 2) recover models of chaotic systems from data with hidden variables. I'll briefly discuss current preliminary work and roadblocks in developing new methods for model selection in biological metabolic and regulatory networks.
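The abstract does not name a particular sparse-selection algorithm. As one representative instance, here is a minimal sketch of sequentially thresholded least squares, the sparse regression at the core of the SINDy method of Brunton, Proctor and Kutz: fit all library terms by least squares, zero out coefficients below a threshold, and refit on the surviving terms. The function name `stlsq`, the polynomial library, the threshold, and the logistic-growth test system are hypothetical choices for illustration, and the derivative is supplied exactly rather than estimated from noisy measurements.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares (hypothetical helper, SINDy-style).

    theta: (n_samples, n_terms) candidate library evaluated on the data.
    dxdt:  (n_samples,) time derivative of one state variable.
    Returns coefficients in which small terms are set exactly to zero.
    """
    # Initial dense fit using every library term.
    xi, *_ = np.linalg.lstsq(theta, dxdt, rcond=None)
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0          # prune weak terms
        keep = ~small
        if not keep.any():
            break                # everything pruned: empty model
        # Refit using only the surviving library terms.
        coef, *_ = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)
        xi[keep] = coef
    return xi


# Hypothetical example: noiseless logistic growth dx/dt = 0.5 x - 0.25 x^2.
x = np.linspace(0.1, 2.0, 50)
dxdt = 0.5 * x - 0.25 * x**2
library = np.column_stack([np.ones_like(x), x, x**2, x**3])  # terms: 1, x, x^2, x^3
xi = stlsq(library, dxdt, threshold=0.1)
print(np.round(xi, 6))
```

The threshold plays the role that a complexity penalty plays in an information criterion: raising it trades goodness of fit for fewer terms. With noisy data and estimated derivatives, the threshold usually has to be tuned, for example by comparing candidate models with an information criterion.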
The three-dimensional higher-order structure of biomolecules determines their functionality. While primary structure is fairly easy to assess, reconstructing higher-order structure is costly: it often requires elaborate correction of atomic clashes, which is frequently not fully successful. Using RNA data, we describe a purely statistical method, learning error correction, that draws its power from a two-scale approach. Our microscopic scale describes single suites by the dihedral angles of individual atom bonds; here, addressing the challenge of torus principal component analysis (PCA) leads to a fundamentally new approach to PCA that builds on the principal nested spheres of Jung et al. (2012). Based on an observed relationship with a mesoscopic scale, given by landmarks describing several suites, we use Fréchet means for angular shape and size-and-shape to correct within-suite backbone-to-backbone clashes. We validate this method by comparison with reconstructions obtained from simulations approximating biophysical chemistry and illustrate its power by the RNA example of
This is joint work with Benjamin Eltzner, Kanti V. Mardia and Henrik Wiechers.
Eltzner, B., Huckemann, S. F., Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. Annals of Applied Statistics, 12(2), 1332-1359.
Jung, S., Dryden, I. L., Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika, 99(3), 551-568.
Mardia, K. V., Wiechers, H., Eltzner, B., Huckemann, S. F. (2022). Principal component analysis and clustering on manifolds. Journal of Multivariate Analysis, 188, 104862. https://www.sciencedirect.com/science/article/pii/S0047259X21001408
Wiechers, H., Eltzner, B., Mardia, K. V., Huckemann, S. F. (2021). Learning torus PCA based classification for multiscale RNA backbone structure correction with application to SARS-CoV-2. To appear in Journal of the Royal Statistical Society, Series C. bioRxiv: https://doi.org/10.1101/2021.08.06.455406
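Angular data such as dihedral angles cannot be averaged naively, which is part of what makes PCA on the torus hard. The following is a minimal sketch of the simplest Fréchet mean, the intrinsic mean of a single angle on the circle: the point minimizing the sum of squared arc-length distances to the data. It assumes one angle only, whereas the abstract applies Fréchet means in shape spaces (angular shape and size-and-shape); it is neither torus PCA nor the authors' implementation, and the helper names are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def geodesic_dist(a, b):
    """Arc-length distance between two angles (in radians) on the unit circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)


def frechet_mean_circle(angles):
    """Intrinsic Fréchet mean on the circle S^1 (hypothetical helper).

    Minimizes F(m) = sum_i d(m, theta_i)**2. Setting F'(m) = 0 shows that every
    critical point equals mean(theta) + 2*pi*k/n (mod 2*pi) for an integer k
    (minima never sit at the kinks antipodal to data points), so evaluating
    these n candidates finds the global minimizer exactly.
    """
    n = len(angles)
    base = sum(angles) / n
    candidates = [(base + TWO_PI * k / n) % TWO_PI for k in range(n)]
    return min(candidates, key=lambda m: sum(geodesic_dist(m, t) ** 2 for t in angles))


# Two dihedral angles straddling 0 degrees (hypothetical data).
angles = [math.radians(350.0), math.radians(10.0)]
naive = sum(angles) / len(angles)   # arithmetic mean: 180 degrees, the far side of the circle
mean = frechet_mean_circle(angles)  # Fréchet mean: 0 degrees up to rounding
print(math.degrees(naive), math.degrees(math.remainder(mean, TWO_PI)))
```

Enumerating all n candidates costs O(n^2) distance evaluations, which is acceptable for a sketch; on the torus and in shape spaces no such closed-form candidate set is available, and iterative optimization is typically used instead.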
Strikingly regular, large-scale patterns of vegetation growth were first documented by aerial photography in the Horn of Africa circa 1950 and are now known to exist in drylands across the globe. The patterns often appear on very gently sloped terrain as bands of dense vegetation alternating with bare soil, and models suggest that they may be a strategy for making the most of the limited water available. A particular challenge for modeling these patterns is resolving fast processes, such as surface water flow during rainstorms, while still capturing slow dynamics such as the uphill migration of the vegetation bands, which has been observed at rates on the order of one band width per century. We propose a pulsed-precipitation model that treats rainstorms as instantaneous kicks to the soil water, which interacts with vegetation on the timescale of plant growth. We use a stochastic rainfall model in which the influence of fast, storm-level hydrology is captured by the spatial distribution of the soil-water kicks. The model allows us to predict how storm characteristics influence the large-scale patterns. Analysis and simulations suggest that the distance water travels on the surface before infiltrating into the soil during a typical storm plays a key role in determining the spacing between the bands.
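To make the timescale separation concrete, here is a minimal, nonspatial sketch of the pulsed idea: storms arrive as a Poisson process and add an exponentially distributed depth to soil water instantaneously, while soil water and biomass evolve continuously between storms. Everything here is an assumption for illustration: the Klausmeier-style uptake term w*b^2, all parameter values, and the function name are hypothetical, and dropping space entirely means this toy cannot represent the spatial distribution of kicks or predict band spacing, both of which are central to the model described above.

```python
import random


def simulate_pulsed(t_end=50.0, dt=1e-3, storm_rate=20.0, mean_depth=0.05,
                    loss=1.0, uptake=5.0, conversion=1.0, mortality=0.5,
                    w0=0.0, b0=0.5, seed=0):
    """Toy nonspatial pulsed-precipitation model (hypothetical parameters).

    Between storms, soil water w and biomass b follow
        dw/dt = -loss*w - uptake*w*b**2
        db/dt = conversion*uptake*w*b**2 - mortality*b
    Storms arrive as a Poisson process (rate storm_rate per unit time), and each
    instantaneously adds an exponentially distributed depth to w (a "kick").
    """
    rng = random.Random(seed)
    t, w, b = 0.0, w0, b0
    next_storm = rng.expovariate(storm_rate)
    storm_times = []
    while t < t_end:
        # Apply every storm whose arrival time has been reached: a jump in w.
        while next_storm <= t:
            w += rng.expovariate(1.0 / mean_depth)
            storm_times.append(next_storm)
            next_storm += rng.expovariate(storm_rate)
        # Forward Euler step of the slow between-storm dynamics.
        growth = uptake * w * b * b
        w += dt * (-loss * w - growth)
        b += dt * (conversion * growth - mortality * b)
        t += dt
    return w, b, storm_times


w_end, b_end, storms = simulate_pulsed()
print(f"{len(storms)} storms; final soil water {w_end:.3f}, biomass {b_end:.3f}")
```

Treating each storm as a jump lets the integrator use a step sized for plant-growth dynamics instead of resolving each storm's hydrology; in a spatial version, that hydrology would enter through where each kick deposits its water.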