
Stephan Huckemann: Statistical challenges in shape prediction of biomolecules

Mathematical Biology

The three-dimensional higher-order structure of biomolecules determines their function. While primary structure is fairly easily accessible, reconstructing higher-order structure is costly: it often requires elaborate correction of atomic clashes, which is frequently not fully successful. Using RNA data, we describe a purely statistical method, learning error correction, that draws its power from a two-scale approach. On the microscopic scale, single suites are described by the dihedral angles of individual atom bonds; here, addressing the challenge of torus principal component analysis (PCA) leads to a fundamentally new approach to PCA, building on the principal nested spheres of Jung et al. (2012). Based on an observed relationship with a mesoscopic scale, given by landmarks describing several suites, we use Fréchet means for angular shape and size-and-shape to correct within-suite backbone-to-backbone clashes. We validate the method by comparison with reconstructions obtained from simulations approximating biophysical chemistry, and illustrate its power with the example of SARS-CoV-2 RNA.
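To make the microscopic scale a little more concrete, here is a small Python sketch (an illustration under stated assumptions, not the authors' code) that computes a signed dihedral angle from four consecutive atom positions and an intrinsic (Fréchet) mean of angles on the circle. The circle mean is only a one-dimensional analogue of the Fréchet means for angular shape and size-and-shape mentioned above; it is included to show why angular data cannot simply be averaged like real numbers. The function names (`dihedral`, `geodesic_dist`, `circular_frechet_mean`) and the exhaustive search over cut points are hypothetical choices made for this sketch. Torus PCA itself is not implemented here.

```python
import math

TWO_PI = 2.0 * math.pi

# Minimal 3-vector helpers (plain tuples, no external dependencies).
def _sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def _dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def _scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians, in [-pi, pi]) about the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.atan2(y, x)

def geodesic_dist(a, b):
    """Arc-length distance between two angles on the unit circle."""
    d = (a - b) % TWO_PI
    return min(d, TWO_PI - d)

def circular_frechet_mean(angles):
    """Intrinsic mean on the circle (minimiser of summed squared arc lengths).

    Hypothetical exhaustive search: for each of the n cut points between
    sorted neighbours, unwrap the data, take the ordinary mean as a candidate,
    and keep the candidate with the smallest Frechet function value.
    Returns an angle in [0, 2*pi). `angles` must be non-empty.
    """
    if not angles:
        raise ValueError("need at least one angle")
    theta = sorted(t % TWO_PI for t in angles)
    n = len(theta)
    best_mu, best_val = None, math.inf
    for k in range(n):
        # Cut the circle just before theta[k]; points before the cut wrap by 2*pi.
        unwrapped = theta[k:] + [t + TWO_PI for t in theta[:k]]
        mu = (sum(unwrapped) / n) % TWO_PI
        val = sum(geodesic_dist(mu, t) ** 2 for t in theta)
        if val < best_val:
            best_mu, best_val = mu, val
    return best_mu
```

For the two angles 0.1 and 2*pi - 0.1, the arithmetic mean is pi, on the opposite side of the circle, whereas `circular_frechet_mean` returns an angle at geodesic distance essentially zero from 0. The search is exact because, on each arc where the nearest representatives of the data points do not change, the Fréchet function is quadratic with its minimum at the mean of those representatives, and those representative sets are the n rotations tried in the loop; the cost is O(n^2), which is fine for a sketch but would call for an incremental update of the running mean on large samples.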

This is joint work with Benjamin Eltzner, Kanti V. Mardia and Henrik Wiechers.

Literature:

Eltzner, B., Huckemann, S. F., Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. Annals of Applied Statistics, 12(2), 1332-1359.

Jung, S., Dryden, I. L., Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika, 99(3), 551-568.

Mardia, K. V., Wiechers, H., Eltzner, B., Huckemann, S. F. (2022). Principal component analysis and clustering on manifolds. Journal of Multivariate Analysis, 188, 104862. https://www.sciencedirect.com/science/article/pii/S0047259X21001408

Wiechers, H., Eltzner, B., Mardia, K. V., Huckemann, S. F. (2021). Learning torus PCA based classification for multiscale RNA backbone structure correction with application to SARS-CoV-2. To appear in Journal of the Royal Statistical Society, Series C. bioRxiv: https://doi.org/10.1101/2021.08.06.455406