## Stephan Huckemann: Statistical challenges in shape prediction of biomolecules

The three-dimensional higher-order structure of biomolecules
determines their functionality. While primary structure is fairly
easy to determine, reconstructing higher-order structure is
costly. It often requires elaborate correction of atomic clashes,
which is frequently not fully successful. Using RNA data, we describe a
purely statistical method that learns error correction and draws its
power from a two-scale approach. Our microscopic scale describes single suites by
dihedral angles of individual atom bonds; here, addressing the
challenge of torus principal component analysis (PCA) leads to a
fundamentally new approach to PCA building on principal nested spheres
by Jung et al. (2012). Based on an observed relationship with a
mesoscopic scale, landmarks describing several suites, we use Fréchet
means for angular shape and size-and-shape, correcting
within-suite-backbone-to-backbone clashes. We validate this method by
comparison to reconstructions obtained from simulations approximating
biophysical chemistry and illustrate its power by the RNA example of
SARS-CoV-2.
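
As a rough illustration of why Fréchet means are used for angular data such as dihedral angles, the sketch below computes an intrinsic Fréchet mean on a single circle. It is not the method of the talk, which works with angular shape and size-and-shape of landmark configurations; the function names `arc_distance` and `frechet_mean_circle` are hypothetical and chosen only for this example. The sketch relies on the known fact that, for n angles, every minimizer of the sum of squared arc-length distances is among the n candidates obtained by shifting the arithmetic mean by multiples of 2π/n.

```python
import numpy as np

def arc_distance(a, b):
    """Geodesic (arc-length) distance on the unit circle, in radians."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_mean_circle(angles):
    """Intrinsic Frechet mean of angles (radians) on the circle.

    Illustrative sketch only: evaluates the Frechet function (sum of
    squared arc distances) at the n candidate points
    mean(angles) + 2*pi*k/n and returns the candidate with lowest cost.
    If several candidates tie, the first one found is returned.
    """
    angles = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    n = angles.size
    candidates = np.mod(angles.mean() + 2 * np.pi * np.arange(n) / n, 2 * np.pi)
    costs = np.array([np.sum(arc_distance(angles, c) ** 2) for c in candidates])
    return candidates[np.argmin(costs)]

# Two angles on either side of the 0/360 degree cut.
example = np.deg2rad([350.0, 10.0])
mean_angle = frechet_mean_circle(example)
```

For the two example angles 350° and 10°, the naive arithmetic mean is 180°, which lies on the opposite side of the circle, whereas the intrinsic mean sits at 0° (mod 360°). This wrap-around effect is the reason ordinary Euclidean averaging and PCA cannot be applied directly to dihedral angles.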

This is joint work with Benjamin Eltzner, Kanti V. Mardia and Henrik
Wiechers.

Literature:

Eltzner, B., Huckemann, S. F., Mardia, K. V. (2018):
Torus principal component analysis with applications to RNA
structure. Ann. Appl. Statist. 12(2), 1332–1359.

Jung, S., Dryden, I. L., Marron, J. S. (2012):
Analysis of principal nested spheres. Biometrika 99(3), 551–568.

Mardia, K. V., Wiechers, H., Eltzner, B., Huckemann, S. F. (2022).
Principal component analysis and clustering on manifolds. Journal of
Multivariate Analysis, 188, 104862,
https://www.sciencedirect.com/science/article/pii/S0047259X21001408

Wiechers, H., Eltzner, B., Mardia, K. V., Huckemann, S. F. (2021).
Learning torus PCA based classification for multiscale RNA backbone
structure correction with application to SARS-CoV-2. To appear in the
Journal of the Royal Statistical Society, Series C,
bioRxiv https://doi.org/10.1101/2021.08.06.455406