Wedge-Masked Difference Classification
In Wedge-Masked Difference (WMD) classification the full set of particles is reduced to a lower-dimensional representation by means of Singular Value Decomposition (SVD), after attempting to take into account the effects of the missing wedge. Particles projected onto the most variable basis vectors can then be clustered using a variety of methods.
Within subTOM, wedge-masked differences (the difference between the overall
reference, weighted with the particle's missing wedge, and the particle itself)
are first compiled into a 2-D matrix, denoted here as the D-Matrix, which holds
the aligned, band-pass filtered and masked difference data. To speed up
calculation, particles can be pre-aligned using the function
subtom_parallel_prealign. Batches of the D-Matrix are calculated in parallel
with subtom_parallel_dmatrix and then combined and column-centered with
subtom_join_dmatrix.
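The construction of the D-Matrix can be sketched in NumPy as follows. This is a
minimal illustration and not the subTOM implementation: the function names, the
one-particle-per-column layout, the sign of the difference, and the reading of
"column-centered" as subtracting each column's mean are all assumptions, and the
alignment and band-pass filtering steps are left out.

```python
import numpy as np


def wedge_masked_difference(particle, reference, wedge, mask):
    """Masked voxels of (particle - wedge-weighted reference).

    Hypothetical helper. `wedge` is assumed to be a Fourier-space weight
    in the same (unshifted) layout as the output of np.fft.fftn, and
    `mask` a boolean real-space mask of the same shape as the volumes.
    """
    # Weight the reference with the particle's missing wedge in Fourier space.
    weighted_ref = np.fft.ifftn(np.fft.fftn(reference) * wedge).real
    # Sign convention is an assumption; only the masked voxels are kept.
    return (particle - weighted_ref)[mask]


def build_dmatrix(particles, reference, wedges, mask):
    """Stack one difference vector per particle as a column, then center."""
    columns = [
        wedge_masked_difference(p, reference, w, mask)
        for p, w in zip(particles, wedges)
    ]
    dmatrix = np.stack(columns, axis=1)  # shape: (masked voxels, particles)
    # Column-centering, assumed here to mean removing each column's mean.
    dmatrix -= dmatrix.mean(axis=0, keepdims=True)
    return dmatrix
```

Because each column depends on a single particle, batches of columns can be
computed independently and concatenated afterwards, which mirrors the
parallel-then-join structure described above.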
Next the D-Matrix is decomposed by Singular Value Decomposition, so as to skip
calculation of the covariance matrix, as described in J. Heumann et al.,
J. Struct. Biol. 2011. This determines a set of right singular vectors and
singular values, which are used along with the D-Matrix to determine the
Eigenvolumes of the dataset with subtom_eigenvolumes_wmd.
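The relation between the SVD and the Eigenvolumes can be sketched as below. If
the D-Matrix factors as D = U S V^T, the left singular vectors follow from the
right singular vectors and singular values as U = D V S^-1, so the covariance
matrix never has to be formed. The function name and the voxels-by-particles
layout are assumptions made for illustration, not the subTOM code.

```python
import numpy as np


def eigenvolumes_from_svd(dmatrix, num_components):
    """Return (eigenvolumes, singular values, right singular vectors).

    Hypothetical helper. `dmatrix` is assumed to have shape
    (masked voxels, particles) and to be centered already.
    """
    # Economy-size SVD; no covariance matrix is computed explicitly.
    _, svals, vt = np.linalg.svd(dmatrix, full_matrices=False)
    right_vecs = vt[:num_components].T   # (particles, components)
    svals = svals[:num_components]
    # U = D V S^-1: one eigenvolume (as masked voxels) per column.
    # Assumes the retained singular values are non-zero.
    eigenvolumes = dmatrix @ right_vecs / svals
    return eigenvolumes, svals, right_vecs
```

The squared singular values equal the leading eigenvalues of D^T D, so they
rank the Eigenvolumes by how much variance each one explains.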
These volumes are then used to determine the low-rank approximation coefficients
in volume space for clustering. A larger particle superset can be projected onto
the volumes to speed up classification of large datasets. Coefficients are also
calculated in parallel in batches with subtom_parallel_coeffs and joined with
subtom_join_coeffs.
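Projecting difference data onto the Eigenvolumes amounts to an inner product per
particle and per Eigenvolume, which is why it splits naturally into batches. The
sketch below assumes the same voxels-by-particles layout as before, uses
hypothetical function names, and ignores any missing-wedge weighting that the
actual coefficient calculation may apply.

```python
import numpy as np


def project_batch(dmatrix_batch, eigenvolumes):
    """Coefficients for one batch: shape (particles in batch, components)."""
    return dmatrix_batch.T @ eigenvolumes


def coeffs_in_batches(dmatrix, eigenvolumes, batch_size):
    """Project column batches independently, then join them row-wise."""
    num_particles = dmatrix.shape[1]
    batches = [
        project_batch(dmatrix[:, start:start + batch_size], eigenvolumes)
        for start in range(0, num_particles, batch_size)
    ]
    return np.vstack(batches)
```

The columns passed in need not be the ones used to compute the Eigenvolumes, so
the same routine covers projecting a larger particle superset.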
Finally, using a user-selected subset of the determined coefficients, the data
is clustered by Hierarchical Ascendant Clustering using a Ward distance
criterion, K-Means clustering, or a Gaussian Mixture Model with the function
subtom_cluster. This clustering is then used to generate the final class
averages.
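As an illustration of the last step, the sketch below selects a subset of
coefficient columns, clusters them with a plain Lloyd's-algorithm K-Means
written in NumPy, and averages the particles per class. It stands in for only
one of the three clustering options, the function names and parameters are
hypothetical, and the class average is a simple mean that ignores missing-wedge
weighting.

```python
import numpy as np


def kmeans(coeffs, num_classes, num_iter=50, seed=0):
    """Plain Lloyd's K-Means on rows of `coeffs`; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen, distinct particles.
    picks = rng.choice(len(coeffs), size=num_classes, replace=False)
    centers = coeffs[picks].copy()
    for _ in range(num_iter):
        # Distance of every particle to every center.
        dists = np.linalg.norm(coeffs[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty class keeps its previous center.
        new_centers = np.array([
            coeffs[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(num_classes)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def class_averages(particles, labels, num_classes):
    """Mean volume per class; assumes every class has at least one member."""
    return [particles[labels == j].mean(axis=0) for j in range(num_classes)]


def cluster_selected(coeffs, selected, num_classes):
    """Cluster using only the user-selected coefficient columns."""
    return kmeans(coeffs[:, list(selected)], num_classes)
```

Restricting the clustering to selected columns reflects the user-selected subset
of coefficients mentioned above, for example to leave out components that mostly
capture residual missing-wedge or alignment effects.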