subTOM
SubTOM - Subvolume processing scripts with the TOM toolbox is a collection of scripts form a pipeline for subvolume alignment and averaging of electron cryo-tomography data.
Installation
The installation of subTOM is relatively straight-forward. subTOM is currently only built for 64-bit linux computers, with no plans currently to produce builds for Windows or Mac.
git clone /net/dstore2/teraraid/dmorado/software/subTOM
cd subTOM
chmod u+x install.sh
./install.sh <INSTALL_DIR> <MCR_DIR>
Since subTOM is written in Matlab, having a license to use Matlab is preferable, and makes some tasks simpler, while also allowing for doing your own scripting with the TOM toolbox. However, the effort has been made to make sure that the pipeline can run as a whole using only the Matlab Compiler Runtime, which is freely available software, and can be found here:
https://uk.mathworks.com/products/compiler/matlab-runtime.html
subTOM is currently built against the 2021b/v911 MCR so that is the one that you need to have downloaded and have access to.
At the LMB we have the MCR already installed at
/lmb/home/public/matlab/jbriggs
, which you can use for your installation.
subTOM is also distributed like most software nowadays as a Git repository, so if you do not have Git you can find out how to install and use Git here:
Step-by-Step Instructions
From the directory in which you want to install subTOM clone the repository.
git clone /net/dstore2/teraraid/dmorado/software/subTOM
Change into the newly created subTOM directory.
cd subTOM
Make the installation script user-executable, so that you can run it.
chmod u+x install.sh
Run the install script specifying the installation directory and the directory that contains the MCR installation.
./install.sh <INSTALL_DIR> <MCR_DIR>
example
./install.sh /net/dstore2/teraraid/dmorado/software/subTOM /lmb/home/public/matlab/jbriggs
Building
If you have access to the MATLAB compiler you can also build subTOM from the source simply following the steps here, beginning in the subTOM installation directory:
cd src
Edit
subtom_mcc_build.m
and change the top three variables to point correctly at:The subTOM source directory
The root folder of the TOM Toolbox
The path to the MATLAB Toolbox directory * The Statistics and Machine Learning Toolbox is needed for subtom_cluster
Run
subtom_mcc_build.m
in MATLAB and it should correctly compile all neccessary functions and place them in the correct locations.If you run into an issue with a MEX-function in TOM toolbox (such as tom_rotate), then you can recompile such a function with the command
mex -R2018a <filename>
and then recompile subTOM.
Conventions
Preprocessing
Preprocessing is done for dose-fractionated data that comes from detectors that collect movie tilt-images. Preprocessing is done with several programs from various sources.
Beam-induced Motion Correction
The command
alignframes
alignframes man page which is a part of the IMOD package is used to do the beam-induced motion correction of the movies. subTOM assumes that the data is collected with SerialEM but the program should support MRC and TIFF format movie-frames collected with other programs as well. However this requires you to have your movies to have the following name-scheme:<BASENAME>_<FRAME_IDX>_<ANGLE>.<mrc|tif>
Where
<BASENAME>
is your own filename identifier e.g. TS_01Where
<FRAME_IDX>
is a three digit identifier of the movie that describes the order the data was collected in e.g. 000-040Where
<ANGLE>
is the tilt-angle at which the movie was collected at e.g. 15.0
Defocus Estimation
The programs CTFFIND4, GCTF, and IMOD’s
ctfplotter
ctfplotter man page command can all be used to estimate the defocus in the corrected tilt-series
Dose Filtering
The command
alignframes
which is part of the IMOD package is also used to do the filtering of movies based on the accumulated dose of each tilt-image.
CTF Correction
CTF correction is done in 3D using the program novaCTF, and a run script
run_nova.sh
is included to facilitate performing novaCTF in parallel and on
an SGE cluster.
Particle Picking
Particle picking is done using UCSF Chimera. First users use the built-in Volume Tracer utility to create a Marker Set of points at the center of spherical particles onto which seed positions or a collection of Marker Sets of points along tubular surfaces onto which seed positions. The number of Marker Sets used is not important in picking points on spheres, but in picking points on tubes, each tube should correspond to a single Marker Set. The collection of Marker Sets should be saved to a single file, one per tomogram with the name format:
<BASENAME>_<TOMOGRAM_IDX>.cmm
Where
<BASENAME>
is your own filename identifier e.g. clicker.Where
<TOMOGRAM_IDX>
is the tomogram number e.g. 1.
Motive Lists are then generated for the picked objects using the PickParticle plug-in developed in the Briggs’ lab by Kun Qu. The format of motive lists is detailed below, and the motive list is assumed to saved to a single file, one per tomogram with the name format:
<BASENAME>_<TOMOGRAM_IDX>.em
Alignment and Averaging
The alignment parameters for a set of data are stored in a MOTive List or so-called MOTL file, which is a table of 20-fields stored in an EM-format binary data file. Particles are also extracted into subvolumes in EM-format from tomograms which are expected to be in MRC-format with the name:
<TOMOGRAM_IDX>.rec
Orientations
Coordinate System
subTOM uses a right-handed coordinate system where positive rotations are clockwise looking along the directed axis. The orthogonal axes X, Y, Z are with the positive Z-axis pointing out of the screen out at the user.
Image Center
Since Matlab uses array-indices that start from 1, unlike most other programming languages which count from zero, the origin of a subvolume with dimensions, \(NX, NY, NZ\) is defined as:
Euler Angle Rotations
Rotations in MOTLs describe the best-found rotation of the reference to the particle in terms of ZXZ Euler angles in degrees. The Euler angle definition in subTOM is:
The first rotation Azimuth or psi (\(\psi\)) about the Z-axis.
The second rotation Zenith or theta (\(\theta\)) about the new X-axis.
The final rotaiton Spin or phi (\(\phi\)) about the final Z-axis.
This is particularly confusing given that phi and psi generally are swapped in other software packages, but is kept for historical reasons from the TOM-toolbox. Therefore care has been taken to use the unambiguous notation azimuth, zenith, and spin in most of the subTOM code and documentation.
Translations
Translations in MOTLs describe the best-found translation of the reference to the particle in pixels with respect to the subvolume origin. This translation occurs after the rotation of the reference about the subvolume origin.
Motive List Specification
Field |
Contents |
---|---|
1 |
Cross-Correlation Coefficient |
2 |
Marker Set Used from PickParticle |
3 |
Radius of tube/sphere in PickParticle |
4 |
Particle Number (Running count from 1) |
5 |
Tomogram Number (Running count from 1) |
6 |
PickParticle Object Number (Running) |
7 |
Tomogram Number (From Filename) |
8 |
X-coordinate in Tomogram (Integer) |
9 |
Y-coordinate in Tomogram (Integer) |
10 |
Z-coordinate in Tomogram (Integer) |
11 |
X-translation AFTER rotation of Ref. |
12 |
Y-translation AFTER rotation of Ref. |
13 |
Z-translation AFTER rotation of Ref. |
14 |
Not Used (X-shift BEFORE rotation) |
15 |
Not Used (Y-shift BEFORE rotation) |
16 |
Not Used (Z-shift BEFORE rotation) |
17 |
Spin Rotation of Ref. in Degrees |
18 |
Azimuth Rotation of Ref. in Degrees |
19 |
Zenith Rotation of Ref. in Degrees |
20 |
Class Number |
Class Number
The class number field acts as a field for classification, but also thresholding. Historically:
Particles that have class number 1 are always aligned and included in the final average.
Particles that have class number 2 are always aligned but are not included in the final average.
Particles that have class number \(\leq 0\) are not aligned nor included in the final average.
Remaining class numbers \(\gt 2\) can be used in classification to identify homogeneous subsets within a heterogeneous dataset.
Example Workflow
The best way to learn how to use subTOM is to just start using it. I would suggest downloading the HIV-1 CA-SP1 publicly available dataset EMPIAR-10164 and using the same subset that was used in Turonova et al. An example workflow showing which scripts are used when and where is included here as a sort of guide into what a particular project run of subTOM looks like from start to finish. Scripts are copied from your subTOM installation directory to your project directory and edited from the project directory. Then you make the script executable and run it, again from the project directory.
You can download a PDF version of the workflow: here
.
Scripts
Scripts are the main point of user-interaction with the processing pipeline. As such care has been taken to make sure that the scripts are well commented and that the user-defined options are well separated from the actual black-box level processing further down in the code.
Each script is a simple BASH file that generally calls some piece of code from either IMOD or a subTOM Matlab function. The scripts are meant to be first edited, filling in the necessary information and then executed in the terminal. Matlab is not necessary to run the scripts, just the Matlab Compiler Runtime, which should have been taken care of in the installation.
subtom_alignment
The main pipeline process script of subTOM. Iteratively aligns and averages a collection of subvolumes.
This subtomogram alignment script uses five MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- align_exec
Alignment executable
- cat_exec
Concatenate MOTLs executable
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
- compare_exec
Compare MOTLs executable
Memory Options
- mem_free_ali
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max_ali
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
- mem_free_avg
The amount of memory the job requires for averaging.
- mem_max_avg
The upper bound on the amount of memory the averaging job is allowed to use.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Alignment Workflow Options
Parallelization Options
- start_iteration
The index of the reference to start from : input will be ref_fn_prefix_start_iteration.em and all_motl_fn_prefix_start_iteration.em (define as integer e.g. start_iteration=3)
More on iterations since they’re confusing and it is slightly different here than from previous iterations.
The start_iteration is the beginning for the iteration variable used throughout this script. Iteration refers to iteration that is used for subtomogram alignment. So if start_iteration is 1, then subtomogram alignment will work using allmotl_1.em and ref_1.em. The output from alignment will be particle motls for the next iteration. This in the script is avg_iteration variable. The particle motls will be joined to form allmotl_2.em and then the parallel averaging will form ref_2.em and then the loop is done and iteration will become 2 and avg_iteration will become 3.
- iterations
Number iterations (big loop) to run: final output will be ref_fn_prefix_start_iteration+iterations.em and all_motl_fn_prefix_start_iteration+iterations.em
- num_ali_batch
The number of batches to split the parallel subtomogram alignment job into.
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- align_mask_fn
Relative path and name of the alignment mask Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- cc_mask_fn
Relative path and name of the cross-correlation mask this defines the maximum shifts in each direction Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Alignment and Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- apply_weight
Apply weight to subtomograms (1=yes, 0=no).
- apply_mask
Apply mask to subtomograms (1=yes, 0=no).
- psi_angle_step
Angular increment in degrees, applied during the cone-search, i.e. psi and theta (define as real e.g. psi_angle_step=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- psi_angle_shells
Number of angular iterations, applied to psi and theta (define as integer e.g. psi_angle_shells=4). Note that in terms of cones this is twice the number of cones sampled. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- phi_angle_step
Angular increment for phi in degrees, (define as real e.g. phi_angle_step=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- phi_angle_shells
Number of angular iterations for phi, (define as integer e.g. phi_angle_shells=6). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- nfold
Symmetry, if no symmetry nfold=1 (define as integer e.g. nfold=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- threshold
Threshold for cross correlation coefficient. Only particles with ccc_new > threshold will be added to new average (define as real e.g. threshold=0.5). These particles will still be aligned at each iteration.
- iclass
Particles with that number in position 20 of motivelist will be added to new average (define as integer e.g. iclass=1). NOTES: Class 1 is ALWAYS added. Negative classes and class 2 are never added.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
align_exec="${exec_dir}/alignment/subtom_scan_angles_exact"
cat_exec="${exec_dir}/MOTL/subtom_cat_motls"
sum_exec="${exec_dir}/alignment/subtom_parallel_sums"
avg_exec="${exec_dir}/alignment/subtom_weighted_average"
compare_exec="${exec_dir}/MOTL/subtom_compare_motls"
mem_free_ali=1G
mem_max_ali=64G
mem_free_avg=1G
mem_max_avg=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
start_iteration=1
iterations=3
num_ali_batch=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
align_mask_fn=("otherinputs/align_mask_1.em" \
"otherinputs/align_mask_2.em" \
"otherinputs/align_mask_3.em")
cc_mask_fn=("otherinputs/cc_mask_r10.em" \
"otherinputs/cc_mask_r05.em")
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
apply_weight=0
apply_mask=1
psi_angle_step=(10 5 2.5)
psi_angle_shells=(4)
phi_angle_step=(20 5)
phi_angle_shells=(6)
high_pass_fp=(1)
high_pass_sigma=(2)
low_pass_fp=(12 15 18)
low_pass_sigma=(3)
nfold=(1 6)
threshold=-1
iclass=0
subtom_average
Calculates the average from a given MOTL file in parallel on the cluster or locally.
This subtomogram averaging script uses five MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Averaging Workflow Options
Parallelization Options
- iteration
The index of the reference to generate : input will be all_motl_fn_prefix_iteration.em (define as integer)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- iclass
Particles with that number in position 20 of motivelist will be added to new average (define as integer e.g. iclass=1). NOTES: Class 1 is ALWAYS added. Negative classes and class 2 are never added.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
sum_exec="${exec_dir}/alignment/subtom_parallel_sums"
avg_exec="${exec_dir}/alignment/subtom_weighted_average"
mem_free=1G
mem_max=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
iteration=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
iclass=0
subtom_bandpass
Creates and/or applies a bandpass filter to a volume.
This utility script uses one MATLAB compiled script below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- bandpass_exec
Bandpass executable
File Options
- input_motl_fn
Relative path and name of the input volume to build and filter the bandpass against. If you just want to visualize an arbitrary filter you can use subtom_shape to create a template of the correct size and not ask for the filtered output.
- filter_fn
Relative path and name of the Fourier bandpass filter to write. If you do not want to output the filter volume simply leave this option blank.
- output_fn
Relative path and name of the filtered volume to write. If you do not want to output the filtered volume simply leave this option blank.
Filter Options
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size*pixelsize)/(resolution_real) (define as integer e.g. high_pass_fp=2)
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size*pixelsize)/(resolution_real) (define as integer e.g. low_pass_fp=7).
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
bandpass_exec="${exec_dir}/MOTL/subtom_unclass_motl"
input_fn="ref/ref_1.em"
filter_fn="otherinputs/bandpass_hp2s2_lp15s3.em"
output_fn="ref/ref_hp2s2_lp15s3_1.em"
high_pass_fp=2
high_pass_sigma=2
low_pass_fp=15
low_pass_sigma=3
subtom_cat_motls
Concatenate motive lists and print on the standard output.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cat_exec
Concatenate MOTLs executable
File Options
- input_motl_fns
Relative path and filename(s) of the input MOTL files to be concatenated. You can use shell wildcard characters * and ? to specify a given number of files and they will be expanded or you can just list the files one by one.
- output_motl_fn
Relative path and name of the output MOTL file. If you are not going to write an output file just set this variable to ‘’
- output_star_fn
Relative path and name of the output STAR file. If you are not going to write an output file just set this variable to ‘’
Concatenate Options
- write_motl
If you want to write out the concatenated MOTL files set this to 1, however if you just want to print the MOTL contents to the screen, set this to 0.
- write_star
If you want to write out the concatenated STAR file set this to 1, however if you just want to print the MOTL contents to the screen, set this to 0.
- sort_row
If you want to have the output MOTL file sorted by a particular field then specify it here. If the given value is not a value between 1-20 then the output MOTL file will be sorted arbitrarily based on the dir command in Matlab.
- do_quiet
If you just want to write output to files and not print to the screen set this to 1, however if you want to see the output printed to the screen leave this set to 0.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
cat_exec="${exec_dir}/MOTL/subtom_cat_motls"
input_motl_fns=("combinedmotl/allmotl_1_tomo_1.em" \
"combinedmotl/allmotl_1_tomo_2.em" \
"combinedmotl/allmotl_1_tomo_3.em" \
"combinedmotl/allmotl_1_tomo_4.em" \
"combinedmotl/allmotl_1_tomo_5.em")
output_motl_fn="combinedmotl/allmotl_1.em"
output_star_fn="combinemotl/allmotl_1.star"
write_motl=1
write_star=0
sort_row=4
do_quiet=1
subtom_clean_motl
Cleans a given MOTL file based on distance and/or CC scores.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- clean_motl_exec
Clean MOTLs executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be cleaned.
- output_motl_fn
Relative path and name of the output MOTL file.
- output_stats_fn
Relative path and name of the optional output cleaning stats CSV file. If you do not want to write out the differences just leave this as “”.
The CSV format of the output statistics file for cleaning is a single row with the following columns:
Column |
Value |
---|---|
1 |
Total Initial Number of Particles |
2 |
Number of Particles Removed in Edge Cleaning |
3 |
Number of Particles Removed in Cluster Cleaning |
4 |
Number of Particles Removed in Distance Cleaning |
5 |
Number of Particles Removed in CC Cleaning |
6 |
Total Remaining Number of Particles |
Clean Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- do_ccclean
If the following is set to 1 then the MOTL will be cleaned by CCC either by CCC value or a fraction of the highest CCC values to keep. If it is set to 0 then CCC cleaning will be skipped.
- cc_fraction
If cleaning by CC then after edge, cluster, and distance cleaning, if any are selected, is completed. The MOTL will be sorted by CCC and then the top fraction as specified here will be kept with the rest discarded. For example if cc_fraction=0.7 the top 70% of the clean data will be kept and the bottom 30% of the cleaned data will be discarded. A value of 1 here means that data is not cleaned by CCC fraction.
- cc_cutoff
If cc_fraction is 1 and therefore not used then particles with a CCC below this cutoff will be removed from the output MOTL file. Values must be within -1 to 1, with -1 not removing any particles.
- do_distance
If the following is set to 1 then the MOTL will be cleaned by distance. If it is set to 0 distance cleaning will be skipped.
- distance_cutoff
Particles that are less than this distance in pixels from another particle will be cleaned with the particle with the highest CCC kept while the others are removed from the output MOTL file.
- do_cluster
If the following is set to 1 then the MOTL will be cleaned by a clustering criteria that enforces kept particles to exist as clusters. This can be useful when there is no lattice and clusters of particles makes a good indication that a true copy of the reference exists there. If it is set to 0 cluster cleaning will be skipped.
- cluster_distance
The following determines the radius that defines what is considered a cluster in cluster cleaning.
- cluster_size
The cluster size specifies how many particles must be found within cluster_distance for the particle to be considered part of a cluster. The particle with the highest CCC in the cluster will be selected as the representative particle for the cluster and the remaining clustered points will be removed.
- do_edge
If the following is set to 1 then the MOTL will be edge cleaned considering the dimensions of the tomogram in which the particles are contained. If any part of the particle exists outside of the tomogram it will be removed from the MOTL. If it is set to 0 edge cleaning will be skipped.
- tomogram_dir
Absolute path to the folder where the tomograms are stored. If you are not edge cleaning leave this set to “”.
- box_size
What is the box size of the particle that will be extracted from the tomogram, which is necessary to specify to be able to edge clean.
- write_stats
If the following is 1 then the details of how many particles were cleaned in each stage will be written out, if 0 then not.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
clean_motl_exec="${exec_dir}/MOTL/subtom_clean_motl"
input_motl_fn="combinedmotl/allmotl_2.em"
output_motl_fn="combinedmotl/allmotl_cc0.1_dist4_cluster2d10_2.em"
output_stats_fn="combinedmotl/allmotl_cc0.1_dist4_cluster2d10_stats.csv"
tomo_row=7
do_ccclean=1
cc_fraction=1
cc_cutoff=0.1
do_distance=1
distance_cutoff=4
do_cluster=1
cluster_distance=10
cluster_size=2
do_edge=1
tomogram_dir="/net/dstore2/teraraid/dmorado/subTOM_tutorial/data/tomos/bin8"
box_size=36
write_stats=1
subtom_compare_motls
Compares the translations and rotations between two MOTLS of different iterations of alignment.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- compare_motls_exec
Comparison MOTLs executable
File Options
- motl_1_fn
Relative path and name of the first MOTL file
- motl_2_fn
Relative path and name of the second MOTL file
- output_diffs_fn
Relative path and name of the optional output difference CSV file. If you do not want to write out the differences just leave this as “”.
The CSV format of the output differences file for comparison is one row per particle in the motive list that has six columns, and a special final line with 22 columns for statistics of differences for the whole motive list. The particle columns are as follows:
Column |
Value |
---|---|
1 |
Particle Index (Motive List row 4) |
2 |
CCC Score for particle in first motive list |
3 |
CCC Score for particle in second motive list |
4 |
Coordinate displacement between motive lists |
5 |
Angular displacement between motive lists |
6 |
Angular displacement ignoring inplane rotations |
The special final line columns are as follows:
Column |
Value |
---|---|
1 |
Mean Coordinate displacement between MOTLs |
2 |
Median Coordinate displacement between MOTLs |
3 |
Coordinate displacement standard deviation |
4 |
Maximum Coordinate displacement between MOTLs |
5 |
Mean Angular displacement between MOTLs |
6 |
Median Angular displacement between MOTLs |
7 |
Angular displacement standard deviation |
8 |
Maximum Angular displacement between MOTLs |
9 |
Same as 5 but ignoring inplane rotations |
10 |
Same as 6 but ignoring inplane rotations |
11 |
Same as 7 but ignoring inplane rotations |
12 |
Same as 8 but ignoring inplane rotations |
13 |
Mean CCC score in the first motive list |
14 |
Median CCC score in the first motive list |
15 |
CCC standard deviation in first motive list |
16 |
Minimum CCC score in the first motive list |
17 |
Maximum CCC score in the first motive list |
18 |
Mean CCC score in the second motive list |
19 |
Median CCC score in the second motive list |
20 |
CCC standard deviation in second motive list |
21 |
Minimum CCC score in the second motive list |
22 |
Maximum CCC score in the second motive list |
Comparison Options
- write_diffs
If the following is 1 then the differences will be written out, if 0 then not.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
compare_motls_exec="${exec_dir}/MOTL/subtom_compare_motls"
motl_1_fn="combinedmotl/allmotl_1.em"
motl_2_fn="combinedmotl/allmotl_2.em"
output_diffs_fn="combinedmotl/allmotl_1_2_diffs.csv"
write_diffs=1
subtom_even_odd_motl
Splits a given MOTL file into even/odd halves for gold-standard refinement.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- even_odd_motl_exec
Even-Odd split motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be split.
- output_motl_fn
Relative path and name of the output MOTL file where the even and odd halves are specified by the class number in the 20th row of the motive list. The even half inherits the current class number plus 200 and the odd half inherits the current class numbers plus 100.
- even_motl_fn
Relative path and name of the output even MOTL file.
- odd_motl_fn
Relative path and name of the output odd MOTL file.
Even / Odd Options
- split_row
The following specifies which row of the MOTL will be used to split the data. To simply split into even and odd halves use the particle running ID, which is row 4. To split the halves by tomogram use row 5 or 7, and to split the halves by tube or sphere use row 6.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
even_odd_exec="${exec_dir}/MOTL/subtom_even_odd_motl"
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_eo_1.em"
even_motl_fn="even/combinedmotl/allmotl_1.em"
odd_motl_fn="odd/combinedmotl/allmotl_1.em"
split_row=4
subtom_extract_noise
This script finds and extracts noise particles from tomograms and generates amplitude spectrum volumes for used in Fourier reweighting of particles in the subtomogram alignment and averaging routines, as a Fourier weight in place of a traditional binary-wedge. Also generates an estimated binary wedge as well from the noise.
It also generates a noise motl file so that the noise positions found in binned tomograms can then be used later on in less or unbinned tomograms and after some positions have been cleaned, which could make it more difficult to pick non-structural noise in the tomogram.
This tomogram extraction script uses one MATLAB compiled scripts below:
Options
Directories
- tomogram_dir
Absolute path to the folder where the tomograms are stored
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- noise_extract_exe
Noise extraction executable
- motl_dump_exe
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
Noise Extraction Workflow Options
File Options
- iteration
The iteration of the all particle motive list to extract from : input will be all_motl_fn_prefix_iteration.em (define as integer)
- all_motl_fn_prefix
Relative path to allmotl file from root folder.
- noise_motl_fn_prefix
Relative path to noisemotl filename. If the file doesn’t exist a new one will be written with the determined noise positions. If a previously existing noise motl exists it will be used instead. If the number of noise particles requested has been increased new particles will be found and added and the file will be updated.
- ampspec_fn_prefix
Relative path and filename prefix for output amplitude spectrums
- binary_fn_prefix
Relative path and filename prefix for output binary wedges
Tomogram Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
Extraction Options
- box_size
Size of subtomogram in pixels
- just_extract
If you already have noise MOTL lists calculated which may contain less than the total number of requested noise, but just want the code to do the extraction then you can set just_extract to 1. Otherwise set it to 0.
- ptcl_overlap_factor
The amount of overlap to allow between noise particles and subtomograms Numbers less than 0 will allow for larger than a box size spacing between noise and a particle. Numbers greater than 0 will allow for some overlap between noise and a particle. For example 0.5 will allow 50% overlap between the noise and the particle, which can be useful when the box size is much larger than the particle.
- noise_overlap_factor
The amount of overlap to allow between noise particles Numbers less than 0 will allow for larger than a box size spacing between noise. Numbers greater than 0 will allow for some overlap between noise. For example 0.75 will allow 75% overlap between the noise, which can be useful when there is not much space for enough noise.
- num_noise
Number of noise particles to extract.
- reextract
Set reextract to 1 if you want to force the program to re-extract amplitude spectra even if the amplitude spectrum file already exists.
- preload_tomogram
Set preload_tomogram to 1 if you want to read the whole tomogram into memory before extraction. This is the fastest way to extract particles however the system needs to be able to have the memory to fit the whole tomogram into memory or otherwise it will crash. If it is set to 0, then either the subtomograms can be extracted using a memory-map to the data, or read directly from the file.
- use_tom_red
Set use_tom_red to 1 if you want to use the AV3/TOM function tom_red to extract particles. This requires that preload_tomogram above is set to 1. This is the original way to extract particles, but it seemed to sometimes produce subtomograms that were incorrectly sized. If it is set to 0 then an inlined window function is used instead.
- use_memmap
Set use_memmap to 1 to memory-map the tomogram and read subtomograms from this map. This appears to be a little slower than having the tomogram fully in memory without the massive memory footprint. However, it also appears to be slightly unstable and may crash unexpectedly. If it is set to 0 and preload_tomogram is also 0, then subtomograms will be read directly from the tomogram on disk. This also requires much less memory, however it appears to be extremely slow, so this only makes sense for a large number of tomograms being extracted on the cluster.
Example
tomogram_dir="/net/dstore2/teraraid/dmorado/subTOM_tutorial/data/tomos/bin8"
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
noise_extract_exe="${exec_dir}/alignment/subtom_extract_noise"
motl_dump_exe="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
run_local=0
iteration=1
all_motl_fn_prefix="combinedmotl/allmotl"
noise_motl_fn_prefix="combinedmotl/noisemotl"
ampspec_fn_prefix="otherinputs/ampspec"
binary_fn_prefix="otherinputs/binary"
tomo_row=7
box_size=128
just_extract=0
ptcl_overlap_factor=0
noise_overlap_factor=0.75
num_noise=1000
reextract=0
preload_tomogram=1
use_tom_red=0
use_memmap=0
subtom_extract_subtomograms
This script takes an input number of cores, and on each core extract one tomogram at a time as written in a specified row of the all motive list. Parallelization works by writing a start file upon openinig of a tomo, and a completion file. After tomogram extraction, it moves on to the next tomogram that hasn’t been started.
This tomogram extraction script uses one MATLAB compiled scripts below:
Options
Directories
- tomogram_dir
Absolute path to the folder where the tomograms are stored
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- extract_exe
Subtomogram extraction executable
- motl_dump_exe
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
Subtomogram Extraction Workflow Options
File Options
- iteration
The iteration of the all particle motive list to extract from : input will be all_motl_fn_prefix_iteration.em (define as integer)
- all_motl_fn_prefix
Relative path to allmotl file from root folder.
- subtomo_fn_prefix
Relative path and filename for output subtomograms.
- stats_fn_prefix
Relative path and filename for stats .csv files.
The CSV format of the subtomogram stats is a single file for each tomogram with one line per particle in the tomogram with six columns. The particle columns are as follows:
Column |
Value |
---|---|
1 |
Particle Index (Motive List row 4) |
2 |
Mean value for the subtomogram |
3 |
Maximum value in the subtomogram |
4 |
Minimum value in the subtomogram |
5 |
Standard deviation of values in the subtomogram |
6 |
Variance of values in the subtomogram |
Tomogram Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
Extraction Options
- box_size
Size of subtomogram in pixels
- subtomo_digits
Leading zeros for subtomograms, for AV3, use 1. Other numbers are useful for DYNAMO.
- reextract
Set reextract to 1 if you want to force the program to re-extract subtomograms even if the stats file and the subtomograms already exist. If the stats file for the tomogram exists and is the correct size the whole tomogram will be skipped. If the subtomogram exists it will also be skipped, unless this option is true.
- preload_tomogram
Set preload_tomogram to 1 if you want to read the whole tomogram into memory before extraction. This is the fastest way to extract particles however the system needs to be able to have the memory to fit the whole tomogram into memory or otherwise it will crash. If it is set to 0, then either the subtomograms can be extracted using a memory-map to the data, or read directly from the file.
- use_tom_red
Set use_tom_red to 1 if you want to use the AV3/TOM function tom_red to extract particles. This requires that preload_tomogram above is set to 1. This is the original way to extract particles, but it seemed to sometimes produce subtomograms that were incorrectly sized. If it is set to 0 then an inlined window function is used instead.
- use_memmap
Set use_memmap to 1 to memory-map the tomogram and read subtomograms from this map. This appears to be a little slower than having the tomogram fully in memory without the massive memory footprint. However, it also appears to be slightly unstable and may crash unexpectedly. If it is set to 0 and preload_tomogram is also 0, then subtomograms will be read directly from the tomogram on disk. This also requires much less memory, however it appears to be extremely slow, so this only makes sense for a large number of tomograms being extracted on the cluster.
Example
tomogram_dir="/net/dstore2/teraraid/dmorado/subTOM_tutorial/data/tomos/bin8"
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
extract_exe="${exec_dir}/alignment/subtom_extract_subtomograms"
motl_dump_exe="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
run_local=0
iteration=1
all_motl_fn_prefix="combinedmotl/allmotl"
subtomo_fn_prefix="subtomograms/subtomo"
stats_fn_prefix="subtomograms/stats/tomo"
tomo_row=7
box_size=128
subtomo_digits=1
reextract=0
preload_tomogram=1
use_tom_red=0
use_memmap=0
subtom_maskcorrected_fsc
Calculates a “mask-corrected” Fourier Shell Correlation between two volumes and generates a final average as well as optionally ad-hoc B-factor sharpened maps.
This script is meant to run on a local workstation with access to an X server in the case when the user wants to display figures. I am unsure if both plotting options are disabled if the graphics display is still required, but if not it could be run remotely on the cluster, but it shouldn’t be necessary.
This EM-map analysis script uses just one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables.
Variables
- fsc_exec
Mask-corrected FSC executable.
File Options
- ref_a_fn_prefix
Relative path and filename prefix of the first half-map.
- ref_b_fn_prefix
Relative path and filename prefix of the second half-map.
- iteration
The index of the reference to generate : input will be ref_{a,b}_fn_prefix_iteration.em (define as integer).
- fsc_mask_fn
Relative path and name of the FSC mask.
- filter_a_fn
Relative path and name of the Fourier filter volume for the first half-map. If not using the option do_reweight just leave this set to “”
- filter_b_fn
Relative path and name of the Fourier filter volume for the second half-map. If not using the option do_reweight just leave this set to “”
- output_fn_prefix
Relative path and prefix for the name of the output maps and figures.
FSC Options
- pixelsize
Pixelsize of the half-maps in Angstroms.
- nfold
Symmetry to applied the half-maps before calculating FSC (1 is no symmetry).
- rand_threshold
The Fourier pixel at which phase-randomization begins is set automatically to the point where the unmasked FSC falls below this threshold.
- plot_fsc
Plot the FSC curves - 1 = yes, 0 = no
Reweighting Options
- do_reweight
Set to 1 to apply the externally calculated Fourier weights filter_A_fn and filter_B_fn to each half-map to reweight the final output map.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
fsc_exec="${exec_dir}/analysis/subtom_maskcorrected_fsc"
ref_a_fn_prefix="even/ref/ref"
ref_b_fn_prefix="odd/ref/ref"
iteration=1
fsc_mask_fn="FSC/fsc_mask.em"
filter_a_fn=""
filter_b_fn=""
output_fn_prefix="FSC/ref"
pixelsize=1
nfold=1
rand_threshold=0.8
plot_fsc=1
do_sharpen=1
b_factor=-150
box_gaussian=3
filter_mode=1
filter_threshold=0.143
plot_sharpen=1
do_reweight=0
subtom_preprocess
Aligns dose-fractionated data, sorts and stacks aligned frames, determines the defocus of the tilt-series using CTFFIND4, GCTF, or IMOD CTFPLOTTER and then dose-filters the tilt-series in prepartion for alignment using IMOD/eTomo.
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- frame_dir
Relative path to the folder where the dose-fractionated movie frames are located.
Executables
- alignframes_exe
Absolute path to the IMOD alignframes executable. The directory of this will be used for the other IMOD programs used in the processing. Need version at least above 4.10.29
- ctffind_exe
Absolute path to the CTFFIND4 executable. Needs version at least above 4.1.13.
- gctf_exe
Absolute path to the GCTF executable. I wouldn’t use it because it rarely works but it seems a version of 1.06 sometimes doesn’t crash.
- exec_dir
Directory for subTOM executables
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
File Options
- ts_fmt
The format string for the datasets to process. The string XXXIDXXXX will be replaced with the numbers specified between the range start_idx and end_idx.
The raw sum tilt-series will have the name format ts_fmt.st, or ts_fmt.mrc, an extended data file ts_fmt.{mrc,st}.mdoc and could possibly have an associated log ts_fmt.log.
Dose-fractionated movies of tilt-images are assumed to have the name format: ts_fmt_###_*.{mrc,tif} where ### is a running three-digit ID number for the tilt-image and * is the tilt-angle.
- start_idx
The first tilt-series to operate on.
- end_idx
The last tilt-series to operate on.
- idx_fmt
The format string for the tomogram indexes. Likely two or three digit zero padding or maybe just flat integers.
Beam Induced Motion Correction Options
- do_aligned
If you want to run alignframes to generate the non-dose-weighted tiltseries set this option to 1 and if you want to skip this step set this option to 0.
- do_doseweight
If you want to run alignframes to generate the dose-weighted tiltseries set this option to 1 and if you want to skip this step set this option to 0.
- do_gain_correction
Determines whether or not gain-correction needs to be done on the frames. Set to 1 to apply gain-correction during motion-correction, and 0 to skip it. Normally TIFF format frames will be saved with compression and will be unnormalized, and should be gain-corrected. MRC format frames are generally already saved with gain-correction applied during collection, so it can be skipped here.
A good rule of thumb, is if you have a dm4 file in your data you need to do gain-correction, and if you don’t see a dm4 file you do not.
- gainref_fn
The path to the gain-reference file, this will only be used if gain_correction is going to be applied.
- defects_fn
The path to the defects file, this is saved along with the gain-reference for unnormalized saved frames by SerialEM, and will only be used if gain-correction is going to be applied.
- align_bin
Binning to apply to the frames when calculating the alignment, if you are using super-resolution you may want to change this to 2. The defaults from IMOD would be 3 for counted data and 6 for super-resolution data. Multiple binnings can be tested and the best one will be used to generate the final sum.
- sum_bin
Binning to apply to the final sum. This is done using Fourier cropping as in MotionCorr and other similar programs. If you are using super-resolution you probably want to change this to 2, otherwise it should be set to 1.
- scale
Amount of scaling to apply to summed values before output. The default is 30 however serialEM applies one of 39.3?
- filter_radius2
Cutoff Frequency for the lowpass filter used in frame alignment. The unit is absolute spatial frequency which goes from 0 to 0.5 relative to the pixelsize of the input frames (not considering binning applied in alignment). The default from IMOD is 0.06. Multiple radii can be used and the best filter will be selected for the actually used alignment.
- filter_sigma2
Falloff for the lowpass filter used in frame alignment. Same units as above. The defaults from IMOD is 0.0086.
- shift_limit
Limit on distance to search for correlation peak in unbinned pixels. The default from IMOD is 20.
- do_refinement
If this is set to 1, alignframes will do an iterative refinement of the initially found frame alignment solution. The default in IMOD is to not do this refinement.
- refine_iterations
The maximum number of refinement iterations to run.
- refine_radius2
Cutoff Frequency for the lowpass filter used in refinement. The default in IMOD would be to use the same value used in alignment.
- refine_shift_stop
The amount of shift at which refinement will stop in unbinned pixels.
- truncate_above
Movies often contain hot pixels not removed from the pixel-defect mask either from x-rays or other factors and these throw off the later scaling of sums. Traditionally they would be removed in eTomo using the ccderaser command / step, but it has been found to go better to truncate them at the frame-alignment and summing step. To find a reasonable value to truncate above use the command ‘clip stats’ on several movies to find out where the values start to become outliers, it should be around 5-7 for 10 frame movies of about 3e/A^2 on the K2.
- use_gpu
If you want to use a GPU set this to 1, but be careful to not use both the cluster and the GPU as this is not supported.
- extra_opts
If you want to use other options to alignframes specify them here.
CTF Estimation Options
- apix
The pixel size of the raw movie frames if they exist, or the pixelsize of the “_aligned.st” stack if alignframes and dose-weighting is not being done. The actual pixelsize used in CTF estimation is apix * sum_bin.
- do_ctffind4
If this is set to 1, the defocus will be estimated with CTFFIND4.
- do_gctf
If this is set to 1, the defocus will be estimated with GCTF.
- do_ctfplotter
If this is set to 1, the defocus will be estimated with CTFPLOTTER.
- voltage_kev
The accelerating voltage of the microscope in KeV.
- cs
The spherical aberration of the microscope in mm.
- ac
The amount of amplitude contrast in the imaging system.
- tile_size
The size of tile to operate on.
- min_res
The lowest wavelength in Angstroms to allow in fitting (minimum resolution).
- max_res
The highest wavelength in Angstroms to allow in fitting (maximum resolution).
- min_res_ctfplotter
The lowest wavelength in Angstroms to allow in fitting in CTFPLOTTER.
- max_res_ctfplotter
The highest wavelength in Angstroms to allow in fitting in CTFPLOTTER.
- min_def
The lowest defocus in Angstroms to scan.
- max_def
The highest defocus in Angstroms to scan.
- def_step
The step size in Angstroms to scan defocus.
- astigmatism
The amount of astigmatism to allow in Angstroms.
- tilt_axis_angle
The tilt-axis angle of the tilt series. This is only needed if you are estimating the CTF with ctfplotter. You can find this value running the command ‘header’ on the raw sum tiltseries and looking at the first label (Titles) in the header.
Dose Filtering Options
- dose_per_tilt
The dose per micrograph in Electrons per square Angstrom.
Example
scratch_dir="${PWD}"
frame_dir="Frames"
alignframes_exe="$(which alignframes)"
ctffind_exe="$(which ctffind)"
gctf_exe="$(which Gctf)"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
run_local=1
ts_fmt="TS_XXXIDXXXX"
start_idx=1
end_idx=1
idx_fmt="%02d"
do_aligned=1
do_doseweight=1
do_gain_correction=1
gainref_fn="Frames/gainref.dm4"
defects_fn="Frames/defects.txt"
align_bin=1,2,3
sum_bin=1
scale=39.3
filter_radius2=0.167,0.125,0.10,0.06
filter_sigma2=0.0086
shift_limit=20
do_refinement=1
refine_iterations=5
refine_radius2=0.167
refine_shift_stop=0.1
truncate_above=7
use_gpu=0
extra_opts=''
apix=1
do_ctffind4=1
do_gctf=0
do_ctfplotter=1
voltage_kev=300.0
cs=2.7
ac=0.07
tile_size=512
min_res=30.0
max_res=5.0
min_res_ctfplotter=50.0
max_res_ctfplotter=10.0
min_def=10000.0
max_def=60000.0
def_step=100.0
astigmatism=1000.0
tilt_axis_angle=85.3
dose_per_tilt=3.5
subtom_random_subset_motl
Draws a random subset from a given MOTL file.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- random_subset_motl_exec
Random subset motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to draw the subset from.
- output_motl_fn
Relative path and name of the output MOTL file.
Subset Options
- subset_size
How many particles to be included in the subset.
- subset_row
The following describes a field in the MOTL to equally distribute particles of the subset amongst. Such that if subset_row was the tomogram row (7), and there were ten tomograms described in the motive list, then the subset of 1000 particles would have 100 particles from each tomogram. If there are more unique values than the subset size then the field is not taken into account.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
random_subset_motl_exec="${exec_dir}/MOTL/subtom_random_subset_motl"
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/s5kmotl_1.em"
subset_size=5000
subset_row=7
subtom_plot_filter
Plots the filter applied to the reference from a user-specified set of band-pass settings. The filter can also be plotted in conjunction with a CTF root square function and a B-factor described exponential decay falloff curve. The plot can also be saved to disk.
This utility script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- plot_filter_exec
Plot filter executable.
File Options
- output_fn_prefix
Relative path and name prefix of the output plot. If you want to skip this output file leave this set to “”.
Plot Filter Options
- box_size
Size of the volume in pixels. The volume will be a cube with this side length.
- pixelsize
Pixelsize of the data in Angstroms.
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as:
high_pass_fp = (box_size * pixelsize) / (high_pass_A)
(define as integer e.g. high_pass_fp=2)
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter cutoff (in transform units (pixels): calculate as:
low_pass_fp = (box_size * pixelsize) / (low_pass_A)
(define as integer e.g. low_pass_fp=48)
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
- defocus
Defocus to plot along with band-pass filter in Angstroms with underfocus being positive. The graphic will include a line for the CTF root square and how it is attenuated by the band-pass which can be useful for understanding how amplitudes are modified by the filter. If you do not want to use this option just leave it set to “0” or “”.
- voltage
Voltage in keV used for calculating the CTF. If you do not want to plot a CTF function leave this set to “” or “300”.
- cs
Spherical aberration in mm used for calculating the CTF. If you do not want to plot a CTF function leave this set to “” or “0.0”.
- ac
Amplitude contrast as a fraction of contrast (i.e. between 0 and 1) used for calculating the CTF. If you do not want to plot a CTF function leave this set to “” or “1”.
- phase_shift
Phase shift in degrees used for calculating the CTF. If you do not want to plot a CTF function leave this set to “” or “0”.
- b_factor
B-Factor describing the falloff of signal in the data by a multitude of amplitude decay factors. The graphic will include a line for the falloff and how it interacts with both the CTF if one was given and the band-pass filter. If you do not want to use this option just leave it set to “” or “0”
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
plot_filter_exec="${exec_dir}/utils/subtom_plot_filter"
output_fn_prefix=""
box_size="192"
pixelsize="1.35"
high_pass_fp="1"
high_pass_sigma="2"
low_pass_fp="48"
low_pass_sigma="3"
defocus="15000"
voltage="300"
cs="2.7"
ac="0.07"
phase_shift="0.0"
b_factor="-130"
subtom_plot_scanned_angles
Plots the angles searched for a user-specified set of alignment angles. The angles can also be centered about a given initial orientation. The marker of the plot can be adjusted and the plot can also be saved to disk.
This utility script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- plot_angles_exec
Plot scanned angles executable.
File Options
- output_fn_prefix
Relative path and name prefix of the output plot. If you want to skip this output file leave this set to “”.
Plot Scanned Angles Options
- psi_angle_step
Angular increment in degrees, applied during the cone-search, i.e. psi and theta (define as real e.g. psi_angle_step=3)
- psi_angle_shells
Number of angular iterations, applied to psi and theta (define as integer e.g. psi_angle_shells=3)
- phi_angle_step
Angular increment for phi in degrees, (define as real e.g. phi_angle_step=3)
- phi_angle_shells
Number of angular iterations for phi, (define as integer e.g. phi_angle_shells=3)
- initial_phi
Initial first Euler angle rotation around the Z-axis about which the scanned angles are centered. (define as real e.g. initial_phi=45).
- initial_psi
Initial third Euler angle rotation around the Z-axis about which the scanned angles are centered. (define as real e.g. initial_psi=30).
- initial_theta
Initial second Euler angle rotation around the X-axis about which the scanned angles are centered. (define as real e.g. initial_theta=135).
- angle_fmt
If the above angles are specified as degress leave this set to ‘degrees’, but if the angles above are in radian format set this to ‘radians’.
- marker_size
Set the marker size of the arrows that are drawn for the rotations, reasonable values seem to be around the range of 0.01 to 0.1.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
plot_angles_exec="${exec_dir}/utils/subtom_plot_scanned_angles"
output_fn_prefix=""
psi_angle_step="6"
psi_angle_shells="7"
phi_angle_step="6"
phi_angle_shells="7"
initial_phi="0"
initial_psi="0"
initial_theta="0"
angle_fmt="degrees"
marker_size="0.02"
subtom_reconstruct
This is a run script for the reconstruction and possibly also CTF correction processing of electron cryo-tomograhpy data by means of the program novaCTF.
This script is meant to run on a local workstation but can also submit some of the processing to the cluster so that data can be preprocessed in parallel. However, note that the read/write density of operations in novaCTF is extremely large and therefore care should be taken to not overload systems, or be prepared to have a very slow connection to your filesystem.
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
Executables
- novactf_exe
Absolute path to the novaCTF executable.
- newstack_exe
Absolute path to the IMOD newstack executable. The directory of this will be used for the other IMOD programs used in the processing.
- exec_dir
Directory for subTOM executables.
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
- num_threads
Set this value to the number of jobs you want to run in the background before reconstruction. Should be the number of threads on the local system or cluster which for our system is 24 on the cluster and higher on the local systems, but there you should be polite!
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
File Options
- tomo_fmt
The format string for the datasets to process. The string XXXIDXXXX will be replaced with the numbers specified between the range start_idx and end_idx.
- tomo_dir_fmt
The format string for the directory of datasets to process. The string XXXIDXXXX will be replaced with the numbers specified between the range start_idx and end_idx.
- start_idx
The first tomogram to operate on.
- end_idx
The last tomogram to operate on.
- idx_fmt
The format string for the tomogram indexes. Likely two or three digit zero padding or maybe just flat integers.
General CTF Options
- defocus_file
Where the defocus list file is located. The string XXXIDXXXX will be replaced with the formatted tomogram index, i.e. XXXIDXXXX_output.txt will be turned into 01_output.txt.
- pixel_size
The pixel size of the tilt series in nanometers. Note NANOMETERS!
- amplitude_contrast
The amplitude contrast for CTF correction.
- cs
The spherical aberration of the microscope in mm for CTF correction.
- voltage
The voltage in KeV of the microscope for CTF correction.
Nova 3D-CTF Options
- do_3dctf
Set this value to 1 if you want to do 3D-CTF correction during the reconstruction of the tomograms. If this value is set to 0 NovaCTF will still be used but it will generate tomograms largely identical to IMOD’s WBP.
- correction_type
Type of CTF correction to perform.
- defocus_file_format
File format for the defocus list. Use ctffind4 for CTFFIND4, imod for CTFPLOTTER and gctf for Gctf.
- defocus_step
The strip size in nanometers to perform CTF correction in novaCTF refer to the paper for more information on this value and sensible defaults.
- correct_astigmatism
Do you want to correct astigmatism 1 for yes 0 for no.
- defocus_shift_file
If you want to shift the defocus for some reason away from the center of the mass of the tomogram provide a defocus_shifts file with the shifts. See the paper for more information on this value. If you do not want to use this option leave the value “”.
IMOD 2D-CTF Options
- do_2dctf
Set this value to 1 if you want to do 2D-CTF correction during the reconstruction of the tomograms. As of now if you are doing 2D-CTF correction only “imod” is valid as a value for “defocus_file_format”.
- defocus_shift
If you want to shift the defocus for some reason away from the center of the mass of the tomogram provide the number of pixels to shift here. The sign of the the shift is the same as for SHIFT in IMOD’s tilt.com, but depends on the binning of the data, whereas in tilt it is for unbinned data. Refer to the man page for ctfphaseflip for a more detailed description.
- defocus_tolerance
Defocus tolerance in nanometers, which is one factor that governs the width of the strips. The actual strip width is based on the width of this region and several other factors. Refer to the man page for ctfphaseflip for a more detailed description.
- interpolation_width
The distance in pixels between the center lines of two consecutive strips. Refer to the man page for ctfphaseflip for a more detailed description.
- use_gpu
If you want to use a GPU set this to 1, but be careful to not use both the cluster and the GPU as this is not supported.
Radial Filter Options
- do_radial
Set this value to 1 if you want to radial filter the projections before reconstruction. This corresponds to the W (weighted) in WBP, which is commonly what you want to do, however if you want to only back-project without the weighting set this value to 0.
- radial_cutoff
The parameters of RADIAL from the tilt manpage in IMOD that describes the radial filter used to weight before back-projection.
- radial_falloff
The parameters of RADIAL from the tilt manpage in IMOD that describes the radial filter used to weight before back-projection.
IMOD Options
- erase_radius
The radius in pixels to erase when removing the gold fiducials from the aligned tilt-series stacks. Be careful that the value you give is appropriate for the unbinned aligned stack, which may be different than the value used in eTomo on the binned version.
- do_rotate_tomo
Set this value to 1 if you want to use trimvol or clip rotx to rotate the tomogram from the PERPENDICULAR XZ generated tomograms to the standard XY PARALLEL orientation. Set this value to 0 if you want to skip this step which greatly speeds up processing and reduces the memory footprint, but at the cost of easy visualization of the tomogram.
- do_trimvol
Set this value to 1 if you want to use “trimvol -rx” to flip the tomograms to the XY standard orientation from the XZ generated tomograms. Otherwise “clip rotx” will be used since it is much faster.
Example
scratch_dir="${PWD}"
novactf_exe="$(which novaCTF)"
newstack_exe="$(which newstack)"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
mem_free="1G"
mem_max="64G"
num_threads=1
job_name="subTOM"
run_local=0
tomo_fmt="TS_XXXIDXXXX_dose-filt"
tomo_dir_fmt="TS_XXXIDXXXX"
start_idx=1
end_idx=1
idx_fmt="%02d"
defocus_file="ctfplotter/TS_XXXIDXXXX_output.txt"
pixel_size=0.1
amplitude_contrast=0.07
cs=2.7
voltage=300
do_3dctf=1
correction_type="multiplication"
defocus_file_format="imod"
defocus_step=15
correct_astigmatism=1
defocus_shift_file=""
do_2dctf=0
defocus_shift=0
defocus_tolerance=200
interpolation_width=20
use_gpu=0
do_radial=1
radial_cutoff=0.35
radial_falloff=0.035
erase_radius=32
do_rotate_vol=1
do_trimvol=0
subtom_renumber_motl
Renumbers the particle indices in a motive list, either sequentially or in a way that preserves particle indices while still making sure there are no duplicates in the list of indices.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- renumber_motl_exec
Renumber motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be renumbered.
- output_motl_fn
Relative path and name of the output MOTL file.
Renumber Options
- sort_row
If you want to have the output MOTL file sorted by a particular field before renumbering then specify it here.
- do_sequential
If the following is 1, particles will be completely renumbered from 1 to the number of particles in the motive list. If it is 0, particles will be renumbered in a way that preserves the original index while still removing any duplicate indices. As a guide you probably want to renumber sequentially after cleaning from initial oversampled coordinates, but do not want to renumber sequentially in other cases.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
renumber_motl_exec="${exec_dir}/MOTL/subtom_renumber_motl"
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_unique_1.em"
sort_row="4"
do_sequential="0"
subtom_rotx_motl
Transforms a given MOTL file so that it matches a tomogram rotated or not rotated by IMOD’s ‘clip rotx’ command.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- tomogram_dir
Absolute path to the folder where the tomograms used in the INPUT motive list are stored.
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- rotx_motl_exec
Rotx motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be transformed.
- output_motl_fn
Relative path and name of the output MOTL file.
Rotx Options
- tomo_row
Row number of allmotl for tomogram numbers.
- do_rotx
If the following is set to 1 the input MOTL will be transformed in the same way as done by ‘clip rotx’. If it is set to 0 the input MOTL will be transformed by the inverse operation (a positive 90 degree rotation about the X-axis).
Example
tomogram_dir="/net/dstore2/teraraid/dmorado/subTOM_tutorial/data/tomos/bin4"
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
rotx_motl_exec="${exec_dir}/MOTL/subtom_rotx_motl"
input_motl_fn="../bin4/combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_1.em"
tomo_row="7"
do_rotx="0"
subtom_scale_motl
Scales a given MOTL file by a given factor. It also resets the shifts in the motive list (rows 11 to 13) to values less than 1 so that with a given scale factor of 1, it can apply the shifts to the tomogram coordinates (rows 8 to 10) so that particles can be reextracted better centered to allow for tighter CC masks to be used in further iterations of alignment.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- scale_motl_exec
Scale motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be unbinned.
- output_motl_fn
Relative path and name of the output MOTL file.
Scaling Options
- scale_factor
How much to scale up the tomogram coordinate extraction positions (rows 8 through 10 in the MOTL) and the particle shifts (rows 11 through 13).
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
scale_motl_exec="${exec_dir}/MOTL/subtom_scale_motl"
input_motl_fn="../bin8/combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_1.em"
scale_factor=2
subtom_scale_noisemotl
Scales a the individual noise motive lists corresponding to a given MOTL file by a given factor. It first concatenates all the necessary input noise motive lists, then scales the motive list by factor and then finally splits the motive list again by tomogram.
This MOTL manipulation script uses three MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cat_motls_exec
Concatenate motive lists executable
- scale_motl_exec
Scale motive list executable
- split_motl_by_row_exec
Split MOTL by row executable.
- motl_dump_exec
MOTL dump executable
File Options
- iteration
The iteration of the all particle motive list to process from: input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- all_motl_fn_prefix
Relative path and prefix to allmotl file from scratch directory.
- input_noise_motl_fn_prefix
Relative path and prefix to input noisemotls.
- output_noise_motl_fn_prefix
Relative path and prefix to output noisemotls.
Tomogram Options
- tomo_row
Row number of allmotl for tomogram numbers.
Scaling Options
- scale_factor
How much to scale up the tomogram coordinate extraction positions (rows 8 through 10 in the MOTL), e.g. To scale from bin8 to bin4 the factor would be 2.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
cat_motls_exec="${exec_dir}/MOTL/subtom_cat_motls"
scale_motl_exec="${exec_dir}/MOTL/subtom_scale_motl"
split_motl_by_row_exec="${exec_dir}/MOTL/subtom_split_motl_by_row"
motl_dump_exec="${exec_dir}/MOTL/motl_dump"
iteration="1"
all_motl_fn_prefix="combinedmotl/allmotl"
input_noise_motl_fn_prefix="../bin8/combinedmotl/noisemotl"
output_noise_motl_fn_prefix="combinedmotl/noisemotl"
tomo_row="7"
scale_factor="2"
subtom_seed_positions
Takes the motive lists made from clicker files in UCSF Chimera and places a number of points at a given spacing along spherical or tubular surfaces.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables.
Variables
- seed_pos_exec
Seed positions on motive list executable.
File Options
- input_motl_fn_prefix
Relative path and prefix of the input MOTL files to be seeded. The files are expected to have the format input_motl_fn_prefix_#.em where # is the number corresponding to the tomogram corresponding to the motive list and this value will go into row 7 of the output motive list.
- output_motl_fn
Relative path and name of the output MOTL file.
Seed Options
- spacing
The spacing in pixels at which positions will be added to the surface.
- do_tubule
If this is set to 1 (i.e. evaluates to true in Matlab) then the clicker motive list is assumed to correspond to tubules and points will be added along the tubule-axis. Otherwise the clicker file is assumed to correspond to spheres.
- rand_inplane
If this is set to 1 (i.e. evaluates to true in Matlab) then the inplane rotation of particles along a tubule will be randomized as opposed to the default which is to place the X-axis orthogonal to the longest tubule axis.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
seed_pos_exec="${exec_dir}/MOTL/subtom_seed_positions"
input_motl_fn_prefix="../startset/clicker"
output_motl_fn="combinedmotl/allmotl_1.em"
spacing=8
do_tubule=0
rand_inplane=0
subtom_shape
Creates a simple shape of a user-specified type that can be used for masking purposes.
This utility script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- shape_exec
Shape executable
File Options
- output_fn
Relative path and name of the output volume to write.
- ref_fn
Relative path and name of a reference to apply the mask to, which can be useful for checking. If you want to skip this check just leave it set to “”.
Shape Options
- shape
- The shape to place into the volume. The available options are:
sphere,
sphere_shell
ellipsoid
ellipsoid_shell
cylinder
tube
elliptic_cylinder
elliptic_tube
cuboid
- box_size
Size of the volume in pixels. The volume will be a cube with this side length.
- radius
- For the shapes:
sphere
cylinder
Defines the radius of the shape. If you are not creating one of the listed shapes leave this set to “”.
- inner_radius
- For the shapes:
sphere_shell
tube
Defines the inner radius of the shape. If you are not creating one of the listed shapes leave this set to “”.
- outer_radius
- For the shapes:
sphere_shell
tube
Defines the outer radius of the shape. If you are not creating one of the listed shapes leave this set to “”.
- radius_x
- For the shapes:
ellipsoid
elliptic_cylinder
Defines the radius of the shape about the X-axis. If you are not creating one of the listed shapes leave this set to “”.
- radius_y
- For the shapes:
ellipsoid
elliptic_cylinder
Defines the radius of the shape about the Y-axis. If you are not creating one of the listed shapes leave this set to “”.
- radius_z
- For the shapes:
ellipsoid
elliptic_cylinder
Defines the radius of the shape about the Z-axis. If you are not creating one of the listed shapes leave this set to “”.
- inner_radius_x
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the inner radius of the shape about the X-axis. If you are not creating one of the listed shapes leave this set to “”.
- inner_radius_y
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the inner radius of the shape about the Y-axis. If you are not creating one of the listed shapes leave this set to “”.
- inner_radius_z
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the inner radius of the shape about the Z-axis. If you are not creating one of the listed shapes leave this set to “”.
- outer_radius_x
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the outer radius of the shape about the X-axis. If you are not creating one of the listed shapes leave this set to “”.
- outer_radius_y
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the outer radius of the shape about the Y-axis. If you are not creating one of the listed shapes leave this set to “”.
- outer_radius_z
- For the shapes:
ellipsoid_shell
elliptic_tube
Defines the outer radius of the shape about the Z-axis. If you are not creating one of the listed shapes leave this set to “”.
- length_x
- For the shape:
cuboid
Defines the length of the cuboid about the X-axis. If you are not creating one of the listed shapes leave this set to “”.
- length_y
- For the shape:
cuboid
Defines the length of the cuboid about the Y-axis. If you are not creating one of the listed shapes leave this set to “”.
- length_z
- For the shape:
cuboid
Defines the length of the cuboid about the Z-axis. If you are not creating one of the listed shapes leave this set to “”.
- height
- For the shape:
cylinder
tube
elliptic_cylinder
elliptic_tube
Defines the height of the shape. If you are not creating one of the listed shapes leave this set to “”.
- center_x
For all shapes. Defines the X-coordinate of the center of the shape. The default center is defined as:
center_x = floor(box_size / 2) + 1;
If you do not want to modify the default value leave this set to “”.
- center_y
For all shapes. Defines the Y-coordinate of the center of the shape. The default center is defined as:
center_y = floor(box_size / 2) + 1;
If you do not want to modify the default value leave this set to “”.
- center_z
For all shapes. Defines the Z-coordinate of the center of the shape. The default center is defined as:
center_z = floor(box_size / 2) + 1;
If you do not want to modify the default value leave this set to “”.
- shift_x
For all shapes. Defines a shift along the X-axis after any given rotations. This shift is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- shift_y
For all shapes. Defines a shift along the Y-axis after any given rotations. This shift is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- shift_z
For all shapes. Defines a shift along the Z-axis after any given rotations. This shift is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- rotate_phi
For all shapes. Defines an inplane rotation about the Z-axis. This rotation is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- rotate_psi
For all shapes. Defines an azimuthal rotation about the Z-axis. This rotation is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- rotate_theta
For all shapes. Defines a zenithal rotation about the X-axis. This rotation is part of an affine transformation about the given center that is applied to the coordinates before the shape is determined. If you do not want to modify the default value leave this set to “”.
- sigma
For all shapes. Defines the sigma of a Gaussian falloff away from the hard edges of the shape. If you do not want to modify the default value leave this set to “”.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
shape_exec="${exec_dir}/utils/subtom_shape"
output_fn="otherinputs/mask.em"
ref_fn="ref/ref_1.em"
shape="sphere"
box_size="128"
radius="32"
inner_radius=""
outer_radius=""
radius_x=""
radius_y=""
radius_z=""
inner_radius_x=""
inner_radius_y=""
inner_radius_z=""
outer_radius_x=""
outer_radius_y=""
outer_radius_z=""
length_x=""
length_y=""
length_z=""
height=""
center_x=""
center_y=""
center_z=""
shift_x=""
shift_y=""
shift_z=""
rotate_phi=""
rotate_psi=""
rotate_theta=""
sigma="3"
subtom_split_motl_by_row
Splits a given MOTL file by unique entries in a given field.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- split_motl_by_row_exec
Split MOTL by row executable.
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be split.
- output_motl_fn_prefix
Relative path and filename prefix of output MOTL files.
Split Motl Options
- split_row
Which row to split the input MOTL file by.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
split_motl_by_row_exec="${exec_dir}/MOTL/subtom_split_motl_by_row"
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn_prefix="combinedmotl/allmotl_1_tomo"
split_row=7
subtom_transform_motl
Apply a rotation and a shift to a MOTL file.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- transform_motl_exec
Absolute path to transform_motl executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be transformed.
- output_motl_fn
Relative path and name of the output MOTL file.
Transform Options
- shift_x
How much to shift the reference along the X-Axis, applied after the rotations described below.
- shift_y
How much to shift the reference along the Y-Axis, applied after the rotations described below.
- shift_z
How much to shift the reference along the Z-Axis, applied after the rotations described below.
- rotate_phi
Hom much to finally rotate the reference in-plane about it’s final Z-Axis. (i.e. Spin rotation corresponding to phi).
- rotate_psi
How much to first rotate the reference about it’s initial Z-Axis. (i.e. Azimuthal rotation corresponding to psi).
- rotate_theta
How much to second rotate the reference about it’s intermediate X-Axis. (i.e. Zenithal rotation corresponding to theta).
- rand_inplane
If this is set to 1 (i.e. evaluates to true in Matlab) then the inplane rotation of particles will be randomized after the application of the given transform.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
transform_motl_exec="${exec_dir}/MOTL/subtom_transform_motl"
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_transformed_1.em"
shift_x=0.0
shift_y=0.0
shift_z=0.0
rotate_phi=0.0
rotate_psi=0.0
rotate_theta=0.0
rand_inplane=0
subtom_unclass_motl
Removes the iclass information in the 20th field of a motive list.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- unclass_motl_exec
Unclass motive list executable
File Options
- input_motl_fn
Relative path and name of the input MOTL file to be unclassed.
- output_motl_fn
Relative path and name of the output MOTL file.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
scale_motl_exec="${exec_dir}/MOTL/subtom_unclass_motl"
input_motl_fn="combinedmotl/allmotl_wmd_5.em"
output_motl_fn="combinedmotl/allmotl_wmd_unclassed_1.em"
Functions
Functions are the internals of subTOM which are implemented in Matlab and utilizes the TOM toolbox to do the actual processing of subvolumes alignment and averaging. Each function can be run itself in Matlab, which allows for users to copy and modify these functions in anyway that they see fit.
Care has been taken to make sure the functions have at least upper-level
documentation, which can be accessed using the help
command in Matlab, and
is also included here for reference.
subtom_bandpass
Creates and/or applies a bandpass filter to a volume.
subtom_bandpass(
'input_fn', input_fn (''),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'filter_fn', filter_fn (''),
'output_fn', output_fn (''))
Simply creates and/or applies a bandpass filter just as would be done during
alignment, with the option to write out the Fourier Filter volume as well just
for visualization purposes. input_fn
defines the volume to be filtered, or
at minimum the box size used to create the filter volume. The Fourier domain
filter created is dependent on the parameters high_pass_fp
,
high_pass_sigma
, low_pass_fp
, low_pass_sigma
which are all in the
units of Fourier pixels. If filter_fn
is a non-empty string then the
bandpass filter volume itself is written to the filename given. If output_fn
is a non-empty string then the bandpass filtered volume is written to the
filename given.
Example
subtom_bandpass(...
'input_fn', 'ref/ref_1.em', ...
'high_pass_fp', 2, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'filter_fn', 'otherinputs/bandpass_hp2s2_lp15s3.em',
'output_fn', 'ref/ref_hp2s2_lp15s3_1.em')
See Also
subtom_cat_motls
Concatenate motive lists and print on the standard output.
subtom_cat_motls(
'write_motl', write_motl (0),
'output_motl_fn', output_motl_fn (''),
'write_star', write_star (0),
'output_star_fn', output_star_fn (''),
'sort_row' sort_row (0),
'do_quiet', do_quiet (0),
input_motl_fns)
Takes the motive lists given in input_motl_fns
, and concatenates them all
together. If write_motl
evaluates to True as a boolean then the joined
motive lists are written out as ouput_motl_fn
. The function writes the
motive list information in STAR format and if write_star
evaluates to True
as a boolean then the joined motive lists are also written out as
output_star_fn
. Since the input motive lists can be in any order and this
does not guarantee that the output motive list will have any form of sorting, if
sort_row
is a valid field number the output motive list will be sorted by
sort_row
.
The motive list is also printed to standard ouput. An arbitrary choice has been
made to ouput the motive list in STAR format, since it is used in other more
well-known EM software packages. If this screen output is not desired set
do_quiet
to evaluate to true as a boolean.
Example
subtom_cat_motls(...
'write_motl', 1, ...
'output_motl_fn', 'combinedmotl/allmotl_1_joined.em', ...
'write_star', 1, ...
'output_star_fn', 'combinedmotl/allmotl_1_joined.star', ...
'sort_row', 4, ...
'do_quiet', 1, ...
'combinedmotl/allmotl_1_tomo_1.em', ...
'combinedmotl/allmotl_1_tomo_3.em');
See Also
subtom_clean_motl
Cleans a given MOTL file based on distance and or CC scores.
subtom_clean_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'tomo_row', tomo_row (7),
'do_ccclean', do_ccclean (0),
'cc_fraction', cc_fraction (1),
'cc_cutoff', cc_cutoff (-1),
'do_distance', do_distance (0),
'distance_cutoff', distance_cutoff (Inf),
'do_cluster', do_cluster (0),
'cluster_distance', cluster_distance (0),
'cluster_size', cluster_size (1),
'do_edge', do_edge (0),
'tomogram_dir', tomogram_dir (''),
'box_size', box_size (0),
'write_stats', write_stats (0),
'output_stats_fn', output_stats_fn (''))
Takes the motl given by input_motl_fn
, and splits it internally by
tomogram given by the row tomo_row
in the MOTL, and then removes particles
by one or multiple methods, if do_ccclean
evaluates to true as a boolean
then one of two methods can be applied. Either cc_cutoff
is specified and
particles that have a CCC less than cc_cutoff
will be discarded.
Alternatively cc_fraction
can be specified as a number between 0 and 1 and
that fraction of the data with the highest CCCs will be kept and the rest
discarded. If do_distance
evaluates to true as a boolean then particles
that are within distance_cutoff
pixels of each other will be determined
and only the particle with the highest CCC, will be kept. If
do_cluster
evaluates to true as a boolean,then particles must have at
least cluster_size
neighbor particles within cluster_distance
to be kept
after cleaning. Finally if do_edge
evaluates to true as a boolean then the
program will look for a tomogram in tomogram_dir
, and if a particle of
box size box_size
would extend outside of the tomogram it will be removed.
Example
subtom_clean_motl(...
'input_motl_fn', 'combinedmotl/allmotl_3.em', ...
'output_motl_fn', 'combinedmotl/allmotl_3_cc0.1_dist4_c2d10.em', ...
'tomo_row', 7, ...
'do_ccclean', 1, ...
'cc_fraction', 1, ...
'cc_cutoff', 0.1, ...
'do_distance', 1, ...
'distance_cutoff', 4, ...
'do_cluster', 1, ...
'cluster_distance', 10, ...
'cluster_size, 2, ...
'do_edge', 1, ...
'tomogram_dir', '../../tomos/bin8', ...
'box_size', 36, ...
'write_stats', 1, ...
'output_stats_fn', 'combinedmotl/allmotl_3_cleaned_stats.csv')
See Also
subtom_compare_motls
Compares orientations and shifts between two MOTLs.
subtom_compare_motls(
'motl_1_fn', motl_1_fn (''),
'motl_2_fn', motl_2_fn (''),
'write_diffs', write_diffs (0),
'output_diffs_fn', output_diffs_fn (''))
Takes the motls given by motl_1_fn
and motl_2_fn
and calculates the
differences for both the orientations and coordinates between corresponding
particles in each motive list. If write_diffs
evaluates to true as a
boolean, then also a CSV file with the differences in coordinates and
orientations to diffs_output_fn
.
Example
subtom_compare_motls(...
'motl_1_fn', 'combinedmotl/allmotl_1.em', ...
'motl_2_fn', 'combinedmotl/allmotl_2.em', ...
'write_diffs', 1, ...
'output_diffs_fn', 'combinedmotl/allmotl_1_2_diff.csv')
See Also
subtom_even_odd_motl
Split a MOTL file into even odd halves.
subtom_even_odd_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'even_motl_fn', even_motl_fn, (''),
'odd_motl_fn', odd_motl_fn (''),
'split_row', split_row (4))
Takes the MOTL file specified by input_motl_fn
and writes out seperate MOTL
files with even_motl_fn
and odd_motl_fn
where each output file
corresponds to roughly half of input_motl_fn
. The motive list can also write
a single motive list file with the half split described using the iclass (20th
row of the motive list) where the odd half takes particle’s current class number
plus 100 and the even half takes the particle’s current class number plus 200.
The MOTL is split by the values in split_row
, initially just taking even/odd
halves of the unique values in that given row, and then this is slightly
adjusted by naively adding to the lesser half until closest to half is found.
Example
subtom_even_odd_motl(...
'input_motl_fn', 'combinedmotl/allmotl_1.em', ...
'output_motl_fn', 'combinedmotl/allmotl_eo_1.em', ...
'even_motl_fn', 'even/combinedmotl/allmotl_1.em', ...
'odd_motl_fn', 'odd/combinedmotl/allmotl_1.em', ...
'split_row', 4)
See Also
subtom_extract_noise
Extract noise amplitude spectra on the cluster.
subtom_extract_noise(
'tomogram_dir', tomogram_dir (''),
'tomo_row', tomo_row (7),
'ampspec_fn_prefix', ampspec_fn_prefix ('otherinputs/ampspec'),
'binary_fn_prefix', binary_fn_prefix ('otherinputs/binary'),
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'noise_motl_fn_prefix', noise_motl_fn_prefix ('combinedmotl/noisemotl'),
'iteration', iteration (1),
'box_size', box_size (-1),
'just_extract', just_extract (0),
'ptcl_overlap_factor', ptcl_overlap_factor (0),
'noise_overlap_factor', noise_overlap_factor (0),
'num_noise', num_noise (1000),
'process_idx', process_idx (1),
'reextract', reextract (0),
'preload_tomogram', preload_tomogram (1),
'use_tom_red', use_tom_red (0),
'use_memmap', use_memmap (0))
Takes the tomograms given in tomogram_dir
and extracts average amplitude
spectra and binary wedges into files with the name formats ampspec_fn_prefix
_#.em and binary_fn_prefix
_ #.em, respectively where # corresponds to the
tomogram from which it was created.
Tomograms are specified by the field tomo_row
in motive list
all_motl_fn_prefix
_#.em where # corresponds to iteration
. and the
tomogram that will be processed is selected by process_idx
. Motive lists
with the positions used to generate the amplitude spectrum are written with the
name format noise_motl_fn_prefix
_#.em.
num_noise
Noise volumes of size box_size
are first identified that do not
clash with the subtomogram positions in the input motive list or other already
selected noise volumes. ptcl_overlap_factor
and noise_overlap_factor
define how much overlap selected noise volumes can have with subtomograms and
other noise volumes respectively with 1 being complete overlap and 0 being
complete separation.
If noise_motl_fn_prefix
_#.em already exists then if just_extract
evaluates to true as a boolean, then noise volume selection is skipped and the
positions in the motive list are extracted and the amplitude spectrum is
generated, whether or not the length of the motive list is less than
num_noise
. Otherwise positions will be added to the motive list up to
num_noise
.
If reextract
evaluates to true as a boolean, than existing amplitude spectra
will be overwritten. Otherwise the program will do nothing and exit if the
volume already exists. This is for in the case that the processing crashed at
some point in execution and then can just be re-run without any alterations.
If preload_tomogram
evaluates to true as a boolean, then the whole tomogram
will be read into memory before extraction begins, otherwise the particles will
be read from disk or from a memory-mapped tomogram. If use_tom_red
evaluates
to true as a boolean the old particle extraction code will be used, but this is
only for legacy support and is not suggested for use. Finally if use_memmap
evaluates to true as a boolean then in place of reading each particle from disk
a memory-mapped version of the file of will be created to attempt faster access
in extraction.
Example
subtom_extract_noise(...
'tomogram_dir', '../data/tomos/bin8', ...
'tomo_row', 7, ...
'ampspec_fn_prefix', 'otherinputs/ampspec', ...
'binary_fn_prefix', 'otherinputs/binary', ...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'noise_motl_prefix', 'combinedmotl/noisemotl', ...
'iteration', 1, ...
'box_size', 36, ...
'just_extract', 0, ...
'ptcl_overlap_factor', 0.0, ...
'noise_overlap_factor, 0.75, ...
'num_noise', 1000,
'process_idx', 1, ...
'reextract', 1, ...
'preload_tomogram', 1, ...
'use_tom_red', 0, ...
'use_memmap', 0)
See also
subtom_extract_subtomograms
Extract subtomograms on the cluster.
subtom_extract_subtomograms(
'tomogram_dir', tomogram_dir (''),
'tomo_row', tomo_row (7),
'subtomo_fn_prefix', subtom_fn_prefix ('subtomograms/subtomo'),
'subtomo_digits', subtomo_digits (1),
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'stats_fn_prefix', stats_fn_prefix ('subtomograms/stats/tomo'),
'iteration', iteration (1),
'box_size', box_size (-1),
'process_idx', process_idx (1),
'reextract', reextract (0),
'preload_tomogram', preload_tomogram (1),
'use_tom_red', use_tom_red (0),
'use_memmap', use_memmap (0))
Takes the tomograms given in tomogram_dir
and extracts subtomograms
specified in all_motl_fn_prefix
_#.m where # corresponds to iteration
with size box_size
into scratch_dir
with the name formats
subtomo_fn_prefix
_#.em where # corresponds to the entry in field 4 in
all_motl_fn_prefix
_#.em zero-padded to have at least subtomo_digits
digits.
Tomograms are specified by the field tomo_row
in motive list
all_motl_fn_prefix
_#.em, and the tomogram that will be processed is
selected by process_idx
. A CSV-format file with the subtomogram ID-number,
average, min, max, standard deviation and variance for each subtomogram in the
tomogram is also written with the name format stats_fn_prefix
_#.em where #
corresponds to the tomogram from which subtomograms were extracted.
If reextract
evaluates to true as a boolean, than existing subtomograms will
be overwritten. Otherwise the program will do nothing and exit if
stats_fn_prefix
_#.em already exists, or will also skip any subtomogram it
is trying to extract that already exists. This is for in the case that the
processing crashed at some point in execution and then can just be re-run
without any alterations.
If preload_tomogram
evaluates to true as a boolean, then the whole tomogram
will be read into memory before extraction begins, otherwise the particles will
be read from disk or from a memory-mapped tomogram. If use_tom_red
evaluates
to true as a boolean the old particle extraction code will be used, but this is
only for legacy support and is not suggested for use. Finally if use_memmap
evaluates to true as a boolean then in place of reading each particle from disk
a memory-mapped version of the file of will be created to attempt faster access
in extraction.
Example
subtom_extract_subtomograms(...
'tomogram_dir', '../data/tomos/bin8', ...
'tomo_row', 7, ...
'subtomo_fn_prefix', 'subtomograms/subtomo', ...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'stats_fn_prefix', 'subtomograms/stats/tomo', ...
'iteration', 1, ...
'box_size', 36, ...
'process_idx', 1, ...
'reextract', 0, ...
'preload_tomogram', 1, ...
'use_tom_red', 0, ...
'use_memmap', 0)
See also
subtom_maskcorrected_FSC
Calculates “mask-corrected” FSC and sharpened refs.
subtom_maskcorrected_fsc(
'ref_a_fn_prefix', ref_a_fn_prefix ('even/ref/ref'),
'ref_b_fn_prefix', ref_b_fn_prefix ('odd/ref/ref'),
'fsc_mask_fn', fsc_mask_fn ('FSC/fsc_mask.em'),
'output_fn_prefix', output_fn_prefix ('FSC/ref'),
'filter_a_fn', filter_a_fn (''),
'filter_b_fn', filter_b_fn (''),
'do_reweight', do_reweight (0),
'do_sharpen', do_sharpen (0),
'plot_fsc', plot_fsc (0),
'plot_sharpen', plot_sharpen (0),
'filter_mode', filter_mode (1),
'pixelsize', pixelsize (1.0),
'nfold', nfold (1),
'filter_threshold', filter_threshold (0.143),
'rand_threshold', rand_threshold (0.8),
'b_factor', b_factor (0),
'box_gaussian', box_gaussian (1),
'iteration', iteration (1))
Takes in two references ref_a_fn_prefix
_#.em and ref_b_fn_prefix
_#.em
where # corresponds to iteration
and a FSC mask fsc_mask_fn
and
calculates a “mask-corrected” FSC. This works by randomizing the structure
factor phases beyond the point where the unmasked FSC curve falls below a given
threshold (by default 0.8) and calculating an additional FSC between these phase
randomized maps. This allows for the determination of the extra correlation
caused by effects of the mask, which is then subtracted from the normal masked
FSC curves. The curve will be saved as a Matlab figure and a PDF file, and if
plot_fsc
is true it will also be displayed.
The script can also output maps with the prefix output_fn_prefix
that have
been sharpened with b_factor
if do_sharpen
is turned on. This setting
has two threshold settings selected using filter_mode
, FSC (1) and pixel
(2). FSC allows you to use a FSC-value filter_threshold
as a cutoff for the
lowpass filter, while using pixels allows you to use an arbitrary resolution
cutoff in filter_threshold
. The sharpening curve will be saved as a Matlab
figure and a pdf file, and if plot_sharpen
is true it will also be
displayed.
Finally this script can also perform and output reweighted maps if
do_reweight
is true, and the pre-calculated Fourier weight volumes
filter_a_fn
and filter_b_fn
.
Example
subtom_maskcorrected_fsc(...
'ref_a_fn_prefix', 'even/ref/ref', ...
'ref_b_fn_prefix', 'odd/ref/ref', ...
'fsc_mask_fn', 'FSC/fsc_mask.em', ...
'output_fn_prefix', 'FSC/ref', ...
'filter_a_fn', '', ...
'filter_b_fn', '', ...
'do_reweight', 0, ...
'do_sharpen', 1, ...
'plot_fsc', 1, ...
'plot_sharpen', 1, ...
'filter_mode', 1, ...
'pixelsize', 1.35, ...
'nfold', 6, ...
'filter_threshold', 0.143, ...
'rand_threshold', 0.8, ...
'b_factor', -1500, ...
'box_gaussian', 3, ...
'iteration', 1)
subtom_parallel_sums
Creates raw sums and Fourier weight sums in a batch.
subtom_parallel_sums(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'weight_sum_fn_prefix, weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'iclass', iclass (0),
'num_avg_batch', num_avg_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts in
all_motl_fn_prefix
_#.em where # corresponds to iteration
in
num_avg_batch
chunks to make a raw particle sum ref_fn_prefix
_#_###.em
where # corresponds to iteration
and ### corresponds to process_idx
.
Fourier weight volumes with name prefix weight_fn_prefix
will also be
aligned and summed to make a weight sum weight_sum_fn_prefix
_#_###.em.
tomo_row
describes which row of the motl file is used to determine the
correct tomogram fourier weight file. iclass
describes which class outside
of one is included in the averaging.
Example
subtom_parallel_sums(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'weight_sum_fn_prefix, 'otherinputs/wei', ...
'iteration', 1, ...
'tomo_row', 7, ...
'iclass', 0, ...
'num_avg_batch', 1, ...
'process_idx', 1)
See Also
subtom_plot_filter
Creates a graphic of bandpass filters optionally with CTF.
subtom_plot_filter(
'box_size', box_size (''),
'pixelsize', pixelsize (1),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'defocus', defocus (0),
'voltage', voltage (300),
'cs', cs (0),
'ac', ac ('1.0'),
'phase_shift', phase_shift (0.0),
'b_factor', b_factor (0.0),
'output_fn_prefix', output_fn_prefix (''))
Takes in the local alignment filter parameters used in subTOM, high_pass_fp
,
high_pass_sigma
, low_pass_fp
, and low_pass_sigma
; then produces a
figure showing the filter that will be applied to the Fourier transform of the
reference during alignment. The Fourier pixel frequencies are converted into
Angstroms using the given box_size
and pixelsize
. A single CTF can also
be specified with defocus
, voltage
, cs
, ac
, phase_shift
, and
the root square of this curve will be plotted in addition to how the band-pass
filter affects the amplitude effects of the CTF. Finally a B-factor falloff can
also be specified with b_factor
, and this decay curve will also be plotted
and also plotted with the CTF root square, and also the CTF root square and
band-pass filter all together, so a cumulative effect of a specific choice of
filter parameters at a given defocus and falloff can be observed. If
output_fn_prefix
is not emtpy it is used to save the graphic in MATLAB
figure, pdf, and png formatted files.
Example
subtom_plot_filter(...
'box_size', 192, ...
'pixelsize', 1.35, ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 48, ...
'low_pass_sigma', 3, ...
'defocus', 15000, ...
'voltage', 300, ...
'cs', 2.7, ...
'ac', 0.07, ...
'phase_shift', 0.0, ...
'b_factor', 0, ...
'output_fn_prefix', 'alignment_1');
See Also
subtom_plot_scanned_angles
Creates a graphic of local search rotations.
subtom_plot_scanned_angles(
'psi_angle_step', psi_angle_step (0),
'psi_angle_shells', psi_angl_shells (0),
'phi_angle_step', phi_angle_step (0),
'phi_angle_shells', phi_angle_shells (0),
'initial_phi', initial_phi (0),
'initial_psi', initial_psi (0),
'initial_theta', initial_theta (0),
'angle_fmt', angle_fmt ('degrees'),
'marker_size', marker_size (0.1),
'output_fn_prefix', output_fn_prefix (''))
Takes in the local search parameters used in subTOM psi_angle_step
,
psi_angle_shells
, phi_angle_step
, and phi_angle_shells
; then
produces a figure showing the angles that will be searched using an arrow
marker. The angles are given in either radians or degrees depending on
angle_fmt
. The marker represents the X-axis after rotation and placed on the
unit sphere. The initial marker position is at the north pole of the unit
sphere. The size of the marker is determined by marker_size
. The rotations
can also be displayed centered on an initial rotation given by initial_phi
,
initial_psi
, and initial_theta
. If it is non-empty the figure will be
written out in MATLAB figure, PDF and PNG format using the filename prefix
output_fn_prefix
.
Example
subtom_plot_scanned_angles(...
'psi_angle_step', 6, ...
'psi_angle_shells', 7, ...
'phi_angle_step', 6, ...
'phi_angle_shells', 7, ...
'initial_phi', 0, ...
'initial_psi', 0, ...
'initial_theta', 0, ...
'angle_fmt', 'degrees', ...
'marker_size', 0.02, ...
'output_fn_prefix', 'alignment_1');
See Also
subtom_random_subset_motl
Generates a random subset of a motive list.
subtom_random_subset_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'subset_size', subset_size (1000),
'subset_row', subset_row (7))
Takes the motive list given by input_motl_fn
, and generates a random
subset of subset_size
particles where the subset is distributed equally
over the motive list field subset_row
, and then writes the subset motive
list out as output_motl_fn
.
Example
subtom_random_subset_motl(...
'input_motl_fn', 'combinedmotl/allmotl_2.em', ...
'output_motl_fn', 'combinedmotl/s5kmotl_2.em', ...
'subset_size', 5000, ...
'subset_row', 7)
See Also
subtom_renumber_motl
Renumbers particle indices in a motive list.
subtom_renumber_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'sort_row', sort_row (0),
'do_sequential', do_sequential (0))
Takes the motive list given by input_motl_fn
, and renumbers the particles in
field 4 of the MOTL and writes out the renumbered list to output_motl_fn
. If
do_sequential
evaluates to true as a boolean then the motive list will just
be renumbered from 1 to the number of particles in the MOTL, and the initial
particle indices will be lost. If do_sequential
evaluates to false as a
boolean, then particle indices will be kept with any duplicates of the particle
index incremented by the largest particle index found in the motive list.
For example if do_sequential
is 0, and we have 100 particles where the first
particle index is 4, and the largest particle index in the motive list is 325.
If there are 3 copies of particle index 16 in the motive list, then it will be
renumbered so that these 3 copies correspond to particle indices 16, 341, and
666. In this way as long as we keep the original motive list we can trace back
the origin of each particle.
Example
subtom_renumber_motl(...
'input_motl_fn', 'combinedmotl/allmotl_1.em', ...
'output_motl_fn', 'combinedmotl/allmotl_unique_1.em', ...
'sort_row', 4, ...
'do_sequential', 0)
See Also
subtom_rotx_motl
Transforms a motive list to (un)apply a tomogram rotx operation.
subtom_rotx_motl(
'tomogram_dir', tomogram_dir (''),
'tomo_row', tomo_row (7),
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'do_rotx', do_rotx (0))
Takes the motive list given by input_motl_fn
, and if do_rotx
evaluates to
true as a boolean applies the same transformation as applied by ‘clip
rotx’ in the IMOD package, and else applies the inverse transformation.
The resulting motive list is then written out as output_motl_fn
. The
location of the tomograms needs to be given in tomogram_dir
, as well as
the field that specifies which tomogram to use for each particle in
tomo_row
. The size of the tomogram needs to be known to correctly
transform the particle center coordinates in fields 8-10 in the motive
list.
Example
subtom_rotx_motl(...
'tomogram_dir', '/net/teraraid/dmorado/sample/date/tomos/bin4', ...
'tomo_row', 7, ...
'input_motl_fn', '../bin4/combinedmotl/allmotl_2.em', ...
'output_motl_fn', 'combinedmotl/allmotl_bin4_rotx_1.em', ...
'do_rotx', 0)
See Also
subtom_scale_motl
Scales a given motive list by a given factor.
subtom_scale_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'scale_factor', scale_factor (1))
Takes the motive list given by input_motl_fn
, and scales it by
scale_factor
, and then writes the transformed motive list out as
output_motl_fn
.
Example
subtom_scale_motl(...
'input_motl_fn', '../bin8/combinedmotl/allmotl_2.em', ...
'output_motl_fn', 'combinedmotl/allmotl_1.em', ...
'scale_factor', 2)
See Also
subtom_scan_angles_exact
Align a particle batch over local search angles.
subtom_scan_angles_exact(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'align_mask_fn', align_mask_fn ('none'),
'cc_mask_fn', cc_mask_fn ('noshift'),
'apply_weight', apply_weight (0),
'apply_mask', apply_mask (0),
'psi_angle_step', psi_angle_step (0),
'psi_angle_shells', psi_angle_shells (0),
'phi_angle_step', phi_angle_step (0),
'phi_angle_shells', phi_angle_shells (0),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'threshold', threshold (-1),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'iclass', iclass (0),
'num_ali_batch', num_ali_batch (1),
'process_idx', process_idx (1))
Aligns a batch of particles from the collective motive list with the name format
all_motl_fn_prefix
_#.em where # is the number iteration
. The motive
list is split into num_ali_batch
chunks and the specific chunk to process is
specified by process_idx
. A motive list for the best determined alignment
parameters is written out for each batch with the name format
ptct_motl_fn_prefix
_#_#.em where the first # is iteration
+ 1 and the
second # is the number process_idx
.
Particles, with the name format ptcl_fn_prefx
_#.em where # is the
subtomogram ID, are aligned against the reference with the name format
ref_fn_prefix
_#.em where # is iteration
. Before the comparison is made
a number of alterations are made to both the particle and reference:
If
nfold
is greater than 1 then C#-symmetry is applied along the Z-axis to the reference where # isnfold
.The reference is masked in real space with the mask
align_mask_fn
, and ifapply_mask
evaluates to true as a boolean, then this mask is also applied to the particle. A sphere mask is applied to the particle to reduces the artifacts caused by the box-edges on the comparison. This sphere has a diameter that is 80% the box size and falls of with a sigma that is 15% half the box size.
The mask is rotated and shifted with the currently existing alignment parameters for the particle as to best center the mask on the particle density.
apply_mask
can help alignment and suppress alignment to other features when the particle is well-centered or already reasonably well aligned, but if this is not the case there is the risk that a tight alignment will cutoff parts of the particle.Both the particle and the reference are bandpass filtered in the Fourier domain defined by
high_pass_fp
,high_pass_sigma
,low_pass_fp
, andlow_pass_sigma
which are all in the units of Fourier pixels.A Fourier weight volume with the name format
weight_fn_prefix
_#.em where # corresponds to the tomogram from which the particle came from, which is found from the fieldtomo_row
in the motive list, is applied to the reference in the Fourier domain, after the reference has been rotated with the currently existing alignment parameters. Ifapply_weight
evaluates to true as a boolean, then this weight is also applied to the particle with no rotation. This Fourier weight is designed to compensate for the missing wedge.
If a binary wedge is used, then it is reasonable to apply the weight to the particle, however, for more complicated weights, like the average amplitude spectrum, it should not be done.
The local rotations searched during alignment are deteremined by the four
parameters psi_angle_step
, psi_angle_shells
, phi_angle_step
, and
phi_angle_shells
. They describe a search where the currently existing
alignment parameters for azimuth and zenith are used to define a “pole” to
search about in the ceiling of half psi_angle_shells
cones. The change in
zenith between each cone is psi_angle_step
and the azimuth around the cone
is close to the same angle but is adjusted slightly to account for bias near the
pole. The final spin angle of the search is done with a change in spin of
phi_angle_step
in phi_angle_shells
steps. The spin is applied in both
clockwise and counter-clockwise fashion.
The angles phi, and psi here are flipped in their sense of every other package for EM image processing, which is absolutely infuriating and confusing, but maintained for historical reasons, however most descriptions use the words azimuth, zenith, and spin to avoid ambiguity.
Finally after the constrained cross-correlation function is calculated it is
masked with cc_mask_fn
to limit the shifts to inside this volume, and a peak
is found and it’s location is determined to sub-pixel accuracy using
interpolation. The rotations and shifts that gives the highest cross-correlation
coefficient are then chosen as the new alignments parameters. Particles with a
coefficient lower than threshold
are placed into class 2 and ignored in
later processing, and particles with class iclass
are the only particles
processed.
If
iclass
is 0 all particles will be considered, and particles abovethreshold
will be assigned to iclass of 1 and particles belowthreshold
will be assigned to iclass of 2. Ificlass
is 1 or 2 then particles with iclass 0 will be skipped, particles of iclass 1 and 2 will be aligned and particles with scores abovethreshold
will be assigned to iclass 1 and particles with scores belowthreshold
will be assigned to iclass 2.iclass
of 2 does not make much sense but is set this way in case of user mistakes or misunderstandings. Ificlass
is greater than 2 then particles with iclass of 1, 2, andiclass
will be aligned, and particles with a score abovethreshold
will maintain their iclass if it is 1 oriclass
, and particles with a previous iclass of 2 will be upgraded to an iclass of 1. Particles with a score belowthreshold
will be assigned to iclass 2.The class number is stored in the 20th field of the motive list.
Example
subtom_scan_angles(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'align_mask_fn', 'otherinputs/align_mask.em', ...
'cc_mask_fn', 'otherinputs/cc_mask.em', ...
'apply_weight', 0, ...
'apply_mask', 1, ...
'psi_angle_step', 6, ...
'psi_angle_shells', 8, ...
'phi_angle_step', 6, ...
'phi_angle_shells', 8, ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 12, ...
'low_pass_sigma', 3, ...
'nfold', 6, ...
'threshold', 0, ...
'iteration', 1, ...
'tomo_row', 7, ...
'iclass', 0, ...
'num_ali_batch', 1, ...
'process_idx', 1)
See Also
subtom_seed_positions
Place particle positions from clicker motive list.
subtom_seed_positions(
'input_motl_fn_prefix', input_motl_fn_prefix ('../startset/clicker'),
'output_motl_fn', output_motl_fn ('combinedmotl/allmotl_1.em'),
'spacing', spacing (8),
'do_tubule', do_tubule (0),
'rand_inplane', rand_inplane (0))
Takes in clicker motive lists from the ‘Pick Particle’ plugin for Chimera
with a name in the format input_motl_fn_prefix
_#.em, where # should
correspond to the tomogram number the clicker corresponds to. This number
will be used to fill in the 7th field in the output motive list
output_motl_fn
.
Points are added with roughly a pixel distance spacing
apart. These points
are also set with Euler angles that place them normal to the surface of
the sphere or tube on which they lie. Points take the form of a tube is
do_tubule
evaluates to true as a boolean otherwise the clickers are
assumed to correspond to spheres. In the case of both the radius is
encoded in the 3rd field of the clicker motive and carried over to the
output motive list. The second field corresponds to the marker set the
clicker file was created from, which is not used in placing spheres but is
considered in seeding tubules to delineate between multiple tubules in
each tomogram. Finally a running index of tube or sphere is added to the
6th field of the output motive list. If both do_tubule
and rand_inplane
evaluate to true as a boolean, then the final Euler angle (phi in AV3 notation,
and psi/spin/inplane in other notations) will be randomized as opposed to
directed along the tubular axis.
Example
subtom_seed_positions(...
'input_motl_fn_prefix', '../startset/clicker', ...
'output_motl_fn', 'combinedmotl/allmotl_1.em', ...
'spaciing', 4, ...
'do_tubule', 0, ...
'rand_inplane', 0)
See Also
subtom_shape
Produces a volume of a simple shape for masking.
subtom_shape(
'shape', shape (''),
'box_size', box_size (''),
'radius', radius (box_size / 2),
'outer_radius', outer_radius (box_size / 2),
'inner_radius', inner_radius (outer_radius - 2),
'radius_x', radius_x (box_size / 2),
'radius_y', radius_y (box_size / 2),
'radius_z', radius_z (box_size / 2),
'outer_radius_x', outer_radius_x (box_size / 2),
'outer_radius_y', outer_radius_y (box_size / 2),
'outer_radius_z', outer_radius_z (box_size / 2),
'inner_radius_x', inner_radius_x (outer_radius_x - 2),
'inner_radius_y', inner_radius_y (outer_radius_y - 2),
'inner_radius_z', inner_radius_z (outer_radius_z - 2),
'length_x', length_x (box_size),
'length_y', length_y (box_size),
'length_z', length_z (box_size),
'height', height (box_size),
'center_x', center_x (floor(box_size / 2) + 1),
'center_y', center_y (floor(box_size / 2) + 1),
'center_z', center_z (floor(box_size / 2) + 1),
'shift_x', shift_x (0),
'shift_y', shift_y (0),
'shift_z', shift_z (0),
'rotate_phi', rotate_phi (0),
'rotate_psi', rotate_psi (0),
'rotate_theta', rotate_theta (0),
'sigma', sigma (0),
'ref_fn', ref_fn (''),
'output_fn', output_fn (''))
Creates a volume of a simple shape, with the volume being a cube of box_size
length, and writes out the volume as output_fn
. This volume is generally
used for masking. The shape in the volume is defined by shape
and can be
one of several strings, the available shapes are ‘sphere’, ‘sphere_shell’,
‘ellipsoid’, ‘ellipsoid_shell’, ‘cylinder’, ‘tube’, ‘elliptic_cylinder’,
‘elliptic_tube’, and ‘cuboid’. For each shape there are a number of options
available to define the shape.
For each shape an optional gaussian smooth edge can be added by defining
sigma
.
For each shape an optional transform can also be applied to the shape by
specifying a shift through the options shift_x
, shift_y
, and
shift_z
, and the shapes initial center can be specified by center_x
,
center_y
, center_z
. Rotations to the shape are applied through the
options rotate_phi
, rotate_psi
, and rotate_theta
. Rotations are done
about the center and shifts are applied after any given rotation.
Finally another volume can be given by passing the option ref_fn
and the
shape will be applied to the volume, which can aid in testing how the shape
masks the underlying density.
If shape is ‘sphere’, the shape is defined by radius
.
If shape is ‘sphere_shell’, the shape is defined by inner_radius
and
outer_radius
.
If shape is ‘ellipsoid’, the shape is defined by radius_x
, radius_y
, and
radius_z
.
If shape is ‘ellipsoid_shell’, the shape is defined by inner_radius_x
,
inner_radius_y
, inner_radius_z
, outer_radius_x
, outer_radius_y
,
and outer_radius_z
.
If shape is ‘cylinder’, the shape is defined by radius
and height
.
If shape is ‘tube’, the shape is defined by inner_radius
, outer_radius
,
and height
.
If shape is ‘elliptic_cylinder’, the shape is defined by radius_x
,
radius_y
, and height
.
If shape is ‘elliptic_tube’, the shape is defined by inner_radius_x
,
inner_radius_y
, outer_radius_x
, outer_radius_y
, and height
.
Finally if shape is ‘cuboid’, the shape is defined by length_x
,
length_y
, and length_z
.
Example
subtom_shape(...
'shape', 'sphere', ...
'box_size', 128, ...
'radius', 32, ...
'sigma', 3, ...
'output_fn', 'otherinputs/mask.em');
See Also
subtom_split_motl_by_row
Split a MOTL file by a given row.
subtom_split_motl_by_row(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn_prefix', output_motl_fn_prefix (''),
'split_row', split_row (7))
Takes the MOTL file specified by input_motl_fn
and writes out a seperate
MOTL file with output_motl_fn_prfx
as the prefix where each output file
corresponds to a unique value of the row split_row
in input_motl_fn
.
Example
subtom_split_motl_by_row(...
'input_motl_fn', 'combinedmotl/allmotl_1.em', ...
'output_motl_fn', 'combinedmotl/allmotl_1_tomo', ...
'split_row', 7)
See Also
subtom_transform_motl
Apply a rotation and a shift to a MOTL file.
subtom_transform_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'shift_x', shift_x (0),
'shift_y', shift_y (0),
'shift_z', shift_z (0),
'rotate_phi', rotate_phi (0),
'rotate_psi', rotate_psi (0),
'rotate_theta', rotate_theta (0),
'rand_inplane', rand_inplane (0))
Takes the motl given by input_motl_fn
, and first applies the rotation
described by the Euler angles rotate_phi
, rotate_psi
, rotate_theta
,
which correspond to an in-plane spin, azimuthal, and zenithal rotation
respectively. Then a translation specified by shift_x
, shift_y
,
shift_z
, is applied to the existing translation. Finally the resulting
transformed motive list is written out as output_motl_fn
. Keep in mind that
the motive list transforms describe the alignment of the reference to each
particle, but that the rotation and shift here describe an affine transform of
the reference to a new reference. If rand_inplane
evaluates to true as a
boolean, then the final Euler angle (phi in AV3 notation, and psi/spin/inplane
in other notations) will be randomized after the given transform.
Explanation of how the transforms are derived
The alignments in the motive list describe the rotation and shift of the
reference to each particle. Since this is a rotation followed by a shift
we can describe this as an affine transform 4x4 matrix as follows:
[ R_1 T_1 ] [ V ] [ P ]
[ ] x [ ] = [ ] (1)
[ 0 1 ] [ 1 ] [ 1 ]
Where R_1 is the rotation matrix described by motl(17:19, 1) and T_1 is
the shift column vector described by motl(11:13, 1), and finally V and P
are the coordinates in the reference, and the reference in register with
the particle respectively.
Likewise the rotation and shift we apply to the reference to get a new
updated reference is also an affine transform as follows:
[ R_2 T_2 ] [ V ] [ V' ]
[ ] x [ ] = [ ] (2)
[ 0 1 ] [ 1 ] [ 1 ]
Where V' is our new reference. Therefore the affine transform we want to
find and place in our updates motive list is:
[ R_? T_? ] [ V' ] [ P ]
[ ] x [ ] = [ ] (3)
[ 0 1 ] [ 1 ] [ 1 ]
The most logical path is to go from V' to V and V to P, so we have to
invert the affine transform in (2), and then left multiply it by the
transform in (1). To find the inverse affine transform of (2) we have
that:
R_2 * V + T_2 = V'
R_2 * V = V' - T_2
V = R_2^-1 * (V' - T_2)
V = (R_2^-1 * V') - (R_2^-1 * T_2)
[ R_2^-1 -R_2^-1 * T_2 ] [ V' ] [ V ]
[ ] x [ ] = [ ] (4)
[ 0 1 ] [ 1 ] [ 1 ]
So we have that:
[ R_1 T_1 ] [ R_2^-1 -R_2^-1 * T_2 ] [ V' ] [ P ]
[ ] x [ ] x [ ] = [ ] (5)
[ 0 1 ] [ 0 1 ] [ 1 ] [ 1 ]
[ R_1 * R_2^-1 -R_1 * R_2^-1 * T_2 + T_1 ] [ V' ] = [ P ]
[ ] x [ ] = [ ] (6)
[ 0 1 ] [ 1 ] = [ 1 ]
And finally:
R_? = R_1 * R_2^-1
T_? = T_1 - R_1 * R_2^-1 * T_2
Example
subtom_transform_motl(...
'input_motl_fn', 'combinedmotl/allmotl_1.em', ...
'output_motl_fn', 'combinedmotl/allmotl_1_shifted.em', ...
'shift_x', 5, ...
'shift_y', 5, ...
'shift_z', -3, ...
'rotate_phi', 60, ...
'rotate_psi', 15, ...
'rotate_theta', 0.5, ...
'rand_inplane', 0)
See Also
subtom_weighted_average
Joins and weights parallel average subsets.
subtom_weighted_average(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'weight_sum_fn_prefix', weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'iclass', iclass (0),
'num_avg_batch', num_avg_batch (1))
Takes the num_avg_batch
parallel sum subsets with the name prefix
ref_fn_prefix
, the all_motl file with name prefix motl_fn_prefix
and
weight volume subsets with the name prefix weight_sum_fn_prefix
to generate
the final average, which should then be used as the reference for iteration
number iteration
. iclass
describes which class outside of one is
included in the final average and is used to correctly scale the average and
weights.
Example
subtom_weighted_average(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', './ref/ref', ...
'weight_sum_fn_prefix', 'otherinputs/wei', ...
'iteration', 1, ...
'iclass', 0, ...
'num_avg_batch', 1)
See Also
subtom_unclass_motl
Removes the iclass information from a motive list.
subtom_unclass_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''))
Takes the motive list given by input_motl_fn
, and removes all of the
iclass information setting all of the particles to 1, this can be useful
when moving from classified motive lists to all-particle alignments. Then
unclassed motive list is written out as output_motl_fn
.
Example
subtom_unclass_motl(...
'input_motl_fn', 'combinedmotl/allmotl_wmd_2.em', ...
'output_motl_fn', 'combinedmotl/allmotl_wmd_unclassed_2.em')
See Also
subTOM: General Classification Utilities
Classification in subTOM
There are several classification strategies within subTOM. Three are variants of
Multivariate Statistical Analysis (MSA), and the final is Multireference
alignment. Within all of these there are some functions which are generally
applicable and so they are shared between each modus of classification. This
includes the clustering of MSA data (subtom_cluster
), as well as the
averaging of data that has been classified (subtom_parallel_sums_cls
and
subtom_weighted_average_cls
). Since classification, particularly prinicipal
component analysis (PCA), is very operation intensive there is also a function
to pre-align all extracted particles and write them to disk to speed up later
processing steps (subtom_parallel_prealign
).
subtom_align_refs
Aligns the class averages from a given MOTL file and then applies the found rotations to class particles and re-averages the classes and all particle average in parallel on the cluster or locally.
This subtomogram alignment and averaging script uses five MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- align_exec
Alignment executable
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
- sum_cls_exec
Class Average Parallel Summing executable
- avg_cls_exec
Class Average Final Averaging executable
- motl_dump_exec
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Averaging Workflow Options
Parallelization Options
- iteration
The index of the reference to generate : input will be all_motl_fn_prefix_iteration.em (define as integer)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- output_motl_fn_prefix
Relative path and name prefix of the output motivelist of all particles. There will be two versions written out. The first with “_classed_” will retain the iclass values to generate new aligned class averages. The second with “_unclassed_” will set all particles iclass to 1 to generate a cumulative class average.
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- align_mask_fn
Relative path and name of the alignment mask. If “none” is given a default spherical mask will be used. (e.g. align_mask_fn=’otherinputs/align_mask.em’)
- cc_mask_fn
Relative path and name of the cross-correlation mask this defines the maximum shifts in each direction. If “noshift” is given no shifts are allowed. (e.g. cc_mask_fn=’otherinputs/cc_mask_1.em’)
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Alignment and Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- ref_class
Which class average to align the other class averages against. Because of the AV3 specification for iclass this should be a number that is 3 or above.
- apply_mask
Apply mask to class averages (1=yes, 0=no)
- psi_angle_step
Angular increment in degrees, applied during the cone-search, i.e. psi and theta (define as real e.g. psi_angle_step=3)
- psi_angle_shells
Number of angular iterations, applied to psi and theta (define as integer e.g. psi_angle_shells=3)
- phi_angle_step
Angular increment for phi in degrees, (define as real e.g. phi_angle_step=3)
- phi_angle_shells
Number of angular iterations for phi, (define as integer e.g. phi_angle_shells=3)
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size*pixelsize)/(resolution_real) (define as integer e.g. high_pass_fp=2)
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size*pixelsize)/(resolution_real) (define as integer e.g. low_pass_fp=30).
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
- nfold
Symmetry, if no symmetry nfold=1 (define as integer e.g. nfold=3)
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
sum_exec="${exec_dir}/alignment/subtom_parallel_sums"
avg_exec="${exec_dir}/alignment/subtom_weighted_average"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
mem_free=1G
mem_max=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
iteration=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
output_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
align_mask_fn="otherinputs/align_mask.em"
cc_mask_fn="otherinputs/cc_mask.em"
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
ref_class=3
apply_mask=0
psi_angle_step=1
psi_angle_step=6
phi_angle_step=1
phi_angle_shells=10
high_pass_fp=1
high_pass_sigma=2
low_pass_fp=10
low_pass_sigma=3
nfold=1
subtom_cluster
Clusters a motive-list using pre-calculated and supplied coefficients and outputs a classified motive-list and class averages.
This subtomogram classification script uses three MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cluster_exec
Cluster executable.
- sum_exec
Parallel Summing executable
- avg_exec
Final Averaging executable
Memory Options
- mem_free
The amount of memory the job requires. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Parallelization Options
- iteration
The index of the references to generate : input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
Subtomogram Classification Workflow Options
Coefficient File Options
- coeff_all_motl_fn_prefix
Relative path and name of the concatenated motivelist to cluster and classify.
- coeff_fn_prefix
Relative path and name of the coefficients.
Clustering Options
- cluster_type
The following determines which algorithm will be used to cluster the determined Eigencoefficients. The valid options are K-means clustering, ‘kmeans’, Hierarchical Ascendent Clustering using a Ward Criterion, ‘hac’, and a Gaussian Mixture Model, ‘gaussmix’.
- coeff_idxs
Determines which coefficients are used to cluster. The format should be a semicolon-separated list that also supports ranges with a dash (-), for example 1-5;7;15-19 would select the first five coefficients, the seventh and the fifteenth through the nineteenth for classification. If it is left as “all” all coefficients will be used.
- num_classes
How many classes should the particles be clustered into.
Clustering File Options
- cluster_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of the output classified particles.
Averaging File Options
- ref_fn_prefix
Relative path and name prefix of the reference volumes (e.g. ref_iter.em, the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- weight_sum_fn_prefix
Relative path and name prefix of the partial weight files.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="XXXINSTALLATION_DIRXXX/bin"
cluster_exec="${exec_dir}/classification/pca/subtom_cluster"
sum_exec="${exec_dir}/classification/pca/subtom_parallel_sums"
avg_exec="${exec_dir}/classification/pca/subtom_weighted_average"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
array_max="1000"
max_jobs="4000"
run_local="0"
skip_local_copy="1"
iteration="1"
num_avg_batch="1"
coeff_all_motl_fn_prefix="combinedmotl/allmotl"
coeff_fn_prefix="class/coeffs"
cluster_type="kmeans"
eig_idxs="all"
num_classes=2
cluster_all_motl_fn_prefix="class/allmotl_class"
ref_fn_prefix="class/ref"
weight_sum_fn_prefix="class/wei"
subtom_multiref_average
Calculates the average from a given MOTL file in parallel on the cluster or locally.
This subtomogram averaging script uses two MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Averaging Workflow Options
Parallelization Options
- iteration
The index of the reference to generate : input will be all_motl_fn_prefix_iteration.em (define as integer)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
mem_free=1G
mem_max=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
iteration=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
subtom_cluster
Classifies particles based on given coefficients.
subtom_cluster(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'coeff_fn_prefix', coeff_fn_prefix ('class/coeff'),
'output_motl_fn_prefix', output_motl_fn_prefix ('class/allmotl'),
'iteration', iteration (1),
'cluster_type', cluster_type ('kmeans'),
'eig_idxs', eig_idxs ('all'),
'num_classes', num_classes ('2'))
Takes the motive list given by all_motl_fn_prefix
and the coefficients
specified by coeff_fn_prefix
for the iteration iteration
and clusters
the data based on the coefficients. Clustering can be done using one of three
methods, which are specfied by cluster_type
. The options are K-Means
clustering with ‘kmeans’, Hierarchical Ascendant Clustering with ‘hac’ and a
Gaussian Mixture Model with ‘gaussmix’. A subset of coefficients can be selected
and are given as a semicolon-separated string of indices as coeff_idxs
. The
string can also contain ranges delimited by a dash, for example ‘1;3;5-10’. The
data will be clustered into num_classes
number of clusters and the clustered
motive list will be written out to a file given by output_motl_fn_prefix
.
Example
subtom_cluster(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'coeff_fn_prefix', 'class/eigcoeff_msa', ...
'output_motl_fn_prefix', 'class/allmotl_msa', ...
'iteration', 1, ...
'cluster_type', 'hac', ...
'coeff_idxs', '2-5;7;9-20', ...
'num_classes', '20')
See Also
subtom_parallel_prealign
Prealigns particles to speed up CC-Matrix calculation.
subtom_parallel_prealign(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'prealign_fn_prefix', prealign_fn_prefix ('subtomograms/subtomo'),
'iteration', iteration (1),
'num_prealign_batch', num_prealign_batch (1),
'process_idx', process_idx (1))
Prerotates and translates particles into alignment as precalculation on disk to
speed up the calculation of the constrained cross-correlation matrix. The
alignments are given in the motive list specified by all_motl_fn_prefix
and
iteration
, and the particles are based on ptcl_fn_prefix
and # where #
is described in row 4 of the motive list. Pre-aligned particles will be written
out described by prealign_fn_prefix
, iteration
and #. The process is
designed to be run in parallel on a cluster. The particles will be processed in
num_prealign_batch
chunks, with the specific chunk being processed described
by process_idx
.
Example
subtom_parallel_prealign(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'prealign_fn_prefix', 'subtomograms/subtomo_ali', ...
'iteration', 1, ...
'num_prealign_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_sums_cls
Creates raw sums and Fourier weight sums in a batch.
subtom_parallel_sums_cls(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'weight_sum_fn_prefix, weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'num_avg_batch', num_avg_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts in
all_motl_fn_prefix
_#.em where # corresponds to iteration
in
num_avg_batch
chunks to make a raw particle sum ref_fn_prefix
_#_###.em
where # corresponds to iteration
and ### corresponds to process_idx
.
Fourier weight volumes with name prefix weight_fn_prefix
will also be
aligned and summed to make a weight sum weight_sum_fn_prefix
_#_###.em.
tomo_row
describes which row of the motl file is used to determine the
correct tomogram fourier weight file. In this multi-reference version of
parallel sums, each unique value of iclass (row 20 in the motive list) will be
summed and written out (excluding non-positive values).
Example
subtom_parallel_sums_cls(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'weight_sum_fn_prefix, 'otherinputs/wei', ...
'iteration', 1, ...
'tomo_row', 7, ...
'num_avg_batch', 1, ...
'process_idx', 1)
See Also
subtom_scan_angles_exact_refs
Align a particle class averages to a single class average reference.
subtom_scan_angles_exact_refs(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'align_mask_fn', align_mask_fn ('none'),
'cc_mask_fn', cc_mask_fn ('noshift'),
'apply_mask', apply_mask (0),
'ref_class', ref_class (3),
'psi_angle_step', psi_angle_step (0),
'psi_angle_shells', psi_angle_shells (0),
'phi_angle_step', phi_angle_step (0),
'phi_angle_shells', phi_angle_shells (0),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'iteration', iteration (1))
Aligns class averages from the collective motive list with the name format
all_motl_fn_prefix
_#.em where # is the number iteration
. A motive list
for the best determined alignment parameters against the class average specified
by ref_class
is written out in two motive lists as given by
output_motl_fn_prefix
. The first with ‘classed’ keeps the class information
to generate new class averages. The second with ‘unclassed’ discards the class
information so a cumulative average can be generated.
Class averages, with the name format ref_fn_prefx
_class_#_#.em where the
first # is the iclass number, and the the second # is iteration
, are aligned
against the reference class average. Before the comparison is made a number of
alterations are made to both the class average and reference:
If
nfold
is greater than 1 then C#-symmetry is applied along the Z-axis to the reference where # isnfold
.The reference is masked in real space with the mask
align_mask_fn
, and ifapply_mask
evaluates to true as a boolean, then this mask is also applied to the class average. A sphere mask is applied to the particle to reduces the artifacts caused by the box-edges on the comparison. This sphere has a diameter that is 80% the box size and falls of with a sigma that is 15% half the box size.
apply_mask
can help alignment and suppress alignment to other features when the particle is well-centered or already reasonably well aligned, but if this is not the case there is the risk that a tight alignment will cutoff parts of the particle.Both the particle and the reference are bandpass filtered in the Fourier domain defined by
high_pass_fp
,high_pass_sigma
,low_pass_fp
, andlow_pass_sigma
which are all in the units of Fourier pixels.
The local rotations searched during alignment are deteremined by the four
parameters psi_angle_step
, psi_angle_shells
, phi_angle_step
, and
phi_angle_shells
. They describe a search where the currently existing
alignment parameters for azimuth and zenith are used to define a “pole” to
search about in the ceiling of half psi_angle_shells
cones. The change in
zenith between each cone is psi_angle_step
and the azimuth around the cone
is close to the same angle but is adjusted slightly to account for bias near the
pole. The final spin angle of the search is done with a change in spin of
phi_angle_step
in phi_angle_shells
steps. The spin is applied in both
clockwise and counter-clockwise fashion.
The angles phi, and psi here are flipped in their sense of every other package for EM image processing, which is absolutely infuriating and confusing, but maintained for historical reasons, however most descriptions use the words azimuth, zenith, and spin to avoid ambiguity.
Finally after the constrained cross-correlation function is calculated it is
masked with cc_mask_fn
to limit the shifts to inside this volume, and a peak
is found and it’s location is determined to sub-pixel accuracy using
interpolation. The rotations and shifts that gives the highest cross-correlation
coefficient are then chosen as the new alignments parameters.
Example
subtom_scan_angles(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'align_mask_fn', 'otherinputs/align_mask.em', ...
'cc_mask_fn', 'otherinputs/cc_mask.em', ...
'apply_mask', 1, ...
'ref_class', 3, ...
'psi_angle_step', 1, ...
'psi_angle_shells', 8, ...
'phi_angle_step', 1, ...
'phi_angle_shells', 8, ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 10, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'iteration', 1)
See Also
subtom_weighted_average_cls
Joins and weights parallel average subsets.
subtom_weighted_average_cls(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'weight_sum_fn_prefix', weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'num_avg_batch', num_avg_batch (1))
Takes the num_avg_batch
parallel sum subsets with the name prefix
ref_fn_prefix
, the all_motl file with name prefix motl_fn_prefix
and
weight volume subsets with the name prefix weight_sum_fn_prefix
to generate
the final average, which should then be used as the reference for iteration
number iteration
.
Example
subtom_weighted_average_cls(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', './ref/ref', ...
'weight_sum_fn_prefix', 'otherinputs/wei', ...
'iteration', 1, ...
'num_avg_batch', 1)
See Also
Indices and tables
subTOM: Multivariate Statistical Analysis
Multivariate Statistical Analysis Classification
In Multivariate Statistical Analysis (MSA) classification the full set of particles are simplified into a new lower-dimensional representation by means of EigenValue decomposition. Particles projected onto these most variable basis-vectors then can be clustered using a variety of methods.
Within subTOM particles are first compiled into a 2-D Matrix denoted here as the
X-Matrix, which holds the aligned, band-pass filtered and masked particle data.
To speed up calculation particles can be pre-aligned using the function
subtom_parallel_prealign
. Batches of the X-Matrix are calculated in parallel
with subtom_parallel_xmatrix_msa
and then combined and column-centered with
subtom_join_xmatrix
.
Next the X-Matrix is used to calculated the covariance matrix which is scaled
using the so-called ‘modulation metric’ as described in L. Borland and M. van
Heel in J. Opt. Soc. Am. A 1990, which is similar to the Chi-Square metrics used
in Correspondance Analysis of ordinal data. This covariance matrix is then
decomposed into it’s Eigenvectors and Eigenvalues and these are used along with
the X-Matrix to determine the Eigenvolumes of the dataset with
subtom_eigenvolumes_msa
.
These volumes are then used to determine the low-rank approximation coefficients
in volume space for clustering. A larger particle superset can be projected onto
the volumes to speed up classification of large datasets. Coefficients are also
calculated in parallel in batches with subtom_parallel_eigcoeffs_msa
and
joined with subtom_join_eigencoeffs_msa
.
Finally using a user-selected subset of the determined coefficients, the data is
clustered either by Hierarchical Ascendant Clustering using a Ward distance
criterion, K-Means clustering, or a Gaussian Mixture model with the function
subtom_cluster
. This clustering is then used to generate the final class
averages.
subtom_msa
The main MSA pipeline process script of subTOM.
This subtomogram classification script uses nine MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cluster_exec
Cluster executable.
- par_eigcoeff_exec
Parallel Eigencoefficient executable.
- eigcoeff_exec
Final Eigencoefficient executable.
- eigvol_exec
Eigenvolume Calculation executable.
- preali_exec
Parallel Subtomogram prealign executable.
- par_xmatrix_exec
Parallel X-Matrix executable.
- xmatrix_exec
Final X-Matrix executable.
- sum_exec
Parallel Summing executable
- avg_exec
Final Averaging executable
- motl_dump_exec
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Parallelization Options
- iteration
The index of the references to generate : input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- num_xmatrix_prealign_batch
Number of batches to split the parallel particle prealignment for the X-Matrix calculation into. If you are not doing prealignment you can ignore this option.
- num_xmatrix_batch
Number of batches to split the parallel X-Matrix calculation job into.
- num_eig_coeff_prealign_batch
Number of batches to split the parallel particle prealignment for the Eigencoefficients calculations into. If you are not doing prealignment you can ignore this option.
- num_eig_coeff_batch
Number of batches to split the parallel Eigencoefficient calculation into.
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
Subtomogram Classification Workflow Options
X-Matrix Options
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
- nfold
Symmetry to apply to each pair of particle and reference in X-Matrix calculation, if no symmetry nfold=1 (define as integer e.g. nfold=3).
- xmatrix_prealign
If you want to pre-align all of the particles to speed up the X-Matrix calculation, set the following to 1, otherwise the particles will be aligned during the computation.
X-Matrix File Options
- xmatrix_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. xmatrix_all_motl_fn_prefix=’sub-directory/allmotl’).
- xmatrix_fn_prefix
Relative path and name of the X-Matrix.
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’).
- mask_fn
Relative path and name of the classification mask. This should be a binary mask as correlations are done in real-space, and calculations will only be done using voxels passed by the mask, so smaller masks will run faster. If you want to use the default spherical mask set mask_fn to ‘none’.
Eigenvolumes Options
- num_eigs
The number of Eigenvectors and Eigenvalues to calculate.
- eigs_iterations
The following allows you to adjust the number of iterations to use in the decomposition. If you want to use the default number of iterations leave this set to ‘default’.
- eigs_tolerance
The following allows you to adjust the convergence tolerance of the decomposition calculation. If you want to use the default tolerance leave this set to ‘default’.
Eigenvolumes File Options
- eig_vec_fn_prefix
Relative path and name of the Eigenvectors.
- eig_val_fn_prefix
Relative path and name of the Eigenvalues.
- eig_vol_fn_prefix
Relative path and name of the Eigenvolumes.
Eigencoefficient Options
- apply_weight
If the following is set to 1, the Eigenvolume will have the particles missing-wedge weight applied to it before the correlation is calculated.
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- eig_coeff_prealign
If you want to pre-align all of the particles to speed up the Eigencoefficient calculation, set the following to 1, otherwise the particles will be aligned during the computation.
Eigencoefficient File Options
- eig_coeff_all_motl_fn_prefix
Relative path and name of the concatenated motivelist to project onto the Eigenvolumes. This can be a larger motivelist than the one used to calculate the X-Matrix and Eigenvolumes.
- eig_coeff_fn_prefix
Relative path and name of the Eigencoefficients.
- weight_fn_prefix
Relative path and name of the weight file, if you are not applying the weight to the Eigenvolumes this can be left alone.
Clustering Options
- cluster_type
The following determines which algorithm will be used to cluster the determined Eigencoefficients. The valid options are K-means clustering, ‘kmeans’, Hierarchical Ascendent Clustering using a Ward Criterion, ‘hac’, and a Gaussian Mixture Model, ‘gaussmix’.
- eig_idxs
Determines which Eigencoefficients are used to cluster. The format should be a semicolon-separated list that also supports ranges with a dash (-), for example 1-5;7;15-19 would select the first five Eigencoefficients, the seventh and the fifteenth through the nineteenth for classification. If it is left as “all” all coefficients will be used.
- num_classes
How many classes should the particles be clustered into.
Clustering File Options
- cluster_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of the output classified particles.
Averaging File Options
- ref_fn_prefix
Relative path and name prefix of the reference volumes (e.g. ref_iter.em, the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- weight_sum_fn_prefix
Relative path and name prefix of the partial weight files.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="XXXINSTALLATION_DIRXXX/bin"
cluster_exec="${exec_dir}/classification/general/subtom_cluster"
par_eigcoeff_exec="${exec_dir}/classification/msa/subtom_parallel_eigencoeffs_msa"
eigcoeff_exec="${exec_dir}/classification/msa/subtom_join_eigencoeffs_msa"
eigvol_exec="${exec_dir}/classification/msa/subtom_eigenvolumes_msa"
preali_exec="${exec_dir}/classification/general/subtom_parallel_prealign"
par_xmatrix_exec="${exec_dir}/classification/msa/subtom_parallel_xmatrix_msa"
xmatrix_exec="${exec_dir}/classification/msa/subtom_join_xmatrix"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
motl_dump_exec="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
array_max="1000"
max_jobs="4000"
run_local="0"
skip_local_copy="1"
iteration="1"
num_xmatrix_prealign_batch="1"
num_xmatrix_batch="1"
num_eig_coeff_prealign_batch="1"
num_eig_coeff_batch="1"
num_avg_batch="1"
high_pass_fp="1"
high_pass_sigma="2"
low_pass_fp="12"
low_pass_sigma="3"
nfold="1"
xmatrix_prealign=0
xmatrix_all_motl_fn_prefix="combinedmotl/allmotl"
xmatrix_fn_prefix="class/xmatrix_msa"
ptcl_fn_prefix="subtomograms/subtomo"
mask_fn="none"
num_eigs='40'
eigs_iterations='default'
eigs_tolerance='default'
eig_vec_fn_prefix="class/eigvec_msa"
eig_val_fn_prefix="class/eigval_msa"
eig_vol_fn_prefix="class/eigvol_msa"
apply_weight="0"
tomo_row="7"
eig_coeff_prealign="0"
eig_coeff_all_motl_fn_prefix="combinedmotl/allmotl"
eig_coeff_fn_prefix="class/eigcoeff_msa"
weight_fn_prefix="otherinputs/ampspec"
cluster_type="kmeans"
eig_idxs="all"
num_classes=2
cluster_all_motl_fn_prefix="class/allmotl_msa"
ref_fn_prefix="class/ref_msa"
weight_sum_fn_prefix="class/wei_msa"
subtom_eigenvolumes_msa
Computes Eigendecomposition of X-Matrix covariance and projects data on Eigenvectors.
subtom_eigenvolumes_msa(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'eig_vec_fn_prefix', eig_vec_fn_prefix ('class/eigvec_msa'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_msa'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_msa'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_msa'),
'mask_fn', mask_fn ('none'),
'iteration', iteration (1),
'num_eigs', num_eigs (40),
'eigs_iterations', eigs_iterations ('default'),
'eigs_tolerance', eig_tolerance ('default'))
Calculates num_eigs
weighted projections of particles onto the same number
of determined Eigenvectors, by means of a previously calculated X-matrix,
named as given by xmatrix_fn_prefix
and iteration
to produce Eigenvolumes which can
then be used to determine which vectors can best influence classification.
The Eigenvectors and Eigenvalues are also written out as specified by
eig_vec_fn_prefix
, eig_val_fn_prefix
, and iteration
The
Eigenvolumes are also masked by the file specified by mask_fn
. The output
weighted Eigenvolume will be written out as described by
eig_vol_fn_prefix
, iteration
and #, where the # is the particular
Eigenvolume being written out. Two options eigs_iterations
and
eigs_tolerance
are also available to tune how eigs is run. If the string
‘default’ is given for either the default values in eigs will be used.
Example
subtom_eigenvolumes_msa(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'eig_vec_fn_prefix', 'class/eigvec', ...
'eig_val_fn_prefix', 'class/eigval', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'mask_fn', 'class/class_mask.em', ...
'iteration', 1, ...
'num_eigs', 40, ...
'eigs_iterations', 'default', ...
'eigs_tolerance', 'default')
See Also
subtom_join_eigencoeffs_msa
Combines Eigencoefficient batches into final matrix.
subtom_join_eigencoeffs_msa(
'eig_coeff_fn_prefix', eig_coeff_fn_prefix ('class/eigcoeff_msa'),
'iteration', iteration (1),
'num_coeff_batch', num_coeff_batch (1))
Looks for partial chunks of the low-rank approximation coefficients of projected
particles with the file name given by eig_coeff_fn_prefix
, iteration
and
# where # is from 1 to num_coeff_batch
, and combines them into a final
matrix of coefficients written out as described by eig_coeff_fn_prefix
, and
iteration
.
Example
subtom_join_eigencoeffs_msa(...
'eig_coeff_fn_prefix', 'class/eigcoeff', ...
'iteration', 1, ...
'num_coeff_batch', 100)
See Also
subtom_join_xmatrix
Combines chunks of X-Matrix into the final matrix.
subtom_join_xmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_msa'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'mask_fn', mask_fn ('none'),
'iteration', iteration (1),
'num_xmatrix_batch', num_xmatrix_batch (1))
Looks for partial chunks of the X-matrix with the file name given by
xmatrix_fn_prefix
, iteration
, and # where # is from 1 to
num_xmatrix_batch
, and combines them into a final matrix of coefficients
written out as described by xmatrix_fn_prefix
and iteration
.
Example
subtom_join_xmatrix(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'iteration', 1, ...
'num_xmatrix_batch', 100);
See Also
subtom_parallel_eigencoeffs_msa
Computes particle Eigencoefficients
subtom_parallel_eigencoeffs_msa(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_msa'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'eig_coeff_fn_prefix', eig_coeff_fn_prefix ('class/eigcoeff_msa'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_msa'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_msa'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'apply_weight', apply_weight (0),
'tomo_row', tomo_row (7),
'iteration', iteration (1),
'prealigned', prealigned (0),
'num_coeff_batch', num_coeff_batch (1),
'process_idx', process_idx (1))
Takes a batch subset of particles described by all_motl_fn_prefix
with
filenames given by ptcl_fn_prefix
, band-pass filters them as described by
high_pass_fp
, high_pass_sigma
, low_pass_fp
, and low_pass_sigma
,
and projects them onto the Eigenvolumes specified by eig_vol_fn_prefix
. This
determines a set of coefficients describing a low-rank approximation of the
data. A subset of this coefficient matrix is written out based on
eig_coeff_fn_prefix
and process_idx
, with there being
num_coeff_batch
batches in total.
If apply_weight
is set to 1 the Eigenvolumes will be reweighted using the
correct weight of each particle as described by weight_fn_prefix
and
tomo_row
, then each particle will be read and projected in a loop. If
prealigned
is set to 1, then it is understood that the particles have been
prealigned beforehand and the alignment of the particles can be skipped to save
time. mask_fn
describes the mask used throughout classification and ‘none’
describes a default spherical mask.
Example
subtom_parallel_eigencoeffs_msa(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo_ali', ...
'eig_coeff_fn_prefix', 'class/eigcoeff', ...
'eig_val_fn_prefix', 'class/eigval', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'apply_weight', 1, ...
'tomo_row', 7, ...
'iteration', 1, ...
'prealigned', 1, ...
'num_coeff_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_xmatrix_msa
Calculates chunks of the X-matrix for processing.
subtom_parallel_xmatrix_msa(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_msa'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'iteration', iteration (1),
'prealigned', prealigned (0),
'num_xmatrix_batch', num_xmatrix_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts given by
all_motl_fn_prefix
and iteration
, band-pass filters the particle as
described by high_pass_fp
, high_pass_sigma
, low_pass_fp
, and
low_pass_sigma
, optionally symmetrizes the particle with C-fold symmetry
nfold
, and then places these voxels as a 1-D row vector in a data
sub-matrix which is historically known as the X-matrix (See Borland, Van Heel
1990 J. Opt. Soc. Am. A). This X-matrix can then be used to
speed up the calculation of Eigenvolumes and Eigencoefficients used for
classification. The subset of particles compared is specified by the number of
particles in the motive list and the number of requested batches specified by
num_xmatrix_batch
, with the specific subset deteremined by process_idx
.
The X-matrix chunk will be written out as specified by xmatrix_fn_prefix
,
iteration
and process_idx
. The location of the particles is specified by
ptcl_fn_prefix
. If prealigned
evaluates to true as a boolean then the
particles are assumed to be prealigned, which should increase speed of
computation of CC-Matrix calculations. Particles in the X-matrix will be masked
by the file given by mask_fn
. If the string ‘none’ is used in place of
mask_fn
, a default spherical mask is applied. This mask should be a binary
mask and only voxels within the mask are placed into the X-matrix which can
greatly speed up computations.
Example
subtom_parallel_xmatrix_msa(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo_ali', ...
'mask_fn', 'combinedmotl/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'iteration', 1, ...
'prealigned', 1, ...
'num_xmatrix_batch', 100, ...
'process_idx', 1)
See Also
Indices and tables
subTOM: Multireference
Multireference Classification
In multireference classification the full set of particles are compared to a predefined number of reference volumes. Each experimental particle is aligned with respect to all references and is assigned to one of them based on the constrained cross-correlation coefficient as a similarity measure. Averaging over the subsets determined serves to calculate new, improved references for further iterations. The user can then iterate this procedure and stop when no more migration of particles between subsets is observed and the averages within each subset do not change any more.
Within subTOM a function subtom_rand_class_motl
serves to initialize random
classes and initial references if the user does not alread have them, and then
the subTOM functions subtom_scan_angles_exact_multiref
, and
subtom_compare_motls_multiref
have been modified to fit the new modus of
multireference classification and alignment.
subtom_multiref_alignment
The main multireference pipeline process script of subTOM. Iteratively aligns and averages a collection of subvolumes, against multiple references.
This subtomogram alignment script uses five MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- align_exec
Alignment executable
- cat_exec
Concatenate MOTLs executable
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
- compare_exec
Compare MOTLs executable
- motl_dump_exec
MOTL dump executable
Memory Options
- mem_free_ali
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max_ali
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
- mem_free_avg
The amount of memory the job requires for averaging.
- mem_max_avg
The upper bound on the amount of memory the averaging job is allowed to use.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Alignment Workflow Options
Parallelization Options
- start_iteration
The index of the reference to start from : input will be ref_fn_prefix_start_iteration.em and all_motl_fn_prefix_start_iteration.em (define as integer e.g. start_iteration=3)
More on iterations since they’re confusing and it is slightly different here than from previous iterations.
The start_iteration is the beginning for the iteration variable used throughout this script. Iteration refers to iteration that is used for subtomogram alignment. So if start_iteration is 1, then subtomogram alignment will work using allmotl_1.em and ref_1.em. The output from alignment will be particle motls for the next iteration. This in the script is avg_iteration variable. The particle motls will be joined to form allmotl_2.em and then the parallel averaging will form ref_2.em and then the loop is done and iteration will become 2 and avg_iteration will become 3.
- iterations
Number iterations (big loop) to run: final output will be ref_fn_prefix_start_iteration+iterations.em and all_motl_fn_prefix_start_iteration+iterations.em
- num_ali_batch
The number of batches to split the parallel subtomogram alignment job into.
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- align_mask_fn
Relative path and name of the alignment mask Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- cc_mask_fn
Relative path and name of the cross-correlation mask this defines the maximum shifts in each direction Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Alignment and Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- apply_weight
Apply weight to subtomograms (1=yes, 0=no).
- apply_mask
Apply mask to subtomograms (1=yes, 0=no).
- keep_class
If you want particles to be constrained to their existing class set this to 1, otherwise particles will change to the class of which they align best against.
- psi_angle_step
Angular increment in degrees, applied during the cone-search, i.e. psi and theta (define as real e.g. psi_angle_step=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- psi_angle_shells
Number of angular iterations, applied to psi and theta (define as integer e.g. psi_angle_shells=4). Note that in terms of cones this is twice the number of cones sampled. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- phi_angle_step
Angular increment for phi in degrees, (define as real e.g. phi_angle_step=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- phi_angle_shells
Number of angular iterations for phi, (define as integer e.g. phi_angle_shells=6). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above. Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- nfold
Symmetry, if no symmetry nfold=1 (define as integer e.g. nfold=3). Leave the parentheses and if the number of values is less than the number of iterations the last value will be repeated to the correct length.
- threshold
Threshold for cross correlation coefficient. Only particles with ccc_new > threshold will be added to new average (define as real e.g. threshold=0.5). These particles will still be aligned at each iteration.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
align_exec="${exec_dir}/classification/multiref/subtom_scan_angles_exact_multiref"
cat_exec="${exec_dir}/MOTL/subtom_cat_motls"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
compare_exec="${exec_dir}/classification/multiref/subtom_compare_motls_multiref"
motl_dump_exec="${exec_dir}/MOTL/motl_dump"
mem_free_ali=1G
mem_max_ali=64G
mem_free_avg=1G
mem_max_avg=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
start_iteration=1
iterations=3
num_ali_batch=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
align_mask_fn=("otherinputs/align_mask_1.em" \
"otherinputs/align_mask_2.em" \
"otherinputs/align_mask_3.em")
cc_mask_fn=("otherinputs/cc_mask_r10.em" \
"otherinputs/cc_mask_r05.em")
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
apply_weight=0
apply_mask=1
keep_class=0
psi_angle_step=(10 5 2.5)
psi_angle_shells=(4)
phi_angle_step=(20 5)
phi_angle_shells=(6)
high_pass_fp=(1)
high_pass_sigma=(2)
low_pass_fp=(12 15 18)
low_pass_sigma=(3)
nfold=(1 6)
threshold=-1
subtom_multiref_initialize
A startup processing script for multireference classification in subTOM. Splits a given motive list into random equal classes and then generates the average of each class.
This subtomogram averaging script uses three MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
- rand_exec
Randomize Motive List executable
Memory Options
- mem_free
The amount of memory the job requires. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Subtomogram Averaging Workflow Options
Parallelization Options
- iteration
The index of the reference to generate : input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. all_motl_fn_prefix=’sub-directory/allmotl’)
- ref_fn_prefix
Relative path and name of the reference volumes (e.g. ref_iter.em , the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’)
- weight_fn_prefix
Relative path and name of the weight file.
- weight_sum_fn_prefix
Relative path and name of the partial weight files.
Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- num_classes
The number of classes to split the initial motive list into. The classes will be assigned randomly evenly within the valid particles, (non-negative class values excluding class 2), with the class number starting at 3 to not interfere with the classes 1 and 2 which are reserved for AV3’s thresholding process.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
rand_exec="${exec_dir}/classification/multiref/subtom_rand_class_motl"
mem_free=1G
mem_max=64G
job_name=subTOM
array_max=1000
max_jobs=4000
run_local=0
skip_local_copy=1
iteration=1
num_avg_batch=1
all_motl_fn_prefix="combinedmotl/allmotl"
ref_fn_prefix="ref/ref"
ptcl_fn_prefix="subtomograms/subtomo"
weight_fn_prefix="otherinputs/ampspec"
weight_sum_fn_prefix="otherinputs/wei"
tomo_row=7
num_classes=2
subtom_multiref_rand_class_motl
Randomizes a given number of classes in a motive list.
This MOTL manipulation script uses one MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- rand_exec
Randomize Motive List executable.
File Options
- input_motl_fn
Relative path and name of the input motivelist to be randomized in class.
- output_motl_fn
Relative path and name of the output motivelist.
Randomize MOTL Options
- num_classes
The number of classes to split the initial motive list into. The classes will be assigned randomly evenly within the valid particles, (non-negative class values excluding class 2), with the class number starting at 3 to not interfere with the classes 1 and 2 which are reserved for AV3’s thresholding process.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
rand_exec="${exec_dir}/classification/multiref/subtom_rand_class_motl
input_motl_fn="combinedmotl/allmotl_1.em"
output_motl_fn="combinedmotl/allmotl_multiref_1.em"
num_classes=2
subtom_compare_motls_multiref
Compares orientations, shifts and class changes between two MOTLs.
subtom_compare_motls_multiref(
'motl_1_fn', motl_1_fn (''),
'motl_2_fn', motl_2_fn (''),
'write_diffs', write_diffs (0),
'output_diffs_fn', output_diffs_fn (''))
Takes the motls given by motl_1_fn
and motl_2_fn
and calculates the
differences for both the orientations and coordinates between corresponding
particles in each motive list. For multireference alignment the number of
particles that have changed class is also determined. If write_diffs
evaluates to true as a boolean, then also a CSV file with the differences in
coordinates and orientations to diffs_output_fn
.
Example
subtom_compare_motls_multiref(...
'motl_1_fn', 'combinedmotl/allmotl_1.em', ...
'motl_2_fn', 'combinedmotl/allmotl_2.em', ...
'write_diffs', 1, ...
'output_diffs_fn', 'combinedmotl/allmotl_1_2_diff.csv')
See Also
subtom_rand_class_motl
Randomizes a given number of classes in a motive list.
subtom_rand_class_motl(
'input_motl_fn', input_motl_fn (''),
'output_motl_fn', output_motl_fn (''),
'num_classes', num_classes ('2'))
Takes the motive list given by input_motl_fn
, and splits it into
num_classes
even classes using the 20th row of the motive list, and then
writes the transformed motive list out as output_motl_fn
. The values that go
into the 20th row start at 3 and particles that initially have negative or the
value 2 in the 20th row are ignored as described in AV3 documentation for the
behavior of class numbers.
Example
subtom_rand_class_motl(...
'input_motl_fn', 'combinedmotl/allmotl_1.em', ...
'output_motl_fn', 'combinedmotl/allmotl_multiref_1.em', ...
'num_classes', '2')
See Also
subtom_scan_angles_exact_multiref
Align a particle batch over local search angles.
subtom_scan_angles_exact_multiref(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'align_mask_fn', align_mask_fn ('none'),
'cc_mask_fn', cc_mask_fn ('noshift'),
'apply_weight', apply_weight (0),
'apply_mask', apply_mask (0),
'keep_class', keep_class (0),
'psi_angle_step', psi_angle_step (0),
'psi_angle_shells', psi_angle_shells (0),
'phi_angle_step', phi_angle_step (0),
'phi_angle_shells', phi_angle_shells (0),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'threshold', threshold (-1),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'num_ali_batch', num_ali_batch (1),
'process_idx', process_idx (1))
Aligns a batch of particles from the collective motive list with the name format
all_motl_fn_prefix
_#.em where # is the number iteration
. The motive
list is split into num_ali_batch
chunks and the specific chunk to process is
specified by process_idx
. A motive list for the best determined alignment
parameters is written out for each batch with the name format
ptct_motl_fn_prefix
_#_#.em where the first # is iteration
+ 1 and the
second # is the number process_idx
.
Particles, with the name format ptcl_fn_prefx
_#.em where # is the
subtomogram ID, are aligned against the reference with the name format
ref_fn_prefix
_#.em where # is iteration
. Before the comparison is made
a number of alterations are made to both the particle and reference:
If
nfold
is greater than 1 then C#-symmetry is applied along the Z-axis to the reference where # isnfold
.The reference is masked in real space with the mask
align_mask_fn
, and ifapply_mask
evaluates to true as a boolean, then this mask is also applied to the particle. A sphere mask is applied to the particle to reduces the artifacts caused by the box-edges on the comparison. This sphere has a diameter that is 80% the box size and falls of with a sigma that is 15% half the box size.
The mask is rotated and shifted with the currently existing alignment parameters for the particle as to best center the mask on the particle density.
apply_mask
can help alignment and suppress alignment to other features when the particle is well-centered or already reasonably well aligned, but if this is not the case there is the risk that a tight alignment will cutoff parts of the particle.Both the particle and the reference are bandpass filtered in the Fourier domain defined by
high_pass_fp
,high_pass_sigma
,low_pass_fp
, andlow_pass_sigma
which are all in the units of Fourier pixels.A Fourier weight volume with the name format
weight_fn_prefix
_#.em where # corresponds to the tomogram from which the particle came from, which is found from the fieldtomo_row
in the motive list, is applied to the reference in the Fourier domain, after the reference has been rotated with the currently existing alignment parameters. Ifapply_weight
evaluates to true as a boolean, then this weight is also applied to the particle with no rotation. This Fourier weight is designed to compensate for the missing wedge.
If a binary wedge is used, then it is reasonable to apply the weight to the particle, however, for more complicated weights, like the average amplitude spectrum, it should not be done.
The local rotations searched during alignment are deteremined by the four
parameters psi_angle_step
, psi_angle_shells
, phi_angle_step
, and
phi_angle_shells
. They describe a search where the currently existing
alignment parameters for azimuth and zenith are used to define a “pole” to
search about in the ceiling of half psi_angle_shells
cones. The change in
zenith between each cone is psi_angle_step
and the azimuth around the cone
is close to the same angle but is adjusted slightly to account for bias near the
pole. The final spin angle of the search is done with a change in spin of
phi_angle_step
in phi_angle_shells
steps. The spin is applied in both
clockwise and counter-clockwise fashion.
The angles phi, and psi here are flipped in their sense of every other package for EM image processing, which is absolutely infuriating and confusing, but maintained for historical reasons, however most descriptions use the words azimuth, zenith, and spin to avoid ambiguity.
Finally after the constrained cross-correlation function is calculated it is
masked with cc_mask_fn
to limit the shifts to inside this volume, and a peak
is found and it’s location is determined to sub-pixel accuracy using
interpolation. The rotations and shifts that gives the highest cross-correlation
coefficient are then chosen as the new alignments parameters. Particles with a
coefficient lower than threshold
are placed into class 2 and ignored in
later processing, and particles with class iclass
are the only particles
processed.
If
iclass
is 0 all particles will be considered, and particles abovethreshold
will be assigned to iclass of 1 and particles belowthreshold
will be assigned to iclass of 2. Ificlass
is 1 or 2 then particles with iclass 0 will be skipped, particles of iclass 1 and 2 will be aligned and particles with scores abovethreshold
will be assigned to iclass 1 and particles with scores belowthreshold
will be assigned to iclass 2.iclass
of 2 does not make much sense but is set this way in case of user mistakes or misunderstandings. Ificlass
is greater than 2 then particles with iclass of 1, 2, andiclass
will be aligned, and particles with a score abovethreshold
will maintain their iclass if it is 1 oriclass
, and particles with a previous iclass of 2 will be upgraded to an iclass of 1. Particles with a score belowthreshold
will be assigned to iclass 2.The class number is stored in the 20th field of the motive list.
Example
subtom_scan_angles_multiref(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'align_mask_fn', 'otherinputs/align_mask.em', ...
'cc_mask_fn', 'otherinputs/cc_mask.em', ...
'apply_weight', 0, ...
'apply_mask', 1, ...
'keep_class', 0, ...
'psi_angle_step', 6, ...
'psi_angle_shells', 8, ...
'phi_angle_step', 6, ...
'phi_angle_shells', 8, ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 12, ...
'low_pass_sigma', 3, ...
'nfold', 6, ...
'threshold', 0, ...
'iteration', 1, ...
'tomo_row', 7, ...
'num_ali_batch', 1, ...
'process_idx', 1)
See Also
Indices and tables
subTOM: Principal Component Analysis
Principal Component Analysis Classification
In principal component analysis (PCA) classification the full set of particles are simplified into a new lower-dimensional representation by means of Eigen or Singular Value decomposition methods. Particles projected onto these most variable basis-vectors then can be clustered using a variety of methods.
Within subTOM particles are first compared using Constrained Cross-Correlation
taking into account the missing wedge. The pairs used in comparison are
pre-calculated with the function subtom_prepare_ccmatrix
. To speed up
calculation particles can be pre-aligned using the function
subtom_parallel_prealign
.
The comparisons are calculated in parallel batches with
subtom_parallel_ccmatrix
and the results are combined with
subtom_join_ccmatrix
. The Cross-Correlation matrix is then decomposed into a
user-given number of basis vectors using either Eigenvalue decomposition with
subtom_eigs
or Singular Value decomposition with subtom_svds
, which the
basis vectors and their respective weights.
The particles that were compared against are then projected onto these vectors
by first constructing a matrix of the aligned data with
subtom_parallel_xmatrix_pca
and then projected in parallel batches with
subtom_parallel_eigenvolumes
and joined with subtom_join_eigenvolumes
.
These volumes are then used to determine the low-rank approximation coefficients
in volume space for clustering. A larger particle superset can be projected onto
the volumes to speed up classification of large datasets. Coefficients are also
calculated in parallel in batches with subtom_parallel_eigcoeffs_pca
and
joined with subtom_join_eigencoeffs_pca
.
Finally using a user-selected subset of the determined coefficients, the data is
clustered either by Hierarchical Ascendant Clustering using a Ward distance
criterion, K-Means clustering, or a Gaussian Mixture model with the function
subtom_cluster
. This clustering is then used to generate the final class
averages.
subtom_pca
The main PCA pipeline process script of subTOM.
This subtomogram classification script uses fourteen MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cluster_exec
Cluster executable.
- eig_exec
Eigendecomposition executable.
- pre_ccmatrix_exec
Prepare CC-Matrix executable.
- par_ccmatrix_exec
Parallel CC-Matrix executable.
- ccmatrix_exec
Final CC-Matrix executable.
- par_eigcoeff_exec
Parallel Eigencoefficient executable.
- eigcoeff_exec
Final Eigencoefficient executable.
- par_eigvol_exec
Parallel Eigenvolume executable.
- eigvol_exec
Final Eigenvolume executable.
- preali_exec
Parallel Subtomogram prealign executable.
- xmatrix_exec
Parallel X-Matrix executable.
- svds_exec
Singular Value Decomposition executable.
- sum_exec
Parallel Summing executable
- avg_exec
Final Averaging executable
- motl_dump_exec
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Parallelization Options
- iteration
The index of the references to generate : input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- num_ccmatrix_prealign_batch
Number of batches to split the parallel particle prealignment for the CC-Matrix calculation into. If you are not doing prealignment you can ignore this option.
- num_ccmatrix_batch
Number of batches to split the parallel CC-Matrix calculation job into.
- num_xmatrix_batch
Number of batches to split the parallel X-Matrix calculation job into. This also determines the number of batches the Eigenvolumes calculation will be split into.
- num_eig_coeff_prealign_batch
Number of batches to split the parallel particle prealignment for the Eigencoefficients calculations into. If you are not doing prealignment you can ignore this option.
- num_eig_coeff_batch
Number of batches to split the parallel Eigencoefficient calculation into.
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
Subtomogram Classification Workflow Options
CC-Matrix Options
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
- nfold
Symmetry to apply to each pair of particle and reference in CC-Matrix calculation, if no symmetry ccmatrix_nfold=1 (define as integer e.g. ccmatrix_nfold=3)
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- ccmatrix_prealign
If you want to pre-align all of the particles to speed up the CC-Matrix calculation, set the following to 1, otherwise the particles will be aligned during the computation.
CC-Matrix File Options
- ccmatrix_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. ccmatrix_all_motl_fn_prefix=’sub-directory/allmotl’).
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’).
- mask_fn
Relative path and name of the classification mask. This should be a binary mask as correlations are done in real-space, and calculations will only be done using voxels passed by the mask, so smaller masks will run faster. If you want to use the default spherical mask set mask_fn to ‘none’.
- weight_fn_prefix
Relative path and name of the weight file.
- ccmatrix_fn_prefix
Relative path and name of the CC-Matrix.
Eigendecomposition Options
- decomp_type
The following determines which type of decomposition to perform. If the following is ‘eigs’, then traditional Eigenvalue decomposition will be calculated and either the largest magnitude or largest algebraic Eigenvalues will be returned, however in the CC-Matrix calculation the Eigenvalues can be negative which can be problematic in later stages of processing, and so ‘svds’ can also be given and Singular Value Decomposition will calculated instead.
- num_eigs
The number of Eigenvectors and Eigenvalues (or Left Singular Vectors and Singular Values) to calculate.
- eigs_iterations
If using ‘eigs’ the following allows you to adjust the number of iterations to use in the decomposition. If you want to use the default number of iterations leave this set to ‘default’.
- eigs_tolerance
If using ‘eigs’ the following allows you to adjust the convergence tolerance of the decomposition calculation. If you want to use the default tolerance leave this set to ‘default’.
- do_algebraic
If using ‘eigs’ the following allows you to calculate the largest algebraic Eigenvalues, which are guaranteed to be positive but not guaranteed to be the largest in magnitude. This is in contrast to the default behavior of calculating the largest magnitude Eigenvalues that are not guaranteed to be non-negative.
- svds_iterations
If using ‘svds’ the following allows you to adjust the number of iterations to use in the decomposition. If you want to use the default number of iterations leave this set to ‘default’.
- svds_tolerance
If using ‘svds’ the following allows you to adjust the convergence tolerance of the decomposition calculation. If you want to use the default tolerance leave this set to ‘default’.
Eigendecomposition File Options
- eig_vec_fn_prefix
Relative path and name of the Eigenvectors (or Left Singular Vectors).
- eig_val_fn_prefix
Relative path and name of the Eigenvalues (or Singular Values).
X-Matrix File Options
- xmatrix_fn_prefix
Relative path and name of the X-Matrix.
Eigenvolumes File Options
- eig_vol_fn_prefix
Relative path and name of the Eigenvolumes.
Eigencoefficient Options
- apply_weight
If the following is set to 1, the Eigenvolume (or conjugate-space Eigenvector) will have the particles missing-wedge weight applied to it before the Correlation is calculated.
- eig_coeff_prealign
If you want to pre-align all of the particles to speed up the Eigencoefficient calculation, set the following to 1, otherwise the particles will be aligned during the computation.
Eigencoefficient File Options
- eig_coeff_all_motl_fn_prefix
Relative path and name of the concatenated motivelist to project onto the Eigenvolumes (conjugate-space Eigenvectors). This can be a larger motivelist than the one used to calculate the CC-Matrix and Eigenvolumes.
- eig_coeff_fn_prefix
Relative path and name of the Eigencoefficients.
Clustering Options
- cluster_type
The following determines which algorithm will be used to cluster the determined Eigencoefficients. The valid options are K-means clustering, ‘kmeans’, Hierarchical Ascendent Clustering using a Ward Criterion, ‘hac’, and a Gaussian Mixture Model, ‘gaussmix’.
- eig_idxs
Determines which Eigencoefficients are used to cluster. The format should be a semicolon-separated list that also supports ranges with a dash (-), for example 1-5;7;15-19 would select the first five Eigencoefficients, the seventh and the fifteenth through the nineteenth for classification. If it is left as “all” all coefficients will be used.
- num_classes
How many classes should the particles be clustered into.
Clustering File Options
- cluster_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of the output classified particles.
Averaging File Options
- ref_fn_prefix
Relative path and name prefix of the reference volumes (e.g. ref_iter.em, the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- weight_sum_fn_prefix
Relative path and name prefix of the partial weight files.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="XXXINSTALLATION_DIRXXX/bin"
cluster_exec="${exec_dir}/classification/general/subtom_cluster"
eigs_exec="${exec_dir}/classification/pca/subtom_eigs"
pre_ccmatrix_exec="${exec_dir}/classification/pca/subtom_prepare_ccmatrix"
par_ccmatrix_exec="${exec_dir}/classification/pca/subtom_parallel_ccmatrix"
ccmatrix_exec="${exec_dir}/classification/pca/subtom_join_ccmatrix"
par_eigcoeff_exec="${exec_dir}/classification/pca/subtom_parallel_eigencoeffs_pca"
eigcoeff_exec="${exec_dir}/classification/pca/subtom_join_eigencoeffs_pca"
par_eigvol_exec="${exec_dir}/classification/pca/subtom_parallel_eigenvolumes"
eigvol_exec="${exec_dir}/classification/pca/subtom_join_eigenvolumes"
preali_exec="${exec_dir}/classification/general/subtom_parallel_prealign"
xmatrix_exec="${exec_dir}/classification/pca/subtom_parallel_xmatrix_pca"
svds_exec="${exec_dir}/classification/pca/subtom_svds"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
motl_dump_exec="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
array_max="1000"
max_jobs="4000"
run_local="0"
skip_local_copy="1"
iteration="1"
num_ccmatrix_prealign_batch="1"
num_ccmatrix_batch="1"
num_xmatrix_batch="1"
num_eig_coeff_prealign_batch="1"
num_eig_coeff_batch="1"
num_avg_batch="1"
high_pass_fp="0"
high_pass_sigma="2"
low_pass_fp="0"
low_pass_sigma="3"
nfold="1"
tomo_row="7"
ccmatrix_prealign=0
ccmatrix_all_motl_fn_prefix="combinedmotl/allmotl"
ptcl_fn_prefix="subtomograms/subtomo"
mask_fn="none"
weight_fn_prefix="otherinputs/ampspec"
ccmatrix_fn_prefix="class/ccmatrix_pca"
decomp_type='svds'
num_eigs='40'
eigs_iterations='default'
eigs_tolerance='default'
do_algebraic=0
svds_iterations='default'
svds_tolerance='default'
eig_vec_fn_prefix="class/eigvec_pca"
eig_val_fn_prefix="class/eigval_pca"
xmatrix_fn_prefix="class/xmatrix_pca"
eig_vol_fn_prefix="class/eigvol_pca"
apply_weight="0"
eig_coeff_prealign="0"
eig_coeff_all_motl_fn_prefix="combinedmotl/allmotl"
eig_coeff_fn_prefix="class/eigcoeff_pca"
cluster_type="kmeans"
eig_idxs="all"
num_classes=2
cluster_all_motl_fn_prefix="class/allmotl_pca"
ref_fn_prefix="class/ref_pca"
weight_sum_fn_prefix="class/wei_pca"
subtom_eigs
Uses MATLAB eigs to calculate a subset of Eigenvalue/vectors.
subtom_eigs(
'ccmatrix_fn_prefix', ccmatrix_fn_prefix ('class/ccmatrix_pca'),
'eig_vec_fn_prefix', eig_vec_fn_prefix ('class/eigvec_pca'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_pca'),
'iteration', iteration (1),
'num_eigs', num_eigs (40),
'eigs_iterations', eigs_iterations ('default'),
'eigs_tolerance', eig_tolerance ('default'),
'do_algebraic', do_algebraic (0))
Uses the MATLAB function eigs to calculate a subset of eigenvalues and
eigenvectors given the constrained cross-correlation (covariance) matrix with
the filename given by ccmatrix_fn_prefix
and iteration
. num_eigs
of
the largest eigenvalues and eigenvectors will be calculated, and will be written
out as specified by eig_val_fn_prefix
, eig_vec_fn_prefix
and
iteration
respectively. Two options eigs_iterations
and
eigs_tolerance
are also available to tune how eigs is run. If the string
‘default’ is given for either the default values in eigs will be used. If
do_algebraic
evaluates to true as a boolean ‘la’ will be used in place of
‘lm’ in the call to eigs, this could be a valid option in the case when ‘lm’
returns negative eigenvalues.
Example
subtom_eigs(...
'ccmatrix_fn_prefix', 'class/ccmatrix', ...
'eig_vec_fn_prefix', 'class/eigvec', ...
'eig_val_fn_prefix', 'class/eigval', ...
'iteration', 1, ...
'num_eigs', 50, ...
'eigs_iterations', 'default', ...
'eigs_tolerance', 'default', ...
'do_algebraic', 1)
See Also
subtom_join_ccmatrix
Combines chunks of the Cross-Correlation matrix into the final full matrix.
subtom_join_ccmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ccmatrix_fn_prefix', ccmatrix_fn_prefix ('class/ccmatrix_pca'),
'iteration', iteration (1),
'num_ccmatrix_batch', num_ccmatrix_batch (1))
Looks for partial chunks of the ccmatrix with the file name given by
ccmatrix_fn_prefix
, iteration
and # where # is from 1 to
num_ccmatrix_batch
, and then combines these chunks into the final ccmatrix
and writes it out to the file specified by ccmatrix_fn_prefix
and
iteration
Example
subtom_join_ccmatrix(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ccmatrix_fn_prefix', 'class/ccmatrix', ...
'iteration', 1, ...
'num_ccmatrix_batch', 1)
See Also
subtom_join_eigencoeffs_pca
Randomizes a given number of classes in a motive list.
subtom_join_eigencoeffs_pca(
'eig_coeff_fn_prefix', eig_coeff_fn_prefix ('class/eigcoeff_pca'),
'iteration', iteration (1),
'num_coeff_batch', num_coeff_batch (1))
Looks for partial chunks of the low-rank approximation coefficients of projected
particles with the file name given by eig_coeff_fn_prefix
, iteration
and
# where # is from 1 to num_coeff_batch
, and combines them into a final
matrix of coefficients written out as specified by eig_coeff_fn_prefix
and
iteration
.
Example
subtom_join_eigencoeffs_pca(...
'eig_coeff_fn_prefix', 'class/eigcoeff', ...
'iteration', 1, ...
'num_coeff_batch', 100)
See Also
subtom_join_eigenvolumes
Computes the final sum of projections of the data onto Eigenvectors.
subtom_join_eigenvolumes(
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_pca'),
'iteration', iteration (1),
'num_eigs', num_eigs (40),
'num_xmatrix_batch', num_xmatrix_batch (1))
Calculates the sum of previously calculated Eigenvolume partial sums, (projections onto previously determined Eigen (or left singular) vectors), which can then be used to determine which vectors can best influence classification. The Eigenvolumes are expected to have been split into NUM_XMATRIX_BATCH sums. The output averages will be written out as given by EIG_VOL_FN_PREFIX, ITERATION and #, where the # is the particular Eigenvolume being written out from 1 to NUM_EIGS. For easier viewing a montage of the Eigenvolumes is made along the X, Y, and Z axes, written out as specified by EIG_VOL_FN_PREFIX, (X,Y,Z) and ITERATION.
Example
subtom_join_eigenvolumes(...
'eig_vol_fn_prefix', 'class/eigvol', ...
'iteration', 1, ...
'num_eigs', 40, ...
'num_xmatrix_batch', 100)
See Also
subtom_parallel_ccmatrix
Calculates pairwise Constrained Cross-Correlation scores of aligned particles.
subtom_parallel_ccmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ccmatrix_fn_prefix', ccmatrix_fn_prefix ('class/ccmatrix_pca'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'prealigned', prealigned (0),
'num_ccmatrix_batch', num_ccmatrix_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts in the file given by
all_motl_fn_prefix
and iteration
. If prealigned
evaluates to true as
boolean, then the particles in ptcl_fn_prefix
are assumed to be prealigned,
which should increase the speed of the processing. The subset of particles
compared is specified by the file given by ccmatrix_fn_prefix
,
iteration
, and process_idx
appended with ‘_pairs’, and the output list
of cross-correlation coefficients will be written out to the file specified by
ccmatrix_fn_prefix
, iteration
, and process_idx
. Fourier weight
volumes with name prefix weight_fn_prefix
will also be aligned so that the
cross-correlation cofficient can be constrained to only overlapping shared
regions of Fourier space. tomo_row
describes which row of the MOTL file is
used to determine the correct tomogram Fourier weight file. The correlation is
also constrained by a bandpass filter specified by high_pass_fp
,
high_pass_sigma
, low_pass_fp
and low_pass_sigma
.
Example
subtom_parallel_ccmatrix(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ccmatrix_fn_prefix', 'class/ccmatrix', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'ptcl_fn_prefix', 'subtomograms/alisubtomo', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'iteration', 1, ...
'tomo_row', 7, ...
'prealigned', 1, ...
'num_ccmatrix_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_eigencoeffs_pca
Computes particle Eigencoefficients
subtom_parallel_eigencoeffs_pca(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'eig_coeff_fn_prefix', eig_coeff_fn_prefix ('class/eigcoeff_pca'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_pca'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_pca'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'apply_weight', apply_weight (0),
'tomo_row', tomo_row (7),
'iteration', iteration (1),
'prealigned', prealigned (0),
'num_coeff_batch', num_coeff_batch (1),
'process_idx', process_idx (1))
Takes a batch subset of particles described by all_motl_fn_prefix
with
filenames given by ptcl_fn_prefix
, band-pass filters them as described by
high_pass_fp
, high_pass_sigma
, low_pass_fp
, and low_pass_sigma
,
optionally applies C-symmetry specified by nfold
, and projects them onto by
default the Eigenvolumes specified by eig_vol_fn_prefix
. This determines a
set of coefficients describing a low-rank approximation of the data. A subset of
this coefficient matrix is written out based on eig_coeff_fn_prefix
and
process_idx
, with there being num_coeff_batch
batches in total.
If apply_weight
is set to 1 the Eigenvolumes will be reweighted using the
correct weight of each particle as described by weight_fn_prefix
and
tomo_row
, then each particle will be read and projected in a loop. If
prealigned
is set to 1, then it is understood that the particles have been
prealigned beforehand and the alignment of the particles can be skipped to save
time. mask_fn
describes the mask used throughout classification and ‘none’
describes a default spherical mask.
Example
subtom_parallel_eigencoeffs_pca(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ptcl_fn_prefix', 'subtomograms/subtomo_ali', ...
'eig_coeff_fn_prefix', 'class/eigcoeff', ...
'eig_val_fn_prefix', 'class/eigval', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'apply_weight', 1, ...
'tomo_row', 7, ...
'iteration', 1, ...
'prealigned', 1, ...
'num_coeff_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_eigenvolumes
Computes projections of data onto Eigenvectors.
subtom_parallel_eigenvolumes(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'eig_vec_fn_prefix', eig_vec_fn_prefix ('class/eigvec_pca'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_pca'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_pca'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_pca'),
'mask_fn', mask_fn ('none'),
'iteration', iteration (1),
'num_xmatrix_batch', num_xmatrix_batch (1),
'process_idx', process_idx (1))
Calculates the summed projections of particles onto previously determined Eigen
(or left singular) vectors, by means of an also previously calculated X-matrix
to produce Eigenvolumes which can then be used to determine which vectors can
best influence classification. The Eigenvectors are named based on
eig_vec_fn_prefix
and iteration
and the X-Matrix is named based on
xmatrix_fn_prefix
, iteration
, and process_idx
. The Eigenvolumes are
also masked by the file specified by mask_fn
. The Eigenvolumes are split
into num_xmatrix_batch
sums, which is the same number of batches that the
X-Matrix was broken into in its computation. process_idx
is a counter that
designates the current batch being determined. The output sum Eigenvolume will
be written out as specified by eig_vol_fn_prefix
, iteration
,
process_idx
and # where the # is the particular Eigenvolume being written
out.
Example
subtom_parallel_eigenvolumes(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'eig_vec_fn_prefix', 'class/eigvec', ...
'eig_val_fn_prefix', 'class/eigval', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'iteration', 1, ...
'num_xmatrix_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_xmatrix_pca
Calculates chunks of the X-matrix for processing.
subtom_parallel_xmatrix_pca(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'xmatrix_fn_prefix', xmatrix_fn_prefix ('class/xmatrix_pca'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'iteration', iteration (1),
'prealigned', prealigned (0),
'num_xmatrix_batch', num_xmatrix_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts given by
all_motl_fn_prefix
and iteration
, band-pass filters the particle as
described by high_pass_fp
, high_pass_sigma
, low_pass_fp
, and
low_pass_sigma
, optionally applies nfold
C-symmetry, and then places
these voxels as a 1-D row vector in a data sub-matrix which is historically
known as the X-matrix (See Borland, Van Heel 1990 J. Opt. Soc. Am. A). This
X-matrix can then be used to speed up the calculation of Eigenvolumes and
Eigencoefficients used for classification. The subset of particles compared is
specified by the number of particles in the motive list and the number of
requested batches specified by num_xmatrix_batch
, with the specific subset
deteremined by process_idx
. The X-matrix chunk will be written out as
specified by xmatrix_fn_prefix
, iteration
and process_idx
. The
location of the particles is specified by ptcl_fn_prefix
. If prealigned
evaluates to true as a boolean then the particles are assumed to be prealigned,
which should increase speed of computation of CC-Matrix calculations. Particles
in the X-matrix will be masked by the file given by mask_fn
. If the string
‘none’ is used in place of mask_fn
, a default spherical mask is applied.
This mask should be a binary mask and only voxels within the mask are placed
into the X-matrix which can greatly speed up computations.
Example
subtom_parallel_xmatrix_pca(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'xmatrix_fn_prefix', 'class/xmatrix', ...
'ptcl_fn_prefix', 'subtomograms/alisubtomo', ...
'mask_fn', 'combinedmotl/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'iteration', 1, ...
'prealigned', 1, ...
'num_xmatrix_batch', 100, ...
'process_idx', 1)
See Also
subtom_prepare_ccmatrix
Calculates batch of pairwise comparisons of particles.
subtom_prepare_ccmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ccmatrix_fn_prefix', ccmatrix_fn_prefix ('class/ccmatrix_pca'),
'iteration', iteration (1),
'num_ccmatrix_batch', num_ccmatrix_batch (1))
Figures out the pairwise comparisons to make from the motivelist given by
all_motl_fn_prefix
and iteration
, and breaks up these comparisons into
num_ccmatrix_batch
batches for parallel computation. Each batch is written
out as an array with the ‘reference’ particle index and ‘target’ particle index
to an EM-file with the name described by ccmatrix_fn_prefix
, iteration
,
#, and ‘_pairs’ where the # is the batch index.
Example
subtom_prepare_ccmatrix(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ccmatrix_fn_prefix', 'class/ccmatrix', ...
'iteration', 1, ...
'num_ccmatrix_batch', 1000);
See Also
subtom_svds
Uses MATLAB svds to calculate a subset of singular values/vectors.
subtom_svds(
'ccmatrix_fn_prefix', ccmatrix_fn_prefix ('pca/ccmatrix'),
'eig_vec_fn_prefix', eig_vec_fn_prefix ('pca/eigvec'),
'eig_val_fn_prefix', eig_val_fn_prefix ('pca/eigval'),
'iteration', iteration (1),
'num_svs', num_svs (40),
'svds_iterations', svds_iterations ('default'),
'svds_tolerance', svds_tolerance ('default'))
Uses the MATLAB function svds to calculate a subset of singular values and
vectors given the constrained cross-correlation (covariance) matrix with the
filename given by ccmatrix_fn_prefix
and iteration
. num_svs
of the
largest singular values and vectors will be calculated, and will be written out
based on eig_val_fn_prefix
and iteration
; and eig_vec_fn_prefix
and
iteration
respectively. Two options svds_iterations
and
svds_tolerance
are also available to tune how svds is run. If the string
‘default’ is given for either the default values in eigs will be used.
Example
subtom_svds(...
'ccmatrix_fn_prefix', 'pca/ccmatrix', ...
'eig_vec_fn_prefix', 'pca/eigvec', ...
'eig_val_fn_prefix', 'pca/eigval', ...
'iteration', 1, ...
'num_svs', 50, ...
'svds_iterations', 'default', ...
'svds_tolerance', 'default')
See Also
Indices and tables
subTOM: Wedge-Masked Difference Classification
Wedge-Masked Difference Classification
In Wedge-Masked Difference (WMD) classification the full set of particles are simplified into a new lower-dimensional representation by means of Singular Value decomposition after attempting to take into account the effects of the missing-wedge. Particles projected onto these most variable basis-vectors then can be clustered using a variety of methods.
Within subTOM, wedge-masked differences (the result of the subtraction of the
overall reference, weighted with the particles missing-wedge, and the particle
itself) are first compiled into a 2-D Matrix denoted here as the D-Matrix, which
holds the aligned, band-pass filtered and masked difference data. To speed up
calculation particles can be pre-aligned using the function
subtom_parallel_prealign
. Batches of the D-Matrix are calculated in parallel
with subtom_parallel_dmatrix
and then combined and column-centered with
subtom_join_dmatrix
.
Next the D-Matrix is decomposed by Singular Value decomposition as to skip
calculation of the covariance matrix as described in J. Heumann et al.
in J. Struct. Biol. 2011. This determines a set of
right Singular vectors and Singular values and these are used along with
the D-Matrix to determine the Eigenvolumes of the dataset with
subtom_eigenvolumes_wmd
.
These volumes are then used to determine the low-rank approximation coefficients
in volume space for clustering. A larger particle superset can be projected onto
the volumes to speed up classification of large datasets. Coefficients are also
calculated in parallel in batches with subtom_parallel_coeffs
and
joined with subtom_join_coeffs
.
Finally using a user-selected subset of the determined coefficients, the data is
clustered either by Hierarchical Ascendant Clustering using a Ward distance
criterion, K-Means clustering, or a Gaussian Mixture model with the function
subtom_cluster
. This clustering is then used to generate the final class
averages.
subtom_wmd
The main WMD pipeline process script of subTOM.
This subtomogram classification script uses nine MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- local_dir
Absolute path to the folder on a group share, if the scratch directory is cleaned and deleted regularly this can set a local directory to which the important results will be copied. If this is not needed it can be skipped with the option skip_local_copy below.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- cluster_exec
Cluster executable.
- par_coeff_exec
Parallel Coefficient executable.
- coeff_exec
Final Coefficient executable.
- eigvol_exec
Eigenvolume Calculation executable.
- preali_exec
Parallel Subtomogram prealign executable.
- par_dmatrix_exec
Parallel D-Matrix executable.
- dmatrix_exec
Final D-Matrix executable.
- sum_exec
Parallel Summing executable
- avg_exec
Final Averaging executable
- motl_dump_exec
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
- skip_local_copy
Set this option to 1 to skip the copying of data to local_dir.
Parallelization Options
- iteration
The index of the references to generate : input will be all_motl_fn_prefix_iteration.em (define as integer e.g. iteration=1)
- num_dmatrix_prealign_batch
Number of batches to split the parallel particle prealignment for the D-Matrix calculation into. If you are not doing prealignment you can ignore this option.
- num_dmatrix_batch
Number of batches to split the parallel D-Matrix calculation job into.
- num_coeff_prealign_batch
Number of batches to split the parallel particle prealignment for the coefficients calculations into. If you are not doing prealignment you can ignore this option.
- num_coeff_batch
Number of batches to split the parallel coefficient calculation into.
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
Subtomogram Classification Workflow Options
D-Matrix Options
- high_pass_fp
High pass filter cutoff (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- high_pass_sigma
High pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the high-pass filter past the cutoff above.
- low_pass_fp
Low pass filter (in transform units (pixels): calculate as (box_size * pixelsize) / (resolution_real) (define as integer).
- low_pass_sigma
Low pass filter falloff sigma (in transform units (pixels): describes a Gaussian sigma for the falloff of the low-pass filter past the cutoff above.
- nfold
Symmetry to apply to each pair of particle and reference in D-Matrix calculation, if no symmetry nfold=1 (define as integer e.g. nfold=3).
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- dmatrix_prealign
If you want to pre-align all of the particles to speed up the D-Matrix calculation, set the following to 1, otherwise the particles will be aligned during the computation.
D-Matrix File Options
- dmatrix_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of all particles (e.g. allmotl_iter.em , the variable will be written as a string e.g. dmatrix_all_motl_fn_prefix=’sub-directory/allmotl’).
- dmatrix_fn_prefix
Relative path and name of the D-Matrix.
- ptcl_fn_prefix
Relative path and name of the subtomograms (e.g. part_n.em , the variable will be written as a string e.g. ptcl_fn_prefix=’sub-directory/part’).
- dmatrix_ref_fn_prefix
Relative path and name prefix of the reference volume used for calculating the wedge-masked differences (e.g. ref_iter.em, the variable will be written as a string e.g. dmatrix_ref_fn_prefix=’sub-directory/ref’)
- weight_fn_prefix
Relative path and name of the weight file, used for calculating the wedge-masked differences.
- mask_fn
Relative path and name of the classification mask. This should be a binary mask as correlations are done in real-space, and calculations will only be done using voxels passed by the mask, so smaller masks will run faster. If you want to use the default spherical mask set mask_fn to ‘none’.
Eigenvolume Options
- num_svs
The number of right Singular Vectors and Singular Values to calculate.
- svds_iterations
The following allows you to adjust the number of iterations to use in the decomposition. If you want to use the default number of iterations leave this set to ‘default’.
- svds_tolerance
The following allows you to adjust the convergence tolerance of the decomposition calculation. If you want to use the default tolerance leave this set to ‘default’.
Eigenvolumes File Options
- eig_val_fn_prefix
Relative path and name of the Eigenvalues.
- eig_vol_fn_prefix
Relative path and name of the Eigenvolumes.
- variance_fn_prefix
Relative path and name prefix of the calculated variance map.
Coefficient Options
- coeff_prealign
If you want to pre-align all of the particles to speed up the coefficient calculation, set the following to 1, otherwise the particles will be aligned during the computation.
Eigencoefficient File Options
- coeff_all_motl_fn_prefix
Relative path and name of the concatenated motivelist to project onto the Eigenvolumes. This can be a larger motivelist than the one used to calculate the D-Matrix and Eigenvolumes.
- coeff_fn_prefix
Relative path and name of the coefficients.
Clustering Options
- cluster_type
The following determines which algorithm will be used to cluster the determined Eigencoefficients. The valid options are K-means clustering, ‘kmeans’, Hierarchical Ascendent Clustering using a Ward Criterion, ‘hac’, and a Gaussian Mixture Model, ‘gaussmix’.
- coeff_idxs
Determines which coefficients are used to cluster. The format should be a semicolon-separated list that also supports ranges with a dash (-), for example 1-5;7;15-19 would select the first five coefficients, the seventh and the fifteenth through the nineteenth for classification. If it is left as “all” all coefficients will be used.
- num_classes
How many classes should the particles be clustered into.
Clustering File Options
- cluster_all_motl_fn_prefix
Relative path and name of the concatenated motivelist of the output classified particles.
Averaging File Options
- ref_fn_prefix
Relative path and name prefix of the reference volumes (e.g. ref_iter.em, the variable will be written as a string e.g. ref_fn_prefix=’sub-directory/ref’)
- weight_sum_fn_prefix
Relative path and name prefix of the partial weight files.
Example
scratch_dir="${PWD}"
local_dir=""
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="XXXINSTALLATION_DIRXXX/bin"
cluster_exec="${exec_dir}/classification/general/subtom_cluster"
par_eigcoeff_exec="${exec_dir}/classification/wmd/subtom_parallel_coeffs"
eigcoeff_exec="${exec_dir}/classification/wmd/subtom_join_coeffs"
eigvol_exec="${exec_dir}/classification/wmd/subtom_eigenvolumes_wmd"
preali_exec="${exec_dir}/classification/general/subtom_parallel_prealign"
par_xmatrix_exec="${exec_dir}/classification/wmd/subtom_parallel_dmatrix"
xmatrix_exec="${exec_dir}/classification/wmd/subtom_join_dmatrix"
sum_exec="${exec_dir}/classification/general/subtom_parallel_sums_cls"
avg_exec="${exec_dir}/classification/general/subtom_weighted_average_cls"
motl_dump_exec="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
array_max="1000"
max_jobs="4000"
run_local="0"
skip_local_copy="1"
iteration="1"
num_dmatrix_prealign_batch="1"
num_dmatrix_batch="1"
num_coeff_prealign_batch="1"
num_coeff_batch="1"
num_avg_batch="1"
high_pass_fp="1"
high_pass_sigma="2"
low_pass_fp="12"
low_pass_sigma="3"
nfold="1"
tomo_row="7"
dmatrix_prealign=0
dmatrix_all_motl_fn_prefix="combinedmotl/allmotl"
dmatrix_fn_prefix="class/xmatrix_wmd"
ptcl_fn_prefix="subtomograms/subtomo"
dmatrix_ref_fn_prefix="ref/ref"
weight_fn_prefix="otherinputs/ampspec"
mask_fn="none"
num_svs='40'
svds_iterations='default'
svds_tolerance='default'
eig_val_fn_prefix="class/eigval_wmd"
eig_vol_fn_prefix="class/eigvol_wmd"
coeff_prealign="0"
coeff_all_motl_fn_prefix="combinedmotl/allmotl"
coeff_fn_prefix="class/coeff_wmd"
cluster_type="kmeans"
coeff_idxs="all"
num_classes=2
cluster_all_motl_fn_prefix="class/allmotl_wmd"
ref_fn_prefix="class/ref_wmd"
weight_sum_fn_prefix="class/wei_wmd"
subtom_eigenvolumes_wmd
Computes Singular Value Decomposition of D-Matrix and projects data on right Singular Vectors.
subtom_eigenvolumes_wmd(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'dmatrix_fn_prefix', dmatrix_fn_prefix ('class/dmatrix_wmd'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_wmd'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_wmd'),
'variance_fn_prefix', variance_fn_prefix ('class/variance_wmd'),
'mask_fn', mask_fn ('none'),
'iteration', iteration (1),
'num_svs', num_svs (40),
'svds_iterations', svds_iterations ('default'),
'svds_tolerance', svds_tolerance ('default'))
Calculates num_svs
weighted projections of wedge-masked differences onto the
same number of determined Right-Singular Vectors, by means of the Singular Value
Decomposition of a previously calculated D-matrix, named as given by
dmatrix_fn_prefix
and iteration
to produce Eigenvolumes which can then
be used to determine which vectors can best influence classification. The
Eigenvolumes are also masked by the file specified by mask_fn
. The output
weighted Eigenvolume will be written out as specified by eig_vol_fn_prefix
,
iteration
and #, where the # is the particular Eigenvolume being written
out. The calculated Eigenvalues which correspond to the square of the singular
vectors are also written oun as given by eig_val_fn_prefix
and
iteration
, and the variance map of the data is written out as determined by
variance_fn_prefix
and iteration
. Two options svds_iterations
and
svds_tolerance
are also available to tune how svds is run. If the string
‘default’ is given for either the default values in svds will be used.
Example
subtom_eigenvolumes_wmd(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'dmatrix_fn_prefix', 'class/dmatrix', ...
'eig_val_fn_prefix', 'class/eigval', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'variance_fn_prefix', 'class/variance', ...
'mask_fn', 'class/class_mask.em', ...
'iteration', 1, ...
'num_svs', 40, ...
'svds_iterations', 'default', ...
'svds_tolerance', 'default')
See Also
subtom_join_coeffs
Combines coefficient batches into the final matrix.
subtom_join_coeffs(
'coeff_fn_prefix', coeff_fn_prefix ('class/coeff_wmd'),
'iteration', iteration (1),
'num_coeff_batch', num_coeff_batch (1))
Looks for partial chunks of the low-rank approximation coefficients of projected
particles with the file name given by coeff_fn_prefix
, iteration
and
# where # is from 1 to num_coeff_batch
, and combines them into a final
matrix of coefficients written out as described by coeff_fn_prefix
, and
iteration
.
Example
subtom_join_coeffs(...
'coeff_fn_prefix', 'class/coeff', ...
'iteration', 1, ...
'num_coeff_batch', 100)
See Also
subtom_join_dmatrix
Combines chunks of D-Matrix into the final matrix.
subtom_join_dmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'dmatrix_fn_prefix', dmatrix_fn_prefix ('class/dmatrix_wmd'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'mask_fn', mask_fn ('none'),
'iteration', iteration (1),
'num_dmatrix_batch', num_dmatrix_batch (1))
Looks for partial chunks of the D-matrix with the file name given by
dmatrix_fn_prefix
, iteration
, and # where # is from 1 to
num_dmatrix_batch
, and combines them into a final matrix of differences
written out as described by dmatrix_fn_prefix
and iteration
.
Example
subtom_join_dmatrix(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'dmatrix_fn_prefix', 'class/dmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'iteration', 1, ...
'num_dmatrix_batch', 100);
See Also
subtom_parallel_coeffs
Computes wedge-masked difference coefficients.
subtom_parallel_coeffs(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'dmatrix_fn_prefix', dmatrix_fn_prefix ('class/dmatrix_wmd'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'coeff_fn_prefix', coeff_fn_prefix ('class/coeff_wmd'),
'eig_val_fn_prefix', eig_val_fn_prefix ('class/eigval_wmd'),
'eig_vol_fn_prefix', eig_vol_fn_prefix ('class/eigvol_wmd'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'prealigned', prealigned (0),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'num_coeff_batch', num_coeff_batch (1),
'process_idx', process_idx (1))
Takes a batch subset of particles described by all_motl_fn_prefix
with
filenames given by ptcl_fn_prefix
, band-pass filters them as described by
high_pass_fp
, high_pass_sigma
, low_pass_fp
, and low_pass_sigma
,
optionally applies nfold
C-symmetry, and projects them onto the Eigenvolumes
specified by eig_vol_fn_prefix
. This determines a set of coefficients
describing a low-rank approximation of the data. A subset of this coefficient
matrix is written out based on coeff_fn_prefix
and process_idx
, with
there being num_coeff_batch
batches in total.
If apply_weight
is set to 1 the Eigenvolumes will be reweighted using the
correct weight of each particle as described by weight_fn_prefix
and
tomo_row
, then each particle will be read and projected in a loop. If
prealigned
is set to 1, then it is understood that the particles have been
prealigned beforehand and the alignment of the particles can be skipped to save
time. mask_fn
describes the mask used throughout classification and ‘none’
describes a default spherical mask.
Example
subtom_parallel_coeffs(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'dmatrix_fn_prefix', 'class/dmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo_ali', ...
'ref_fn_prefix', 'ref/ref', ...
'coeff_fn_prefix', 'class/coeff', ...
'eig_val_fn_prefix', 'class/eigval', ...
'eig_vol_fn_prefix', 'class/eigvol', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'mask_fn', 'otherinputs/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'prealigned', 1, ...
'iteration', 1, ...
'tomo_row', 7, ...
'num_coeff_batch', 100, ...
'process_idx', 1)
See Also
subtom_parallel_dmatrix
Calculates chunks of the D-matrix for processing.
subtom_parallel_dmatrix(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'dmatrix_fn_prefix', dmatrix_fn_prefix ('class/dmatrix_wmd'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'mask_fn', mask_fn ('none'),
'high_pass_fp', high_pass_fp (0),
'high_pass_sigma', high_pass_sigma (0),
'low_pass_fp', low_pass_fp (0),
'low_pass_sigma', low_pass_sigma (0),
'nfold', nfold (1),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'prealigned', prealigned (0),
'num_dmatrix_batch', num_dmatrix_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts in the file given by
all_motl_fn_prefix
and iteration
and then subtracts the particle from
the reference specified by ref_fn_prefix
and iteration
and places these
voxels of the difference as a 1-D row vector in a data sub-matrix which is
denoted as the D-matrix (See Heumann, et al. 2011 J. Struct. Biol.). The
particle and reference are also filtered by a bandpass filter specified by
high_pass_fp
, high_pass_sigma
, low_pass_fp
and low_pass_sigma
,
and optionally symmetrized with nfold
C-symmetry, before subtracted.The
reference is masked in Fourier space using the weight specified by
weight_fn_prefix
and tomo_row
. The subset of particles compared is
specified by the number of particles in the motive list and the number of
requested batches specified by num_dmatrix_batch
, with the specific subset
deteremined by process_idx
. The D-matrix chunk will be written out as given
by dmatrix_fn_prefix
, iteration
, and process_idx
. The location of
the particles is specified by ptcl_fn_prefix
. If prealigned
evaluates to
true as a boolean then the particles are assumed to be prealigned, which should
increase speed of computation of D-Matrix calculations. Particles in the
D-matrix will be masked by the file given by mask_fn
. If the string ‘none’
is used in place of mask_fn
, a default spherical mask is applied. This mask
should be a binary mask and only voxels within the mask are placed into the
D-matrix which can greatly speed up computations.
Example
subtom_parallel_dmatrix(
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'dmatrix_fn_prefix', 'class/dmatrix', ...
'ptcl_fn_prefix', 'subtomograms/subtomo_ali', ...
'ref_fn_prefix', 'ref/ref', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'mask_fn', 'combinedmotl/classification_mask.em', ...
'high_pass_fp', 1, ...
'high_pass_sigma', 2, ...
'low_pass_fp', 15, ...
'low_pass_sigma', 3, ...
'nfold', 1, ...
'iteration', 1, ...
'prealigned', 1, ...
'num_dmatrix_batch', 100, ...
'process_idx', 1)
See Also
Indices and tables
subTOM
SubTOM - Subvolume processing scripts with the TOM toolbox is a collection of scripts form a pipeline for subvolume alignment and averaging of electron cryo-tomography data.
B-Factor by Subsets
To estimate the B-factor in maps of low to intermediate resolution, Guinier plot analysis is unsuitable because the structure factor dominates the appearance and slope of the amplitude decay at resolutions up to 10 Angstroms.
Therefore another way to estimate the B-factor is to look at how the Resolution decays over smaller and smaller subsets of the particles that form each half-map. A linear function is fit to the reciprocal square of resolutions determined by Gold-standard FSCs against the natural log of asymmetric units.
The method is described in detail in Rosenthal, Henderson 2003, and here the
averaging and analysis functions subtom_maskcorrected_fsc_bfactor
,
subtom_parallel_sums_bfactor
, and subtom_weighted_average_bfactor
have
been slightly modified from their non-bfactor counterparts to first generate the
average of successively smaller subsets, with each subset being roughly half of
the subset before it until the subset would be less than 128 particles. Then the
resolution of each subset is determined and the B-factor is estimated and this
estimate B-factor can then be used to sharpen the final post-processed map.
subtom_b_factor_by_subsets
Estimates the B-Factor by determine the resolution of subsets of particles as described in Rosenthal, Henderson 2003.
This subtomogram averaging analysis script uses three MATLAB compiled scripts below:
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- sum_exec
Parallel Summing executable
- avg_exec
Weighted Averaging executable
- fsc_exec
FSC executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- array_max
The maximum number of jobs per cluster submission script. Cluster submission scripts work using the array feature common to queuing systems, and this value is the maximum array size used in a script. If the user requests more batches of processing than this value, then the submission scripts will be split into files of up to array_max jobs.
- max_jobs
The maximum number of jobs for alignment. If the number of batches / exceeds this value the script will immediately quit.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
Subtomogram Averaging Workflow Options
Parallelization Options
- iteration
The index of the reference to generate : input will be all_motl_fn_prefix_iteration.em (define as integer)
- num_avg_batch
The number of batches to split the parallel subtomogram averaging job into.
File Options
- all_motl_a_fn_prefix
Relative path and name prefix of the concatenated motivelist of all particles in the first half-map.
- all_motl_b_fn_prefix
Relative path and name prefix of the concatenated motivelist of all particles in the second half-map.
- ref_a_fn_prefix
Relative path and name prefix of the reference volumes of the first half-map.
- ref_b_fn_prefix
Relative path and name prefix of the reference volumes of the second half-map.
- ptcl_a_fn_prefix
Relative path and name prefix of the subtomograms that comprise the first half-map.
- ptcl_b_fn_prefix
Relative path and name prefix of the subtomograms that comprise the second half-map.
- weight_a_fn_prefix
Relative path and name prefix of the weight files for the first half-map.
- weight_b_fn_prefix
Relative path and name prefix of the weight files for the second half-map.
- weight_sum_a_fn_prefix
Relative path and name prefix of the partial weight files of the first half-map.
- weight_sum_b_fn_prefix
Relative path and name prefix of the partial weight files of the second half-map.
- output_fn_prefix
Relative path and prefix for the name of the output maps and figures.
Averaging Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
- iclass
Particles with that number in position 20 of motivelist will be added to new average (define as integer e.g. iclass=1). NOTES: Class 1 is ALWAYS added. Negative classes and class 2 are never added.
Mask Corrected FSC Workflow Options
File Options
- fsc_mask_fn
Relative or absolute path and name of the FSC mask.
- filter_a_fn
Relative or absolute path and name of the Fourier filter volume for the first half-map. If not using the option do_reweight just leave this set to “”
- filter_b_fn
Relative or absolute path and name of the Fourier filter volume for the second half-map. If not using the option do_reweight just leave this set to “”
FSC Options
- pixelsize
Pixelsize of the half-maps in Angstroms
- nfold
Symmetry to applied the half-maps before calculating FSC (1 is no symmetry)
- rand_threshold
The Fourier pixel at which phase-randomization begins is set automatically to the point where the unmasked FSC falls below this threshold.
- plot_fsc
Plot the FSC curves - 1 = yes, 0 = no
Reweighting Options
- do_reweight
Set to 1 to apply the externally calculated Fourier weights filter_A_fn and filter_B_fn to each half-map to reweight the final output map.
Example
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
sum_exec="${exec_dir}/alignment/subtom_parallel_sums_bfactor"
avg_exec="${exec_dir}/alignment/subtom_weighted_average_bfactor"
fsc_exec="${exec_dir}/analysis/b_factor_by_subsets/subtom_maskcorrected_fsc_bfactor"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
array_max="1000"
max_jobs="4000"
run_local="0"
iteration="1"
num_avg_batch="1"
all_motl_a_fn_prefix="even/combinedmotl/allmotl"
all_motl_b_fn_prefix="odd/combinedmotl/allmotl"
ref_a_fn_prefix="FSC/ref_a"
ref_b_fn_prefix="FSC/ref_b"
ptcl_a_fn_prefix="subtomograms/subtomo"
ptcl_b_fn_prefix="subtomograms/subtomo"
weight_a_fn_prefix="otherinputs/ampspec"
weight_b_fn_prefix="otherinputs/ampspec"
weight_sum_a_fn_prefix="FSC/wei_a"
weight_sum_b_fn_prefix="FSC/wei_b"
output_fn_prefix="FSC/ref_auto_b"
tomo_row="7"
iclass="0"
fsc_mask_fn="FSC/fsc_mask.em"
filter_a_fn=""
filter_b_fn=""
pixelsize=1
nfold=1
rand_threshold=0.8
plot_fsc=1
do_sharpen=1
box_gaussian=1
filter_mode=1
filter_threshold=0.143
plot_sharpen=1
do_reweight=0
subtom_maskcorrected_FSC_bfactor
Calculates “mask-corrected” FSC and sharpened refs.
subtom_maskcorrected_fsc_bfactor(
'ref_a_fn_prefix', ref_a_fn_prefix ('even/ref/ref'),
'ref_b_fn_prefix', ref_b_fn_prefix ('odd/ref/ref'),
'motl_a_fn_prefix', motl_a_fn_prefix ('even/combinedmotl/allmotl'),
'motl_b_fn_prefix', motl_b_fn_prefix ('odd/combinedmotl/allmotl'),
'fsc_mask_fn', fsc_mask_fn ('FSC/fsc_mask.em'),
'output_fn_prefix', output_fn_prefix ('FSC/ref'),
'filter_a_fn', filter_a_fn (''),
'filter_b_fn', filter_b_fn (''),
'do_reweight', do_reweight (0),
'do_sharpen', do_sharpen (0),
'plot_fsc', plot_fsc (0),
'plot_sharpen', plot_sharpen (0),
'filter_mode', filter_mode (1),
'pixelsize', pixelsize (1.0),
'nfold', nfold (1),
'filter_threshold', filter_threshold (0.143),
'rand_threshold', rand_threshold (0.8),
'box_gaussian', box_gaussian (1),
'iclass', iclass (0),
'iteration', iteration (1))
Takes in two references ref_a_fn_prefix
_#.em and ref_b_fn_prefix
_#.em
where # corresponds to iteration
and a FSC mask fsc_mask_fn
and
calculates a “mask-corrected” FSC. This works by randomizing the structure
factor phases beyond the point where the unmasked FSC curve falls below a given
threshold (by default 0.8) and calculating an additional FSC between these phase
randomized maps. This allows for the determination of the extra correlation
caused by effects of the mask, which is then subtracted from the normal masked
FSC curves. The curve will be saved as a Matlab figure and a PDF file, and if
plot_fsc
is true it will also be displayed.
The script can also output maps with the prefix output_fn_prefix
that have
been sharpened with b_factor
if do_sharpen
is turned on. This setting
has two threshold settings selected using filter_mode
, FSC (1) and pixel
(2). FSC allows you to use a FSC-value filter_threshold
as a cutoff for the
lowpass filter, while using pixels allows you to use an arbitrary resolution
cutoff in filter_threshold
. The sharpening curve will be saved as a Matlab
figure and a pdf file, and if plot_sharpen
is true it will also be
displayed.
This function estimates the B-factor to apply versus applying an ad-hoc B-factor by fitting a curve to the drop in resolution as the number of particles decreases. This is detailed in Rosenthal and Henderson, 2003, doi:10.1016/j.jmb.2003.07.013
Finally this script can also perform and output reweighted maps if
do_reweight
is true, and the pre-calculated Fourier weight volumes
filter_a_fn
and filter_b_fn
.
Example
subtom_maskcorrected_fsc_bfactor(...
'ref_a_fn_prefix', 'even/ref/ref', ...
'ref_b_fn_prefix', 'odd/ref/ref', ...
'motl_a_fn_prefix', 'even/combinedmotl', ...
'motl_b_fn_prefix', 'odd/combinedmotl', ...
'fsc_mask_fn', 'FSC/fsc_mask.em', ...
'output_fn_prefix', 'FSC/ref', ...
'filter_a_fn', '', ...
'filter_b_fn', '', ...
'do_reweight', 0, ...
'do_sharpen', 1, ...
'plot_fsc', 1, ...
'plot_sharpen', 1, ...
'filter_mode', 1, ...
'pixelsize', 1.35, ...
'nfold', 6, ...
'filter_threshold', 0.143, ...
'rand_threshold', 0.8, ...
'box_gaussian', 3, ...
'iclass', 0, ...
'iteration', 1)
See Also
subtom_parallel_sums_bfactor
Subsets version of parallel sums for finding B-factor.
subtom_parallel_sums_bfactor(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'ptcl_fn_prefix', ptcl_fn_prefix ('subtomograms/subtomo'),
'weight_fn_prefix', weight_fn_prefix ('otherinputs/ampspec'),
'weight_sum_fn_prefix, weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'tomo_row', tomo_row (7),
'iclass', iclass (0),
'num_avg_batch', num_avg_batch (1),
'process_idx', process_idx (1))
Aligns a subset of particles using the rotations and shifts in
all_motl_fn_prefix
_#.em where # corresponds to iteration
in
num_avg_batch
chunks to make a raw particle sum ref_fn_prefix
_#_###.em
where # corresponds to iteration
and ### corresponds to process_idx
.
Fourier weight volumes with name prefix weight_fn_prefix
will also be
aligned and summed to make a weight sum weight_sum_fn_prefix
_#_###.em.
tomo_row
describes which row of the motl file is used to determine the
correct tomogram fourier weight file. iclass
describes which class outside
of one is included in the averaging.
The difference between this function and the other version of subtom_parallel_sums is that this function creates a number of subsets of the particle and weight sum subsets, so that smaller and smaller populations of data are summed, and these subsets can then be used to estimate the B-Factor of the structure.
Example
subtom_parallel_sums_bfactor(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', 'ref/ref', ...
'ptcl_fn_prefix', 'subtomograms/subtomo', ...
'weight_fn_prefix', 'otherinputs/ampspec', ...
'weight_sum_fn_prefix, 'otherinputs/wei', ...
'iteration', 1, ...
'tomo_row', 7, ...
'iclass', 0, ...
'num_avg_batch', 1, ...
'process_idx', 1)
See Also
subtom_weighted_average_bfactor
Joins and weights subsets of average subsets.
subtom_weighted_average_bfactor(
'all_motl_fn_prefix', all_motl_fn_prefix ('combinedmotl/allmotl'),
'ref_fn_prefix', ref_fn_prefix ('ref/ref'),
'weight_sum_fn_prefix', weight_sum_fn_prefix ('otherinputs/wei'),
'iteration', iteration (1),
'iclass', iclass (0),
'num_avg_batch', num_avg_batch (1))
Takes the num_avg_batch
parallel sum subsets with the name prefix
ref_fn_prefix
, the all_motl file with name prefix motl_fn_prefix
and
weight volume subsets with the name prefix weight_sum_fn_prefix
to generate
the final average, which should then be used as the reference for iteration
number iteration
. iclass
describes which class outside of one is
included in the final average and is used to correctly scale the average and
weights.
The difference between this function and the other version of subtom_weighted_average is that this function expects there to be a number of subsets of the average subsets, so that smaller and smaller populations of data are averaged, and these subsets can then be used to estimate the B-Factor of the structure.
Example
subtom_weighted_average_bfactor(...
'all_motl_fn_prefix', 'combinedmotl/allmotl', ...
'ref_fn_prefix', './ref/ref', ...
'weight_sum_fn_prefix', 'otherinputs/wei', ...
'iteration', 1, ...
'iclass', 0, ...
'num_avg_batch', 1)