subtom_extract_subtomograms
This script takes an input number of cores, and on each core extract one tomogram at a time as written in a specified row of the all motive list. Parallelization works by writing a start file upon openinig of a tomo, and a completion file. After tomogram extraction, it moves on to the next tomogram that hasn’t been started.
This tomogram extraction script uses one MATLAB compiled scripts below:
Options
Directories
- tomogram_dir
Absolute path to the folder where the tomograms are stored
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- mcr_cache_dir
Absolute path to MCR directory for the processing.
- exec_dir
Directory for executables
Variables
- extract_exe
Subtomogram extraction executable
- motl_dump_exe
MOTL dump executable
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
Subtomogram Extraction Workflow Options
File Options
- iteration
The iteration of the all particle motive list to extract from : input will be all_motl_fn_prefix_iteration.em (define as integer)
- all_motl_fn_prefix
Relative path to allmotl file from root folder.
- subtomo_fn_prefix
Relative path and filename for output subtomograms.
- stats_fn_prefix
Relative path and filename for stats .csv files.
The CSV format of the subtomogram stats is a single file for each tomogram with one line per particle in the tomogram with six columns. The particle columns are as follows:
Column |
Value |
---|---|
1 |
Particle Index (Motive List row 4) |
2 |
Mean value for the subtomogram |
3 |
Maximum value in the subtomogram |
4 |
Minimum value in the subtomogram |
5 |
Standard deviation of values in the subtomogram |
6 |
Variance of values in the subtomogram |
Tomogram Options
- tomo_row
Which row in the motl file contains the correct tomogram number. Usually row 5 and 7 both correspond to the correct value and can be used interchangeably, but there are instances when 5 contains a sequential ordered value starting from 1, while 7 contains the correct corresponding tomogram.
Extraction Options
- box_size
Size of subtomogram in pixels
- subtomo_digits
Leading zeros for subtomograms, for AV3, use 1. Other numbers are useful for DYNAMO.
- reextract
Set reextract to 1 if you want to force the program to re-extract subtomograms even if the stats file and the subtomograms already exist. If the stats file for the tomogram exists and is the correct size the whole tomogram will be skipped. If the subtomogram exists it will also be skipped, unless this option is true.
- preload_tomogram
Set preload_tomogram to 1 if you want to read the whole tomogram into memory before extraction. This is the fastest way to extract particles however the system needs to be able to have the memory to fit the whole tomogram into memory or otherwise it will crash. If it is set to 0, then either the subtomograms can be extracted using a memory-map to the data, or read directly from the file.
- use_tom_red
Set use_tom_red to 1 if you want to use the AV3/TOM function tom_red to extract particles. This requires that preload_tomogram above is set to 1. This is the original way to extract particles, but it seemed to sometimes produce subtomograms that were incorrectly sized. If it is set to 0 then an inlined window function is used instead.
- use_memmap
Set use_memmap to 1 to memory-map the tomogram and read subtomograms from this map. This appears to be a little slower than having the tomogram fully in memory without the massive memory footprint. However, it also appears to be slightly unstable and may crash unexpectedly. If it is set to 0 and preload_tomogram is also 0, then subtomograms will be read directly from the tomogram on disk. This also requires much less memory, however it appears to be extremely slow, so this only makes sense for a large number of tomograms being extracted on the cluster.
Example
tomogram_dir="/net/dstore2/teraraid/dmorado/subTOM_tutorial/data/tomos/bin8"
scratch_dir="${PWD}"
mcr_cache_dir="${scratch_dir}/mcr"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
extract_exe="${exec_dir}/alignment/subtom_extract_subtomograms"
motl_dump_exe="${exec_dir}/MOTL/motl_dump"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
run_local=0
iteration=1
all_motl_fn_prefix="combinedmotl/allmotl"
subtomo_fn_prefix="subtomograms/subtomo"
stats_fn_prefix="subtomograms/stats/tomo"
tomo_row=7
box_size=128
subtomo_digits=1
reextract=0
preload_tomogram=1
use_tom_red=0
use_memmap=0