subtom_preprocess
Aligns dose-fractionated data, sorts and stacks aligned frames, determines the defocus of the tilt-series using CTFFIND4, GCTF, or IMOD CTFPLOTTER and then dose-filters the tilt-series in prepartion for alignment using IMOD/eTomo.
Options
Directories
- scratch_dir
Absolute path to the folder with the input to be processed. Other paths are relative to this one.
- frame_dir
Relative path to the folder where the dose-fractionated movie frames are located.
Executables
- alignframes_exe
Absolute path to the IMOD alignframes executable. The directory of this will be used for the other IMOD programs used in the processing. Need version at least above 4.10.29
- ctffind_exe
Absolute path to the CTFFIND4 executable. Needs version at least above 4.1.13.
- gctf_exe
Absolute path to the GCTF executable. I wouldn’t use it because it rarely works but it seems a version of 1.06 sometimes doesn’t crash.
- exec_dir
Directory for subTOM executables
Memory Options
- mem_free
The amount of memory the job requires for alignment. This variable determines whether a number of CPUs will be requested to be dedicated for each job. At 24G, one half of the CPUs on a node will be dedicated for each of the processes (12 CPUs). At 48G, all of the CPUs on the node will be dedicated for each of the processes (24 CPUs).
- mem_max
The upper bound on the amount of memory the alignment job is allowed to use. If any of the processes request or require more memory than this, the queue will kill the process. This is more of an option for safety of the cluster to prevent the user from crashing the cluster requesting too much memory.
Other Cluster Options
- job_name
The job name prefix that will be used for the cluster submission scripts, log files, and error logs for the processing. Be careful that this name is unique because previous submission scripts, logs, and error logs with the same job name prefix will be overwritten in the case of a name collision.
- run_local
If the user wants to skip the cluster and run the job locally, this value should be set to 1.
File Options
- ts_fmt
The format string for the datasets to process. The string XXXIDXXXX will be replaced with the numbers specified between the range start_idx and end_idx.
The raw sum tilt-series will have the name format ts_fmt.st, or ts_fmt.mrc, an extended data file ts_fmt.{mrc,st}.mdoc and could possibly have an associated log ts_fmt.log.
Dose-fractionated movies of tilt-images are assumed to have the name format: ts_fmt_###_*.{mrc,tif} where ### is a running three-digit ID number for the tilt-image and * is the tilt-angle.
- start_idx
The first tilt-series to operate on.
- end_idx
The last tilt-series to operate on.
- idx_fmt
The format string for the tomogram indexes. Likely two or three digit zero padding or maybe just flat integers.
Beam Induced Motion Correction Options
- do_aligned
If you want to run alignframes to generate the non-dose-weighted tiltseries set this option to 1 and if you want to skip this step set this option to 0.
- do_doseweight
If you want to run alignframes to generate the dose-weighted tiltseries set this option to 1 and if you want to skip this step set this option to 0.
- do_gain_correction
Determines whether or not gain-correction needs to be done on the frames. Set to 1 to apply gain-correction during motion-correction, and 0 to skip it. Normally TIFF format frames will be saved with compression and will be unnormalized, and should be gain-corrected. MRC format frames are generally already saved with gain-correction applied during collection, so it can be skipped here.
A good rule of thumb, is if you have a dm4 file in your data you need to do gain-correction, and if you don’t see a dm4 file you do not.
- gainref_fn
The path to the gain-reference file, this will only be used if gain_correction is going to be applied.
- defects_fn
The path to the defects file, this is saved along with the gain-reference for unnormalized saved frames by SerialEM, and will only be used if gain-correction is going to be applied.
- align_bin
Binning to apply to the frames when calculating the alignment, if you are using super-resolution you may want to change this to 2. The defaults from IMOD would be 3 for counted data and 6 for super-resolution data. Multiple binnings can be tested and the best one will be used to generate the final sum.
- sum_bin
Binning to apply to the final sum. This is done using Fourier cropping as in MotionCorr and other similar programs. If you are using super-resolution you probably want to change this to 2, otherwise it should be set to 1.
- scale
Amount of scaling to apply to summed values before output. The default is 30 however serialEM applies one of 39.3?
- filter_radius2
Cutoff Frequency for the lowpass filter used in frame alignment. The unit is absolute spatial frequency which goes from 0 to 0.5 relative to the pixelsize of the input frames (not considering binning applied in alignment). The default from IMOD is 0.06. Multiple radii can be used and the best filter will be selected for the actually used alignment.
- filter_sigma2
Falloff for the lowpass filter used in frame alignment. Same units as above. The defaults from IMOD is 0.0086.
- shift_limit
Limit on distance to search for correlation peak in unbinned pixels. The default from IMOD is 20.
- do_refinement
If this is set to 1, alignframes will do an iterative refinement of the initially found frame alignment solution. The default in IMOD is to not do this refinement.
- refine_iterations
The maximum number of refinement iterations to run.
- refine_radius2
Cutoff Frequency for the lowpass filter used in refinement. The default in IMOD would be to use the same value used in alignment.
- refine_shift_stop
The amount of shift at which refinement will stop in unbinned pixels.
- truncate_above
Movies often contain hot pixels not removed from the pixel-defect mask either from x-rays or other factors and these throw off the later scaling of sums. Traditionally they would be removed in eTomo using the ccderaser command / step, but it has been found to go better to truncate them at the frame-alignment and summing step. To find a reasonable value to truncate above use the command ‘clip stats’ on several movies to find out where the values start to become outliers, it should be around 5-7 for 10 frame movies of about 3e/A^2 on the K2.
- use_gpu
If you want to use a GPU set this to 1, but be careful to not use both the cluster and the GPU as this is not supported.
- extra_opts
If you want to use other options to alignframes specify them here.
CTF Estimation Options
- apix
The pixel size of the raw movie frames if they exist, or the pixelsize of the “_aligned.st” stack if alignframes and dose-weighting is not being done. The actual pixelsize used in CTF estimation is apix * sum_bin.
- do_ctffind4
If this is set to 1, the defocus will be estimated with CTFFIND4.
- do_gctf
If this is set to 1, the defocus will be estimated with GCTF.
- do_ctfplotter
If this is set to 1, the defocus will be estimated with CTFPLOTTER.
- voltage_kev
The accelerating voltage of the microscope in KeV.
- cs
The spherical aberration of the microscope in mm.
- ac
The amount of amplitude contrast in the imaging system.
- tile_size
The size of tile to operate on.
- min_res
The lowest wavelength in Angstroms to allow in fitting (minimum resolution).
- max_res
The highest wavelength in Angstroms to allow in fitting (maximum resolution).
- min_res_ctfplotter
The lowest wavelength in Angstroms to allow in fitting in CTFPLOTTER.
- max_res_ctfplotter
The highest wavelength in Angstroms to allow in fitting in CTFPLOTTER.
- min_def
The lowest defocus in Angstroms to scan.
- max_def
The highest defocus in Angstroms to scan.
- def_step
The step size in Angstroms to scan defocus.
- astigmatism
The amount of astigmatism to allow in Angstroms.
- tilt_axis_angle
The tilt-axis angle of the tilt series. This is only needed if you are estimating the CTF with ctfplotter. You can find this value running the command ‘header’ on the raw sum tiltseries and looking at the first label (Titles) in the header.
Dose Filtering Options
- dose_per_tilt
The dose per micrograph in Electrons per square Angstrom.
Example
scratch_dir="${PWD}"
frame_dir="Frames"
alignframes_exe="$(which alignframes)"
ctffind_exe="$(which ctffind)"
gctf_exe="$(which Gctf)"
exec_dir="/net/dstore2/teraraid/dmorado/software/subTOM/bin"
mem_free="1G"
mem_max="64G"
job_name="subTOM"
run_local=1
ts_fmt="TS_XXXIDXXXX"
start_idx=1
end_idx=1
idx_fmt="%02d"
do_aligned=1
do_doseweight=1
do_gain_correction=1
gainref_fn="Frames/gainref.dm4"
defects_fn="Frames/defects.txt"
align_bin=1,2,3
sum_bin=1
scale=39.3
filter_radius2=0.167,0.125,0.10,0.06
filter_sigma2=0.0086
shift_limit=20
do_refinement=1
refine_iterations=5
refine_radius2=0.167
refine_shift_stop=0.1
truncate_above=7
use_gpu=0
extra_opts=''
apix=1
do_ctffind4=1
do_gctf=0
do_ctfplotter=1
voltage_kev=300.0
cs=2.7
ac=0.07
tile_size=512
min_res=30.0
max_res=5.0
min_res_ctfplotter=50.0
max_res_ctfplotter=10.0
min_def=10000.0
max_def=60000.0
def_step=100.0
astigmatism=1000.0
tilt_axis_angle=85.3
dose_per_tilt=3.5