Programmatic control of alignment and reconstruction workflows
Article in progress. These functionalities will be released in the Dynamo 1.2 series.
Alignment and reconstruction workflows can be used through the GUI or in a fully programmatic manner. Programmatic control allows for batched reconstruction of sets of several tilt series.
- 1 Description
- 2 Syntax
- 2.1 dtsar command
- 2.2 Create and run workflow
- 2.3 Create a workflow for later use
- 2.4 Running preexisting workflow
- 3 Use for batching
Description
This section describes qualitatively the organization of the workflow and its elements. The actual syntax is described below in the Syntax section.
The workflow folder
Workflows in Dynamo are typically stored in folders (named with an upper-case extension) that contain:
- a file that represents the workflow object itself. You can think of it as a lightweight database that keeps track of every operation and parameter value.
- files and folders containing intermediate results.
Alignment and reconstruction workflow folders are marked with the extension .AWF
The workflow object
The workflow object is contained in a file called object.mat inside the .AWF folder.
If you need to perform low-level operations on the object, you can bring it into memory through dread. If the workflow is called, say, myTest, either of the commands
w = dread('myTest.AWF');
w = dread('myTest.AWF/object.mat');
will create in memory a variable w that represents the workflow object. Most users, however, do not need to operate directly on the object, and can instead use
- dtsa for invoking a GUI that controls a workflow and allows on-screen design of steps and parameter selection.
- dtsar (dynamo_tilt_series_alignment_run) for running a workflow, i.e. executing all or part of its steps in non-interactive mode.
Both commands are valid for creating new workflows or manipulating existing ones.
By default, intermediate and final results will be stored in fixed locations inside the workflow folder. This is the recommended behaviour for new users.
Data does not need to be stored physically in the workflow folder.
Tilt series data
The full-sized tilt series can be stored somewhere else and just linked to the workflow, as an mrc or st file.
Original location of the tilt series
If you lose track of where the tilt series in a workflow is actually located, you can use the general data location system of the workflow (explained elsewhere), or just the shortcut:
file = w.io.getMatrixFile()
where w is a workflow object currently in memory.
By default, the nominal tilt angles as delivered by the data acquisition software will be stored in the workflow folder, in a text file named nominalTiltAngles.tlt. The order of the angles must correspond to the order of the micrographs in the tilt series stack.
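A nominal tilt angle file can also be written from MATLAB. The following is a minimal sketch, assuming one angle per line (the exact layout is not specified here) and a hypothetical acquisition scheme from -60 to +60 degrees in steps of 3 degrees; the file name angles.tlt is arbitrary.

```matlab
% Hypothetical acquisition scheme: -60 to +60 degrees in steps of 3 degrees.
angles = -60:3:60;

% Write one angle per line (assumed layout), in the same order as the stack.
fid = fopen('angles.tlt', 'w');
assert(fid ~= -1, 'Could not open angles.tlt for writing');
fprintf(fid, '%.2f\n', angles);
fclose(fid);

% Read the file back to confirm that the count and the order are preserved.
fid = fopen('angles.tlt', 'r');
readBack = fscanf(fid, '%f');
fclose(fid);
fprintf('Wrote %d tilt angles\n', numel(readBack));
```

The number of angles in the file should match the number of micrographs in the stack.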
Manually excluded tilt indices
Some micrographs might be detected to be of low quality by simple visual inspection (because of focusing errors, big shifts, etc). The identity of such micrographs is stored in a text file called discardedTiltIndices.txt. This file is updated when the workflow is handled through a GUI and the user excludes tilts manually. On the other hand, if the identity of the excluded tilts is known, the user can simply create a text file with the affected indices in the pre-established location.
Indices always refer to the angle ordering in nominalTiltAngles.tlt.
Algorithmically excluded tilt indices
Bear in mind that further micrographs in a tilt series might be excluded during the alignment procedure, even if the user didn't mark them for forceful exclusion. This happens when the gold bead indexing algorithm fails to recognize the identity of all the markers of a micrograph, or if the fitting error computed on the markers of a micrograph is considered too high. These indices will be stored under the file dynamicallyExcludedTiltIndices.txt
Syntax
Command line access to workflows can be used in different ways:
- Creating a workflow for later use.
- Accessing a workflow previously created
- Creation and execution: all the way from tilt series file to reconstructed tomogram files.
dtsar command
The dtsar command is used to design and run alignment workflows. Creation of new workflows is invoked with the flag -c (create), and opening an existing workflow is done through the flag -o (open). In both cases, dtsar will by default immediately proceed to execute the created/opened workflow.
Create and run workflow
The minimal input that you would need to create a reconstruction from a raw tilt series would be:
- a file containing the stack of micrographs (arbitrarily called stack.mrc), and
- a text file containing the nominal tilt angle of each one of the micrographs in the stack (arbitrarily called angles.tlt).
Thus, a command like:
dtsar -c myTest -ts stack.mrc -nta angles.tlt
will:
- create the workflow folder myTest.AWF, with its corresponding workflow object,
- link to the workflow object the position of the tilt series and tilt angles,
- run all the tasks of gold detection, trace indexing, stack alignment and reconstruction using default parameters, and
- store all the (intermediate and final) results in the workflow folder.
This is the minimal set of instructions that will go all the way from the raw data down to a reconstruction.
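When the names live in MATLAB variables, MATLAB's function syntax can replace the command syntax, since both pass the arguments as character vectors. The following is a minimal sketch, assuming dtsar behaves like a regular MATLAB function in this respect (not confirmed here); the dryRun switch is a hypothetical convenience of this example, not a Dynamo feature.

```matlab
% Names held in variables; the file names are placeholders for your own data.
workflowName = 'myTest';
stackFile    = 'stack.mrc';
anglesFile   = 'angles.tlt';

% Same flags as in the command above, collected as character arguments.
args = {'-c', workflowName, '-ts', stackFile, '-nta', anglesFile};

% With dryRun = true the command is only printed; set it to false to execute
% it (this requires Dynamo to be loaded and the input files to exist).
dryRun = true;
commandLine = strjoin([{'dtsar'}, args], ' ');
if dryRun
    disp(commandLine);
else
    dtsar(args{:});
end
```

Printing the command first makes it easy to check the flags before launching a run on real data.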
Modifying the input data
Manually excluded tilts
If you know beforehand that some tilts contain low-quality or unusable information, you can pass their indices to the dtsar command through the -mdt flag:
dtsar -c myTest -ts stack.mrc -nta angles.tlt -mdt excludedIndices.txt
Note that the indices in the text file excludedIndices.txt will be read and used to generate the workflow file myTest.AWF/excludedTiltIndices. This means that later edits to the original file excludedIndices.txt will not change the behaviour of the workflow regarding excluded tilts. To exclude further tilts, you will have to
- edit the text file in the workflow folder myTest.AWF/excludedTiltIndices, or
- change manually the excluded tilt indices in the GUI of the workflow.
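The file passed through -mdt can be produced programmatically. The following is a minimal sketch, assuming one index per line (the expected layout of the file is not specified here); the indices themselves are hypothetical, and the command is only assembled and printed, not executed.

```matlab
% Hypothetical indices of micrographs judged unusable by visual inspection.
% They refer to the ordering of the nominal tilt angles file.
excluded = [1 2 40 41];

% Write one index per line (assumed layout).
fid = fopen('excludedIndices.txt', 'w');
assert(fid ~= -1, 'Could not open excludedIndices.txt for writing');
fprintf(fid, '%d\n', excluded);
fclose(fid);

% Assemble the corresponding creation command with the flags described above.
command = sprintf('dtsar -c %s -ts %s -nta %s -mdt %s', ...
    'myTest', 'stack.mrc', 'angles.tlt', 'excludedIndices.txt');
disp(command);
```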
Default parameters in alignment workflows might not be suitable for your case. When you create a workflow for immediate execution, you may need to instruct dtsar to use different parameters.
Using a parameter file
The flag -parameterFile (-pf) allows passing a parameter file to dtsar:
dtsar -c myTest -ts stack.mrc -nta angles.tlt -pf myParameters.param
With this command, the workflow myTest will redefine its parameters with the values in myParameters.param before executing. A .param file is a text file organized as
parameterName1 parameterValue1
....
parameterNameN parameterValueN
where lines starting with the symbol # are considered to be comments and ignored.
Parameters not included in the file will be run using the default values.
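To illustrate the layout, the following minimal sketch writes a small parameter file and reads it back while skipping comment lines. It assumes that name and value are separated by whitespace, which is not confirmed here. The reader is only a way of checking your own files, not the parser used by Dynamo. workingBinning is the parameter name used elsewhere in this article; the value and the comment text are illustrative.

```matlab
% Write a small parameter file with one comment line and one parameter.
fid = fopen('myParameters.param', 'w');
assert(fid ~= -1, 'Could not open myParameters.param for writing');
fprintf(fid, '# parameters tuned on a reference tilt series\n');
fprintf(fid, 'workingBinning 3\n');
fclose(fid);

% Read it back, skipping empty lines and comment lines starting with #.
lines  = regexp(fileread('myParameters.param'), '\r?\n', 'split');
params = struct();
for k = 1:numel(lines)
    txt = strtrim(lines{k});
    if isempty(txt) || txt(1) == '#'
        continue;
    end
    tokens = strsplit(txt);            % split at whitespace (assumed separator)
    params.(tokens{1}) = tokens{2};    % values are kept as text
end
disp(params);
```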
Obtaining a parameter file
An easy way to get a text file with parameters is to use the -writeParameterFile (or -wpf) flag of dtsa. For instance:
dtsa -o otherWorkflow -writeParameterFile myFile.param --nogui
will write all the parameters found in folder otherWorkflow.AWF inside text file myFile.param
Using parameter flags
The names of the parameters to be modified can be passed directly to dtsar. If a parameter file is passed at the same time, a parameter passed explicitly with its own individual flag will prevail in case of conflict. Thus:
dtsar -c myTest -ts stack.mrc -nta angles.tlt -pf myParameters.param -workingBinning 3
will force a working binning factor of 3 even if the parameter is defined otherwise inside myParameters.param.
Typical parameter modifications
Selecting the tasks
By default, dtsar will drive the workflow to complete all its tasks: gold bead detection, trace indexing, trace refinement, stack alignment, and tomogram reconstruction. You might want to carry out only a subset of tasks using the flag -tasks, followed by the letters that identify the tasks with this convention:
- gold bead detection
- trace indexing
- trace refinement
- stack alignment
- tomogram reconstruction
Thus, the command:
dtsar -c wf -ts file.mrc -nta tilts.txt -tasks difa
will skip the reconstruction task
Create a workflow for later use
All the previous syntax options can be used with the addition of the flag -run 0. In such cases, the workflow will just be created, but not executed.
Running preexisting workflow
Existing workflows can be re-run by replacing the flag -c (create) with -o (open). For instance, the command:
dtsar -o oldTest
will open the workflow oldTest and execute all the tasks with the parameters found on disk. If the workflow is not found on disk, the command will not create a new one but issue an error instead. The previous command will basically recompute and overwrite the previous contents. However, the previous syntax options are still valid, so that rerunning a workflow alignment with a new set of parameters can be performed through:
dtsar -o oldTest -pf newSet.param
Use for batching
dtsar can be used in a loop to quickly generate sets of reconstructions given a set of tilt series files. While this is in principle possible, real data sets often present difficulties that prevent direct automation:
- Need for parameter tuning.
- Presence of faulty micrographs on random indices in the tilt series.
Our approach to creating an initial set of tomograms conceptually follows these steps:
- choose one tilt series and create a tomographic alignment workflow on it
- determine workflow parameters that create a good detection, alignment and reconstruction on the chosen tilt series
- export the alignment parameters tuned for this tilt series into a text file
- loop on all the tilt series in the data set and create programmatically a workflow with the previous parameters
- loop on all the workflow folders, creating a binned version of the tilt series
- inspect all the tilt series in a loop, manually marking the micrographs to be discarded in each tilt series
- loop on all the workflow folders, launching dtsar on them.
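The two dtsar loops in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the file names, the naming of each workflow after its stack file and the dryRun switch are all hypothetical, and dtsar is assumed to accept MATLAB function syntax. The binning and manual inspection steps are only marked by a comment, since their commands are not covered here.

```matlab
% Hypothetical tilt series stacks and their nominal tilt angle files.
stacks     = {'ts01.mrc', 'ts02.mrc', 'ts03.mrc'};
angleFiles = {'ts01.tlt', 'ts02.tlt', 'ts03.tlt'};
paramFile  = 'tuned.param';   % parameters exported from the reference workflow

dryRun   = true;   % true: only print the commands; false: execute them with Dynamo
commands = {};     % record of every command line, in order

% First loop: create one workflow per tilt series without running it (-run 0).
for k = 1:numel(stacks)
    [~, name] = fileparts(stacks{k});   % workflow named after the stack file
    args = {'-c', name, '-ts', stacks{k}, '-nta', angleFiles{k}, ...
            '-pf', paramFile, '-run', '0'};
    commands{end+1} = strjoin([{'dtsar'}, args], ' ');
    if dryRun, disp(commands{end}); else, dtsar(args{:}); end
end

% (Binning and manual inspection of each tilt series would happen here.)

% Second loop: open each workflow again and run all its tasks.
for k = 1:numel(stacks)
    [~, name] = fileparts(stacks{k});
    args = {'-o', name};
    commands{end+1} = strjoin([{'dtsar'}, args], ' ');
    if dryRun, disp(commands{end}); else, dtsar(args{:}); end
end
```

Separating creation from execution leaves room to exclude faulty micrographs in each workflow before any alignment is computed.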
Here, we create a repository out of a dataset downloaded from the Electron Microscopy Public Image Archive (EMPIAR). The full dataset can be found there under EMPIAR-10164.
We prepared a set of 5 tilt series with a pixel size of 2.7 angstrom. You can find the tilt series in:
Create a Dynamo repository
To create a repository, we need to define its location:
repo = '/Users/ppnavarro/Desktop/Stahlberg_Lab/Dynamo/HIV/download/repository/index.repo';
Note that a Dynamo repository has the extension .repo and can be created with the following commands:
rp = dpktomo.aux.repository.Repository();
rp.locationInDisk = repo;
rp
rp is a MATLAB structure that coordinates, organizes and relates the data, files and structures in different formats that result from the cryo-ET pipeline. We can check the items listed so far in the repository:
By default the repository has several defined items:
root data batchFolder batchInfo tiltSeriesFolder rawFolder rawStack imodFolder dynamoFolder