Programmatic control of alignment and reconstruction workflows

Latest revision as of 17:35, 10 July 2019

Article in progress. These functionalities will be released in the Dynamo 1.2 series.

Alignment and reconstruction workflows can be used through the GUI or in a fully programmatic manner. Programmatic control allows batched reconstruction of sets of tilt series.

Description

This section describes qualitatively the organization of the workflow and its elements. The actual syntax is described below, in the Syntax section.

The workflow folder

Workflows in Dynamo are typically stored in folders (named with an upper-case extension) that contain:

  • a file that represents the workflow object itself. You can think of it as a lightweight database that keeps track of every operation and parameter value.
  • files and folders containing intermediate results.

Alignment and reconstruction workflow folders are marked with the extension .AWF

The workflow object

The workflow object is contained in a file called object.mat inside the .AWF folder.

If you need to perform low-level operations on the object, you can bring it into memory through dread. If the workflow is called, say, myTest, the command

w = dread('myTest.AWF');

or

w = dread('myTest.AWF/object.mat');

creates in memory a variable w that represents the workflow object. Most users, however, do not need to operate directly on the object, and can use instead

  • dtsa for invoking a GUI that controls a workflow and allows on-screen design of steps and parameter selection.
  • dtsar (dynamo_tilt_series_alignment_run) for running a workflow, i.e. executing all or part of its steps in non-interactive mode.

Both commands are valid for creating new workflows or manipulating existing ones.

Results

By default, intermediate and final results are stored at fixed locations inside the workflow folder. This is the recommended behaviour for new users.

Input data

Data does not need to be stored physically in the workflow folder.

Tilt series data

The full-sized tilt series can be stored somewhere else and just linked to the workflow, as an mrc or st file.

Original location of the tilt series

If you lose track of where the tilt series of a workflow is actually located, you can use the general data location system of the workflow (explained elsewhere), or just the shortcut:

file = w.io.getMatrixFile()

where w is a workflow object currently in memory.

Tilt angles

By default, the nominal tilt angles as delivered by the data acquisition software are stored in the workflow folder, in a text file named nominalTiltAngles.tlt. The order of the angles must correspond to the order of the micrographs in the tilt series stack.
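
As an illustration of the ordering requirement, the following sketch writes a tilt angle file and checks it against the expected number of micrographs. It assumes the common layout of one angle per line; the tilt scheme, the file name angles.tlt and the number of micrographs are made up for the example and do not come from Dynamo.

```matlab
% Sketch: write a nominal tilt angle file, one angle per line (assumed layout).
angles = -60:3:60;                 % hypothetical acquisition scheme, in degrees
anglesFile = 'angles.tlt';         % illustrative file name

fid = fopen(anglesFile, 'w');
assert(fid > 0, 'Could not open %s for writing', anglesFile);
fprintf(fid, '%.2f\n', angles);    % the format is reused for every element
fclose(fid);

% Read the file back and confirm that the number of angles matches the stack.
fid = fopen(anglesFile, 'r');
readBack = fscanf(fid, '%f');      % column vector, in file order
fclose(fid);

nMicrographs = 41;                 % assumed number of micrographs in the stack
if numel(readBack) ~= nMicrographs
    error('Found %d angles for %d micrographs', numel(readBack), nMicrographs);
end
```

A check of this kind is cheap to run before creating a workflow, since a mismatch between angles and micrographs would otherwise only show up later in the alignment.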

Manually excluded tilt indices

Some micrographs might be detected to be of low quality by simple visual inspection (because of focusing errors, big shifts, etc). The identity of such micrographs is stored in a text file called discardedTiltIndices.txt. This file is updated when the workflow is handled through a GUI and the user excludes tilts manually. On the other hand, if the identity of the excluded tilts is known, the user can simply create a text file with the affected indices in the pre-established location.

Indices always refer to the ordering of the angles in nominalTiltAngles.tlt.

Algorithmically excluded tilt indices

Bear in mind that further micrographs in a tilt series might be excluded during the alignment procedure, even if the user did not mark them for exclusion. This happens when the gold bead indexing algorithm fails to recognize the identity of all the markers in a micrograph, or when the fitting error computed on the markers of a micrograph is considered too high. These indices are stored in the file dynamicallyExcludedTiltIndices.txt.

Syntax

Command line access to workflows can be used in different ways:

  • Creating a workflow for later use.
  • Accessing a workflow previously created
  • Creation and execution: all the way from tilt series file to reconstructed tomogram files.

dtsar command

The dtsar command is used to design and run alignment workflows. Creation of a new workflow is invoked with the flag -c (create), and opening an existing workflow is done through the flag -o (open). In both cases, dtsar proceeds immediately to execute the created or opened workflow, unless the flag -run 0 is added.

Create and run workflow

The minimal input that you would need to create a reconstruction from a raw tilt series would be:

  • a file containing the stack of micrographs (arbitrarily called stack.mrc), and
  • a text file containing the nominal tilt angles of each one of the micrographs in the stack (arbitrarily called angles.tlt) .

Thus, a command like:

dtsar -c myTest  -ts stack.mrc -nta  angles.tlt 

will:

  • create the workflow folder myTest.AWF, with its corresponding workflow object,
  • link to the workflow object the locations of the tilt series and the tilt angles,
  • run all the tasks of gold detection, trace indexing, stack alignment and reconstruction using default parameters, and
  • store all the (intermediate and final) results in the workflow folder.

This is the minimal set of instructions that goes all the way from the raw data to a reconstruction.

Modifying the input data

Manually excluded tilts

If you know beforehand that some tilts are of low quality or carry unusable information, you can pass their indices to the dtsar command through the -mdt flag:

dtsar -c myTest  -ts stack.mrc -nta  angles.tlt -mdt excludedIndices.txt 

Note that the indices in the text file excludedIndices.txt are read once and used to generate the workflow file myTest.AWF/excludedTiltIndices. Later edits to the file that you originally passed (excludedIndices.txt) therefore do not change which tilts the workflow excludes. To exclude further tilts, you will have to

  • edit the text file in the workflow folder myTest.AWF/excludedTiltIndices, or
  • change manually the excluded tilt indices in the GUI of the workflow.
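
A minimal sketch of how such an index file could be prepared before calling dtsar. The layout of one index per line, the 1-based numbering, the chosen indices and the series length are all assumptions made for the illustration; only the command line itself is taken from the text above.

```matlab
% Hypothetical example: micrographs 1, 2 and 41 were judged unusable.
excluded = [41 1 2 2];
nAngles  = 41;                         % assumed length of the tilt series

% Indices refer to the ordering of the tilt angle file, so they must lie
% between 1 and the number of angles (1-based numbering is an assumption).
assert(all(excluded >= 1 & excluded <= nAngles), 'Index out of range');
excluded = unique(excluded);           % sort and drop repeated entries

fid = fopen('excludedIndices.txt', 'w');
fprintf(fid, '%d\n', excluded);        % assumed layout: one index per line
fclose(fid);

% Assemble the command as text; it is only displayed here, not executed.
cmd = sprintf('dtsar -c %s -ts %s -nta %s -mdt %s', ...
              'myTest', 'stack.mrc', 'angles.tlt', 'excludedIndices.txt');
disp(cmd);
```

Building the command as a string keeps the sketch independent of a Dynamo installation; the displayed line can then be entered in the Dynamo console.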

Parameter modification

Default parameters in alignment workflows might not be suitable for your case. When you create a workflow for immediate execution, you may need to instruct dtsar to use different parameters.

Using a parameter file

The flag -parameterFile (-pf) allows passing a parameter file to dtsar:

dtsar -c myTest  -ts stack.mrc -nta  angles.tlt -pf myParameters.param 

With this command, the workflow myTest redefines its parameters with the values in myParameters.param before executing. A .param file is a text file organized as

parameterName1 parameterValue1
...
parameterNameN parameterValueN

where lines starting with the symbol # are considered to be comments and ignored.

Parameters not included in the file keep their default values.
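
The file layout described above can be illustrated with a short reader. This is only a sketch of the format, not Dynamo's own parser: it writes a small example file and reads it back into a struct, keeping every value as text. The parameter workingBinning is mentioned in this article; someOtherParameter is a made-up name.

```matlab
% Write a small example file.
fid = fopen('myParameters.param', 'w');
fprintf(fid, '# parameters tuned on the first tilt series\n');
fprintf(fid, 'workingBinning 3\n');
fprintf(fid, 'someOtherParameter 16\n');   % hypothetical parameter name
fclose(fid);

% Parse it: skip blank lines and comments, split the name from the value.
params = struct();
fid = fopen('myParameters.param', 'r');
txt = fgetl(fid);
while ischar(txt)                          % fgetl returns -1 at end of file
    txt = strtrim(txt);
    if ~isempty(txt) && txt(1) ~= '#'
        [name, rest] = strtok(txt);        % name is the first token
        params.(name) = strtrim(rest);     % value is kept as text
    end
    txt = fgetl(fid);
end
fclose(fid);

disp(params);
```

Reading a parameter file in this way can help to compare the files exported from two workflows before reusing one of them.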

Obtaining a parameter file

An easy way to get a text file with parameters is to use the -writeParameterFile (or wpf) flag of the dtsa command. For instance:

dtsa -o  otherWorkflow -writeParameterFile  myFile.param --nogui 

will write all the parameters found in the folder otherWorkflow.AWF into the text file myFile.param.

Using parameter flags

The names of the parameters to be modified can be passed directly to dtsar as flags. If a parameter file is passed at the same time, a parameter passed explicitly through its own flag prevails in case of conflict. Thus:

dtsar -c myTest  -ts stack.mrc -nta  angles.tlt -pf myParameters.param -workingBinning 3

will force a working binning factor of 3 even if the parameter is given a different value inside myParameters.param.
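
The precedence rule can be pictured as a merge of two sets of values. The sketch below is only a model of the described behaviour, not Dynamo code; the values and the name someOtherParameter are invented for the example.

```matlab
% Values as they might come from a parameter file and from explicit flags.
fromFile  = struct('workingBinning', '2', 'someOtherParameter', '16');
fromFlags = struct('workingBinning', '3');

% Start from the file and let every explicit flag override it.
effective = fromFile;
names = fieldnames(fromFlags);
for i = 1:numel(names)
    effective.(names{i}) = fromFlags.(names{i});
end

disp(effective);
```

Parameters that appear only in the file are untouched by the merge, which matches the statement that only conflicting entries are overridden.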

Typical parameter modifications

Selecting the tasks

By default, dtsar drives the workflow to complete all its tasks: gold bead detection, trace indexing, trace refinement, stack alignment, and tomogram reconstruction. You might want to carry out only a subset of tasks using the flag -tasks, followed by the letters that identify the tasks according to this convention:

  • d: gold bead detection
  • i: trace indexing
  • f: trace refinement
  • a: stack alignment
  • r: tomogram reconstruction

Thus, the command:

dtsar -c wf -ts file.mrc -nta tilts.txt -tasks difa

will skip the reconstruction task.
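
When task strings are generated by a script, it can be useful to check them against the letter convention before launching anything. The following sketch does this outside Dynamo; the letters and task names are those listed above, while the checking logic itself is an illustration.

```matlab
letters = 'difar';
names   = {'gold bead detection', 'trace indexing', 'trace refinement', ...
           'stack alignment', 'tomogram reconstruction'};

tasks = 'difa';                          % value passed after -tasks

% Reject letters outside the convention.
unknown = setdiff(tasks, letters);
if ~isempty(unknown)
    error('Unknown task letter(s): %s', unknown);
end

% Logical mask over the five tasks, in pipeline order.
selected = ismember(letters, tasks);
for i = find(selected)
    fprintf('will run:  %s\n', names{i});
end
for i = find(~selected)
    fprintf('will skip: %s\n', names{i});
end
```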

Create a workflow for later use

All the previous syntax options can be used with the addition of the flag -run 0. In that case, the workflow is just created, but not executed.

Running preexisting workflow

Existing workflows can be re-run by replacing the flag -c (create) with -o (open). For instance, the command:

dtsar -o oldTest 

will open the workflow oldTest and execute all the tasks with the parameters found on disk. If the workflow is not found on disk, the command will not create a new one but will issue an error instead. This command recomputes and overwrites the previous results. However, the syntax options described above remain valid, so a workflow can be re-run with a new set of parameters through:

dtsar -o oldTest -pf newSet.param

Use for batching

dtsar can be used in a loop to quickly generate sets of reconstructions from a set of tilt series files. While this is possible in principle, real data sets usually present difficulties that prevent direct automation:

  1. Need for parameter tuning.
  2. Presence of faulty micrographs at unpredictable indices in the tilt series.

Recommended approach

Our approach to creating an initial set of tomograms conceptually follows these steps:

  1. choose one tilt series and create a tomographic alignment workflow on it
  2. determine workflow parameters that create a good detection, alignment and reconstruction on the chosen tilt series
  3. export the alignment parameters tuned for this tilt series into a text file
  4. loop on all the tilt series in the data set and create programmatically a workflow with the previous parameters
  5. loop on all the workflow folders, creating a binned version of the tilt series
  6. inspect all the tilt series in a loop, manually marking the micrographs to be discarded in each tilt series
  7. loop on all the workflow folders, launching dtsar on them.
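
Steps 4 and 7 above can be sketched as two lists of command lines built in a loop. The file names, the naming scheme that pairs each stack with a tilt angle file, and the parameter file name tuned.param are assumptions for the example; the flags are the ones described in the Syntax section. The commands are only assembled and printed, not executed.

```matlab
% Hypothetical tilt series files; each one is assumed to have a matching
% tilt angle file with the same base name.
stacks    = {'ts_01.mrc', 'ts_02.mrc', 'ts_03.mrc'};
paramFile = 'tuned.param';      % exported from the manually tuned workflow

createCmds = cell(size(stacks));
runCmds    = cell(size(stacks));
for i = 1:numel(stacks)
    [~, name] = fileparts(stacks{i});           % 'ts_01', 'ts_02', ...
    angles = [name '.tlt'];                      % assumed naming scheme
    % Step 4: create the workflow with the tuned parameters, without running.
    createCmds{i} = sprintf('dtsar -c %s -ts %s -nta %s -pf %s -run 0', ...
                            name, stacks{i}, angles, paramFile);
    % Step 7: open the workflow again and execute its tasks.
    runCmds{i} = sprintf('dtsar -o %s', name);
end

fprintf('%s\n', createCmds{:});
fprintf('%s\n', runCmds{:});
```

Keeping creation and execution in separate lists leaves room for the manual inspection of steps 5 and 6 between the two passes.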

Repository system

Here, we create a repository out of a dataset downloaded from the Electron Microscopy Public Image Archive (EMPIAR). The full dataset can be found under EMPIAR-10164 (https://www.ebi.ac.uk/pdbe/emdb/empiar/entry/10164/).

Data

We prepared a set of 5 tilt series with a pixel size of 2.7 angstrom. You can find the tilt series in:

~/data/tutorial_repository

Create a Dynamo repository

To create a repository, we need to define its location:

repo = '/Users/ppnavarro/Desktop/Stahlberg_Lab/Dynamo/HIV/download/repository/index.repo';

Note that a Dynamo repository has the extension .repo. It can be created with the following commands:

rp = dpktomo.aux.repository.Repository();
rp.locationInDisk = repo;
rp

rp is a MATLAB object that coordinates, organizes and relates the data, files and structures in different formats that result from the cryo-ET pipeline. We can check the items listed so far in the repository:

rp.components.list();

By default the repository has several defined items:

root
data
batchFolder
batchInfo
tiltSeriesFolder
rawFolder
rawStack
imodFolder
dynamoFolder