Pipelines for Workshop Heidelberg 2018

These materials describe the procedural steps required for high-resolution tomography. Materials on more basic aspects of Dynamo can be consulted here:


Pipeline

Input data

We will start with two tilt series. Drift correction has already been applied to each tilt with MotionCor2, so that each micrograph in a tilt series is an averaged movie. The pixel size is 2.7 Å (the tilt series have been binned by Fourier cropping to ease the computation), and the structures of interest form spherical lattices.

Procedural steps

Preprocessing steps

As we will initially work with already drift-corrected tilt series, the preprocessing consists of the following steps:

  • Defocus estimation with ctffind4
  • Alignment of tilt series with Dynamo
  • Exposure filter on the aligned tilt series.
  • Stripe-based CTF correction (with ctfphaseflip from IMOD)
  • Creation of tomograms with Dynamo

These preprocessing operations that go from the tilt series to the tomograms can be used for any sample.

Tomogram management

This part is specific to the particular geometry of the proteins of interest.

  • Cataloguing the tomograms (archiving)
  • Annotation of vesicle positions.
  • Extraction of putative particles based on the vesicle positions.
  • Elimination of repeated particles.

Subtomogram averaging

  • Extraction of true particles.
  • Splitting in odd and even data sets.
  • Independent alignment and resolution determination.

Material organization

The data corresponding to each tilt series is under the folders work/1 and work/2. Intermediate results corresponding to each tilt series will be stored in the corresponding folders.

Initially, each folder contains two items:

ls work/1
rawStack.mrc
nominalTilts.tlt

The additional processing steps will gradually generate more elements per tilt series.

Copying data to your account

You will need to copy the provided data into your account:

cp -r /g/embl/workshop/teach11/data/work  .  

Matlab environment

MATLAB provides a convenient syntax for manipulating sets of paths.
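As a minimal sketch of this convenience, the snippet below builds the expected file paths for both tilt series with fullfile. The variable names root and nSeries are illustrative assumptions; the file names are the ones shown in the initial folder listing.

```matlab
% Sketch: build per-tilt-series paths (variable names are illustrative).
root    = 'work';   % root data folder used in this workshop
nSeries = 2;        % number of tilt series

rawStacks = cell(nSeries,1);
tiltFiles = cell(nSeries,1);
for i = 1:nSeries
    folder       = fullfile(root, num2str(i));           % folder of tilt series i
    rawStacks{i} = fullfile(folder, 'rawStack.mrc');      % raw tilt series
    tiltFiles{i} = fullfile(folder, 'nominalTilts.tlt');  % nominal tilt angles
end

disp(rawStacks);   % show the assembled paths
```

Using fullfile rather than manual string concatenation keeps the separators correct on any operating system.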

Preprocessing

Each of the preprocessing steps can be computed separately for each tilt series. We have thus provided a set of functions, each one a wrapper that accepts as input one index that identifies a tilt series and creates an output with the right naming convention. They are intended to be called in a loop like:

for i=1:2
   emboFunction('work',i);
end 

If new tilt series are added in order to increase the attainable resolution, you just need to apply the function on additional indices.
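The indices do not have to be typed by hand. The following sketch, which is not part of the workshop scripts, lists the numbered subfolders of a root directory and loops over them. A temporary folder stands in for work so that the example runs anywhere, and the loop body only prints the index where a wrapper call would go.

```matlab
% Sketch: find the numbered tilt-series folders under a root directory.
% A temporary folder stands in for 'work' so the example is self-contained.
root = tempname;
mkdir(root);
for k = [1 2 3]
    mkdir(fullfile(root, num2str(k)));
end

listing = dir(root);
names   = {listing([listing.isdir]).name};                      % folder names only
isIndex = ~cellfun(@isempty, regexp(names, '^\d+$', 'once'));   % purely numeric names
indices = sort(str2double(names(isIndex)));                     % ascending indices

for i = indices
    fprintf('would process tilt series %d\n', i);   % a wrapper call would go here
end

rmdir(root, 's');   % remove the stand-in folder
```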

Also, the scripts contain the numerical parameters needed to run the corresponding executable. If you need to edit them, make a local copy of the script and run that copy.

Defocus estimation

We run the defocus estimation on the raw tilt series, before any transformation.

The function emboDefocusEstimation will run ctffind4 on the input data, creating the following items:

  • Input: work/i/rawStack
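  • Output: work/i/imodDefocus.txt

pixelSize = 2.7; % pixel size in Angstrom, as given under "Input data" (assumed to be the unit the wrapper expects)
for i=1:2
   emboDefocusEstimation('work',i,pixelSize);
end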

Tilt series alignment

The gold beads in the raw stack will be detected and used to create a 3D model.

  • Input: work/i/rawStack, work/i/nominalAngles.tlt
  • Output: work/X/aligner.AWF/alignedFullStack.mrc

for i=1:2
   emboAlignment('work',i,pixelSize);
end 

The parameters defined in emboAlignment mainly concern the size of the gold beads.

Dose filtering

  • Input:
  • Output: work/X/doseStack.mrc

for i=1:2
   emboDoseFiltering('work',i);
end

CTF correction of tilt series

The results of the dose filtering are sent to IMOD's ctfphaseflip.

  • Input: work/X/, work/X/doseStack.mrc
  • Output: work/X/finalStacl

for i=1:2
   emboPhaseFlipping('work',i);
end

Reconstruction of tomograms

  • Input: work/X/, work/X/
  • Output: work/X/aligner.AWF/reconstructionFullSize.mrc

for i=1:2
   emboReconstruction('work',i,);
end

By default, tomograms with a height of 600 pixels are produced.
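To relate this default to physical units, the short calculation below converts the height to nanometres using the 2.7 Å pixel size of the input data, which gives 162 nm. The variable names are illustrative.

```matlab
% Sketch: physical thickness of the default reconstruction.
pixelSize_A  = 2.7;   % pixel size in Angstrom (binned data, see "Input data")
heightPixels = 600;   % default tomogram height in pixels

thickness_A  = heightPixels * pixelSize_A;   % thickness in Angstrom
thickness_nm = thickness_A / 10;             % 10 Angstrom = 1 nm
fprintf('Tomogram thickness: %.0f nm\n', thickness_nm);
```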

Management of tomogram data

Now we need to organize the different tilt series computed independently into an object that indexes all of them: the catalogue.

Archiving of tomograms

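N  = 2;       % number of tilt series (two in this workshop)
id = 'work';  % root folder of the data (assumed from the folder layout above)
dcm -create tomos
for i=1:N
    dcm('c','tomos','at',[id,'/',num2str(i),'/tomogram.mrc']);   % add tomogram i to the catalogue
end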

We need to annotate the position and the roughly estimated radius of each vesicle in each tomogram. For this, we open the catalogue that we just created with:

dcm -c tomos

In order to increase the visibility of the objects inside the tomogram, we first use a 1x binning, which will create a proxy for each tomogram. After binning the data set, right-clicking on a tomogram item offers the option of opening a binned version of the tomogram. Annotations will, however, be registered in the coordinates of the full-size tomogram.

Modelling of vesicles

Location of oversampled particles

Our strategy consists of creating many more particles than expected: we oversample the vesicles. This procedure is implemented in the script emboDipolesToTable, which performs the catalogue management operations needed to create a single table covering all the particles while keeping track of the tomogram each one comes from.

[tableFinal,tomoListFile] = emboDipolesToTable('tomos',40);

Here, 40 is the number of pixels of separation between particles on the surface of each vesicle.
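To get a feeling for this parameter, the sketch below converts the separation to Ångström and makes a rough estimate of how many points it produces on one vesicle, assuming near-uniform sampling in which each point occupies about one separation-squared patch of the sphere surface. The radius radiusPixels is a hypothetical value chosen only for illustration; it does not come from the workshop data.

```matlab
% Sketch: what a 40-pixel separation implies (radiusPixels is hypothetical).
pixelSize_A      = 2.7;   % pixel size in Angstrom, from "Input data"
separationPixels = 40;    % value passed to emboDipolesToTable
radiusPixels     = 200;   % hypothetical vesicle radius, for illustration only

separation_A = separationPixels * pixelSize_A;           % spacing in Angstrom
surfaceArea  = 4 * pi * radiusPixels^2;                  % sphere area in pixels^2
nPoints      = round(surfaceArea / separationPixels^2);  % one point per d^2 patch
fprintf('Spacing: %.0f A, about %d points per vesicle\n', separation_A, nPoints);
```

A smaller separation oversamples the surface more densely, and the number of points grows with the inverse square of the separation.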

Extraction of particles

With the table and the file that indexes the tomograms, we extract the located positions into a data folder.

dtcrop(tomoListFile,'oversampled.tbl','oversampled2.7.Data',128); 

First average

We compute a first average to feed an alignment project. Although the average is relatively featureless, it carries enough signal to drive the alignment: coherent signal is reinforced, and a few iterations are enough to reveal the basic features of the protein.

o = daverage('oversampling_5.2.Data','t','oversampling_5.2.Data/crop.tbl','fc',1);

Creation of a project

We create an alignment project.

dcp

Filtering overlapping particles

This procedure generates a table that contains the approximate coordinates of the particles. Our task will then be to refine them with subtomogram averaging.
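One common way to discard overlapping particles is a distance threshold: whenever two positions are closer than a minimum distance, only one of them is kept. The sketch below illustrates the idea in plain MATLAB on a small stand-in coordinate matrix. It is not the method used by the workshop scripts, and coords and minDist are hypothetical values.

```matlab
% Sketch: greedy removal of points closer than a minimum distance.
% 'coords' stands in for particle positions (one row per particle, in pixels).
coords  = [10 10 10; 12 10 10; 60 10 10; 60 11 10; 10 80 10];
minDist = 5;                          % hypothetical threshold in pixels

keep = true(size(coords,1),1);
for i = 1:size(coords,1)
    if ~keep(i), continue; end        % particle i was already discarded
    delta = bsxfun(@minus, coords, coords(i,:));
    d     = sqrt(sum(delta.^2, 2));   % distances from particle i to all particles
    tooClose       = d < minDist;
    tooClose(1:i)  = false;           % only later particles can be discarded
    keep(tooClose) = false;
end
filtered = coords(keep,:);            % positions that survive the filter
disp(filtered);
```

In this sketch the first particle of each close pair is the one that survives. With real data the comparison would be done separately for each tomogram, and a quality criterion could decide which particle to keep.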

Subtomogram averaging

Once the particles have been located, subtomogram averaging can start. We divide the data set into two halves by creating two different tables.
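A minimal sketch of such a split is shown below, assuming the table is a numeric matrix with one row per particle. The matrix tbl is a random stand-in, not real data, and the names tblOdd and tblEven are illustrative.

```matlab
% Sketch: split a particle table into two halves by row parity.
% 'tbl' is a stand-in matrix with one row per particle; column 1 holds a tag.
nParticles = 7;
tbl = [(1:nParticles)', rand(nParticles,3)];

tblOdd  = tbl(1:2:end, :);   % rows 1, 3, 5, ...
tblEven = tbl(2:2:end, :);   % rows 2, 4, 6, ...

fprintf('%d odd and %d even particles\n', size(tblOdd,1), size(tblEven,1));
```

Each half would then be aligned independently, so that the resolution can be estimated by comparing the two resulting averages.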