Pipelines for Workshop Heidelberg 2018
These materials describe the procedural steps required for high resolution tomography. Materials on more basic aspects of Dynamo can be consulted here:
Pipeline
Input data
We will start with two tilt series. Drift correction has already been applied to each tilt using MotionCor2, so that each micrograph in a tilt series is an averaged movie. The pixel size is 2.7 Å (the tilt series have been binned by Fourier cropping to ease the computation), and the structures of interest form spherical lattices.
Procedural steps
Preprocessing steps
As we start from tilt series that are already drift-corrected, the preprocessing consists of the following steps:
- Defocus estimation with ctffind4
- Alignment of tilt series with Dynamo
- Exposure filter on the aligned tilt series
- Stripe-based CTF correction (with ctfphaseflip from Imod)
- Creation of tomograms with Dynamo
These preprocessing operations that go from the tilt series to the tomograms can be used for any sample.
Tomogram management
This part is specific to the particular geometry of the proteins of interest.
- Cataloguing the tomograms (archiving)
- Annotation of vesicle positions
- Extraction of putative particles based on the vesicle positions
- Elimination of repeated particles
Subtomogram averaging
- Extraction of true particles.
- Splitting in odd and even data sets.
- Independent alignment and resolution determination.
Material organization
The data corresponding to each tilt series is under the folders work/1 and work/2. Intermediate results corresponding to each tilt series will be stored in the corresponding folders.
The initial structure of each folder contains two items:
ls work/1
rawStack.mrc nominalTilts.tlt
The subsequent processing steps will gradually generate more items per tilt series.
Copying data to your account
You will need to copy the provided data into your account:
cp -r /g/embl/workshop/teach11/data/work .
Matlab environment
Matlab provides a convenient syntax for manipulation of sets of paths.
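As a minimal sketch of what this means in practice, the per-tilt-series paths used in the rest of this page can be assembled with fullfile. The variable names (root, indices, rawStacks) are illustrative assumptions and are not part of the workshop scripts.

```matlab
% Illustrative only: variable names are assumptions, not workshop code.
root    = 'work';
indices = 1:2;

% Build one path per tilt series, e.g. work/1/rawStack.mrc
rawStacks = arrayfun(@(i) fullfile(root, num2str(i), 'rawStack.mrc'), ...
                     indices, 'UniformOutput', false);

% Report which stacks are actually present on disk
for k = 1:numel(rawStacks)
    isThere = exist(rawStacks{k}, 'file') == 2;
    fprintf('%s  found: %d\n', rawStacks{k}, isThere);
end
```

Building the paths from an index list keeps the same code valid when further tilt series are added under the same folder.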
Preprocessing
Each of the preprocessing steps can be computed separately for each tilt series. We have thus provided a set of functions, each one a wrapper that accepts as input an index identifying a tilt series and creates an output with the right naming convention. They are intended to be called in a loop like:
for i=1:2 emboFunction('work',i); end
If new tilt series are added in order to increase the attainable resolution, you just need to apply the function on additional indices.
The scripts also contain the numerical parameters needed to run the corresponding executable. If you need to edit them, make a local copy of the script and run that copy.
Defocus estimation
We run the defocus estimation on the raw tilt series, before any transformation is applied.
The function emboDefocusEstimation runs ctffind4 on the input data, reading and creating the following items:
- Input: work/i/rawStack
- Output: work/i/imodDefocus.txt
for i=1:2 emboDefocusEstimation('work',i,pixelSize); end
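The loop above expects pixelSize to exist in the workspace, and it writes one defocus file per tilt series. A minimal sketch of both points follows, assuming that the value is the 2.7 given in the input data description, that it is passed in Angstrom, and that the loop has already been run from the folder containing work. The variable names missing and defocusFile are illustrative.

```matlab
% Sketch only: the unit expected by the wrapper is an assumption (Angstrom).
pixelSize = 2.7;   % define this before calling emboDefocusEstimation

% After the loop has run, check that each expected output file was written
missing = {};
for i = 1:2
    defocusFile = fullfile('work', num2str(i), 'imodDefocus.txt');
    if exist(defocusFile, 'file') ~= 2
        missing{end+1} = defocusFile; %#ok<AGROW>
    end
end

if isempty(missing)
    disp('Defocus estimates found for all tilt series.');
else
    fprintf('Missing defocus file: %s\n', missing{:});
end
```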
Tilt series alignment
The gold beads in the raw stack are detected and used to create a 3D model.
- Input: work/i/rawStack, work/i/nominalAngles.tlt
- Output: work/i/aligner.AWF/alignedFullStack.mrc
for i=1:2 emboAlignment('work',i,pixelSize); end
The parameters defined in emboAlignment concern mainly the size of the gold bead.
Dose filtering
- Input:
- Output: work/i/doseStack.mrc
for i=1:2 emboDoseFiltering('work',i); end
CTF correction of tilt series
The results of the dose filtering are sent to Imod's ctfphaseflip.
- Input: work/i/, work/i/doseStack.mrc
- Output: work/i/finalStack
for i=1:2 emboPhaseFlipping('work',i); end
Reconstruction of tomograms
- Input: work/i/, work/i/
- Output: work/i/aligner.AWF/reconstructionFullSize.mrc
for i=1:2 emboReconstruction('work',i); end
By default, tomograms with a height of 600 pixels are produced.
Management of tomogram data
Archiving of tomograms
dcm -create tomos
for i=1:N; dcm('c','tomos','at',[id,'/',num2str(i),'/tomogram.mrc']); end
We need to annotate the position and radius (roughly estimated) of each vesicle on each tomogram. For this, we open the catalogue that we just created through
dcm -c tomos
To increase the visibility of the objects inside the tomogram, we first apply a 1x binning, which creates a proxy for each tomogram. After binning the data set, right-clicking on a tomogram item offers the option of opening the binned version of that tomogram. Annotations are nevertheless registered in the coordinates of the full-size tomogram.
Modelling of vesicles
Location of oversampled particles
Our strategy consists in creating many more particles than expected: we oversample the vesicles. This procedure is contained in the script emboDipolesToTable, which performs the catalogue management operations needed to create a single table covering all particles while keeping track of the tomogram each particle comes from.
[tableFinal,tomoListFile] = emboDipolesToTable('tomos',40);
Here, 40 is the number of pixels of separation between particles on the surface of each vesicle.
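To get a feeling for how many oversampled particles this separation produces, the following rough estimate divides the vesicle surface by the area allotted to each particle. It assumes a perfectly spherical vesicle and roughly uniform spacing. The radius of 200 pixels is a hypothetical value chosen for illustration, not a measurement from the workshop data.

```matlab
% Rough estimate under stated assumptions; not what emboDipolesToTable computes.
separation = 40;    % pixels between neighbouring particles, as in the call above
radius     = 200;   % pixels; hypothetical vesicle radius

surfaceArea = 4*pi*radius^2;                    % sphere surface in square pixels
nParticles  = round(surfaceArea/separation^2);  % one particle per separation^2 patch

fprintf('About %d particles expected on this vesicle.\n', nParticles);
```

Halving the separation multiplies the estimate by four, which is worth keeping in mind when choosing the oversampling density.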
Extraction of particles
With the table and the file that indexes the tomograms, we extract the located positions into a data folder.
dtcrop(tomoListFile,'oversampled.tbl','oversampled2.7.Data',128);
First average
We compute a first average to feed an alignment project. Although this average is relatively featureless, it carries enough signal for the alignment to reinforce coherent features, and a few iterations will be enough to show the basic features of the protein.
o = daverage('oversampling_5.2.Data','t','oversampling_5.2.Data/crop.tbl','fc',1);
Creation of a project
We create an alignment project.
dcp
Filtering overlapping particles
This procedure generates a table that contains the approximate coordinates of the particles. Our task is then to refine them with subtomogram averaging.
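The filtering of overlapping particles named in this section's heading can be sketched as a simple distance threshold: whenever two particles end up closer than a minimum distance, only one of them is kept. The example below is a generic illustration on a toy table, not the workshop script. It assumes the Dynamo table layout with the particle tag in column 1, shifts in columns 4 to 6 and coordinates in columns 24 to 26. The threshold minDist and the toy coordinates are hypothetical.

```matlab
% Toy table with 4 particles; the column layout is an assumption (see text).
t = zeros(4, 26);
t(:, 1)     = (1:4)';                                   % particle tags
t(:, 24:26) = [10 10 10; 12 10 10; 50 50 50; 50 52 50]; % x, y, z in pixels
minDist = 5;                                            % pixels; hypothetical

pos  = t(:, 24:26) + t(:, 4:6);     % position = coordinates + shifts
keep = true(size(t, 1), 1);

for a = 1:size(t, 1)
    if ~keep(a), continue; end
    % Distance from particle a to every particle in the table
    d = sqrt(sum(bsxfun(@minus, pos, pos(a, :)).^2, 2));
    tooClose = d < minDist;
    tooClose(1:a) = false;          % only discard particles later in the table
    keep(tooClose) = false;
end

tFiltered = t(keep, :);
disp(tFiltered(:, 1)');             % tags of the surviving particles
```

This greedy version keeps whichever particle comes first in the table. After an alignment round it is usually more sensible to sort the table by cross-correlation first, so that the better-scoring particle of each pair survives.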
Subtomogram averaging
Once the particles have been located, subtomogram averaging can start. We divide the data set into two halves by creating two separate tables.
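A minimal sketch of such a split is shown below, assuming that the table is a numeric matrix with one row per particle and the particle tag in column 1. The toy table and the variable names tOdd and tEven are illustrative. In practice the table would be read from disk and each half written back to its own file.

```matlab
% Toy table with 6 particles; only the tag column is filled in.
t = zeros(6, 26);
t(:, 1) = (1:6)';

isOdd = mod(t(:, 1), 2) == 1;   % split on tag parity
tOdd  = t(isOdd, :);            % particles with odd tags
tEven = t(~isOdd, :);           % particles with even tags

fprintf('%d odd and %d even particles.\n', size(tOdd, 1), size(tEven, 1));
```

Splitting on the tag rather than on the row position keeps the assignment stable if rows are later reordered or removed, so a particle never migrates from one half to the other.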