Walkthrough for lattices on vesicles
This walkthrough shows the approach to process proteins that cover spheroidal vesicles following (approximately) a structured lattice. For demonstration, we use a highly binned tomogram with several viral capsides.
Contents
Data
The data has been made available by Florian Schur. This single tomogram is part of the data set used for the paper An atomic model of HIV-1 capsid-SP1 reveals structures regulating assembly and maturation by Schur FK, Obr M, Hagen WJ, Wan W, Jakobi AJ, Kirkpatrick JM, Sachse C, Kräusslich HG and Briggs JA. (2016)
Getting the data
You can download the data through the command:
dpkhelp.wiki.downloadExample('hiv');
In case this command fails, please try the direct link in linux:
wget https://wiki.dynamo.biozentrum.unibas.ch/w/doc/data/hiv/v17.rec
or the corresponding one in MacOS
curl -O https://wiki.dynamo.biozentrum.unibas.ch/w/doc/data/hiv/v17.rec
Viewing the data
The tomogram is severely binned, so it will probably fit in memory without major problems. We can just use our typical tool to quick check a tomogram.
dtmshow v17.rec
Annotation of a tomogram
We are interested in cropping the particles between the two layers of each one of the viruses.
DipoleSet models
In our approach, we will first use a single model to manually input the approximate centers and radiuses of all the viruses. We will use the model type dipoleSet. This model allows to describe each annotated virus as a dipole: we will mark the (approximate) center of the virus as the center property of a dipole. The radius of the virus will be described by an annotation of the north property of the dipole.
Notice that Dynamo includes several tools for vesicle models, so you might be surprised that for this particular case we go instead for this (somewhat obscurely named) dipoleSet type of model. This is just a question of convenience: the dipoleSet allows to quickly click a series of centers and radiuses and store them in a single model. That's all you need to describe vesicular geometry. Additionally, the dipoleSet includes some tools for quickly extracting particles from a multivesicular set, and it also offers a useful on-screen representation in the dtmslice tomogram browser.
When the model is active, we use the key c to mark the center of the current dipole, and n to mark the north. It will create a semitransparent sphere enclosing the completed dipole.
Corrections during picking
Further clicks on c or n will just move the center or the north of the same dipole. In order to click a further dipole, you need to click on enter and "open" the next dipole for clicking. If you forget to click on enter before annotating points on the next vesicle, you'll end up with situation like this:
Here, the user probably clicked on c for vesicle 2 in its real center, then clicked on n for the north point, and then moved to vesicle 3, clicking on c trying to mark the center. As a new dipole was not opened (i.e., the enter key was not pressed) when the vesicle 2 was finished, the center of vesicle 2 got misplaced. When this happens, simply press on h to hide the spheres and click the center of vesicle 2 back in its position. Then, press on enter to start the next vesicle.
Corrections after picking
It is a good idea to observe your annotation from the xor yperspectives (for this, press on keys x or y when the cursor is on the slide). Sometimes you might identify situations like this ne:
In such cases, you want to remove the dipole and click it again. The easiest way to do so is to:
- click on 'h' to hide the spheres, and
- secondary click on the wrongly created dipole (you have to click on the line itself) and select the Remove and prepare for reenter option in the menu that will popup.
When you are finished, don't forget to save the tomogram into the catalogue, using the disk icon (or the Active Model control)
Cropping the particles
We will describe a protocol to be carried from the command line, explaining each step. In short, we will visit each dipole in the model we just created, define a spherical vesicle centered on the dipole, use it to define a regular distribution of points on each vesicle, define positions pointing outwards and format everything as a table that can be used directly by Dynamo to operate an extraction and a subsequent alignment.
Extracting information from the catalogue
dcmodels hiv -tc dipoleSet -ws o -gm 1 % assumes that you only have one model file ds = o.models{1};
Create a table for each vesicle
Now we can loop on each dipole contained in the dipole set ds.
NDipoles = length(ds.dipoles); for i=1:NDipoles; v =dmodels.vesicle(); v.center = ds.dipoles{i}.center; % defines the radius in terms of the clicked dipole v.radius = norm( ds.dipoles{i}.north - ds.dipoles{i}.center); % create a separation distance in the vesicle v.separation = 20; % This parameter sets the cropping point a number of pixels away from the vesicle % this distance in pixels is measured in the normal direction. % here we just use the default value. v.crop_points_from_surface =0; % creates the crop points and respective angles v.updateCrop(); % extracts a table tv{i} = v.grepTable(); % marks the table assigning in the column 22 (foreseen for arbitrary use) % the number of the individual dipole tv{i}(:,22) = i; % provides information on the output disp(sprintf('Dipole %d will provide %d crop points',i,size(v.crop_points,2))); end
Merging the tables
Each dipole with number i generated a table stored as tv{i}, i.e., the entry i inside the cell array t. We want to generate a single table. Concatenating the table matrixes will not work, as it would have several particles with the same particle tags. We thus ask Dynamo to operate the table merging by itself, and renaming the particle tags. With the order:
tAll = dynamo_table_merge(tv,'linear_tags',1);
we concatenate the tables inside the cell array tv and set the particle tags (in column 1) to be named from 1 to the maximal number of particles to be extracted from all the vesicles. You can check the result through
dtinfo(tAll);
or graphically through: dtplot(tAll,'pf','oriented_positions');axis equal
There are several programs to depict tables. In this case we may find dpktbl.plot.disks useful, as it allows to depict the particles in a table with an associated extee can thus check how a given sidelength in pixels covers the vesicles for a given particle distribution.
dpktbl.plots.disks(tAll,'r',10);
Create a data folder
We can now create our typical data folder with dtcrop with a single command.
targetFolder = 'testData'; sidelength = 40; check = dtcrop(ds.cvolume.file,tAll,targetFolder,sidelength);
Note that that dtcrop will create a new file that contains only the particles that were actually cropped from the tomogram. Some particles were probably excluded if they were too close to the boundary of the tomogram. We can read this table file into memory.
finalTable = dread([targetFolder,'/crop.tbl']);
Checking the particles can be done with the interactive browsers ddbrowse or dgallery, but for a quick check we can use the dslices command:
dslices(targetFolder,'projy','*','t',finalTable,'align',1,'tag',1:10:700,'labels','on');
Here, we selected only one every ten tags to get a general idea on what is going on. The flag 'projy','*' creates a projection of the aligned particles along y, projecting all the slides, so that we can check that we are indeed greping the capsides.
We see that there are different problems: we are not hitting the center of the region of interest in many particles, and in some particles the region is out of scope.
Create a coarse average
We create a "coarse average" by just averaging all the particles with the orientations from the table, i.e, just the normals.
oa = daverage(targetFolder,'t',finalTable,'fc',1);
We can quickly check the result through:
dview(oa)
In this case, the area of interest is probably too low in the direction 'z'. We want to lift the cropping point of the particles a little bit "downwards", that is, in the negative normal direction.
Recroping the particles
Well, the protocol that we have described step by step in the previous steps can actually be completed with a single command: the method getTableFromEnclosingSpheres
A complete syntax description can be found it: help dmodels.dipoleSet.getTableFromEnclosingSpheres
We just want to extract a data folder where the particles are better centered and have bit more of space, so that particles stamming from spheres that are poorly defined still have a chance to get driven by a posterior alignment to the center of the box. If we initialize some variables:
sep = 20; % for 'separation' flag cdfs = -12; % for 'crop_distance_from_surface' flag, in pixels df = 'centeredData'; % for 'dataFolder' flag sidelength = 64;
the full task can be completed in one step:
[tnew,og] = ds.getTableFromEnclosingSpheres('sep',sep,'cdfs',cdfs,'df',df,'s', sidelength,'average',1,'mw',2);
Here the mw flag just calls for 2 cores to carry the average. If you don't have a license for the Parallel Toolbox, simply don't use the flag. Also, notice that two outputs are produced. The first one is just the table, (including also excluded particles!).
The second one includes a more detailed output, including a field averageOutput that corresponds to the output of the average operation on the newly created data folder centeredData. You can view it with the normal dview command:
dview(og.averageOutput.average);
Alignment project
The particles can now be aligned using the table and template that we just defined. We just need to write them into files
dwrite(dread([df,'/crop.tbl']),'centered.tbl');
and the density map hidden in the output: dwrite(og.averageOutput.average,'centered.em');
and create a project
pr = 'pcnt'; % pr is a variable, "pcnt" is a literal string
We create an empty project pointing to the right data, template and table:
dcp.new(pr,'d',df,'t','centered.tbl','template','centered.em','masks','default','show',0);
Putting the 'show' flag to zero prevented the dcp GUI from popping up, as we want to use this walkthrough as an example of how to create and manage projects from the command line. The basic tool is dvput, which is used to change one or several project parameters. A list of the parameters and their syntax can be printed with dvhelp or consulted through the GUI dhelp. For this particular project we suggest the set below:
dvput(pr,'ite_r1',4); dvput(pr,'dim_r1',32);
In the stage of finding general orientations and coarse centerings, we want to save computing time (without trading accuracy). Thus, we use the 'dim_r1' parameters tune to 32. As the particle boxes have a sidelength of 64 pixels, here we are asking Dynamo to bin each particle on the fly, just one time. A neighboorhod of 2x2x2 voxels will collapse into one single voxel, with the averaged intensity. By now, we are just interested in coarsely locating the particles, so that the speed-up in the alignemnt (8x) justifies the loss of accuracy.
Next, we need angular settings reflect that we trust the normals that we have found. This will save computation time and make the alignment more robust.
dvput(pr,'cr_r1',40); dvput(pr,'cs_r1',20); dvput(pr,'ir_r1',360); dvput(pr,'is_r1',40); dvput(pr,'rf_r1',4); dvput(pr,'rf_r1',2);
We let the particles shift, but not the the edges of the box:
dvput(pr,'lim_r1',[20,20,20]); dvput(pr,'limm_r1',1);
Some parameters re provided to control the execution. For projects to be run in parallel under Matlab, we need the 'destination' parameter tuned to value 'matlab_parfor'. We also provide a value for the number of cores to be used during the alignments (parameter 'cores') and the averaging steps (parameter matlab_workers_average).
dvput(pr,'dst','matlab_parfor','cores',2,'mwa',2);
Run project
As we would do the GUI, we have to check, unfold and run the project:
dvcheck pcent dvunfold pcent dvrun pcent
In scripted protocols
Advanced: A note for future projects: when your own scripts to run several projects created dynamically, you can put the name of the project into a variable (as we did above) and use the ( ) notation: dvcheck(pr); dvunfold(pr); dvrun(pr);
When used inside a script, you probably want to use the ; at the end of each line, to prevent flooding the screen with output.
=== Check results ===
We can check how the particles look like:
dslices('pcnt:data','projy','*','t','pcnt:rt:ite=3','align',1,'tag',1:10:700,'labels','on');
Here, 'pcnt:data' is a database query. it is just a shorthand to tell Dynamo to use the data folder associated with the project pcnt. The same trick is used to extract the refined table (rt) at iteration 3. Don't worry if you don't like this notation: you can just input the actual folder and file names.
The averages can be consulted with mapview. Here,we order the ddb command to extract the averages (a item) corresponding to iterations 0 and 4 and send the result to the browser mapview (flag -m)
ddb pcnt:a:ite=[0,4] -m
This browser is more or less complicated... you can tweak on it to see a depiction of how the average evolved from the first template to iteration number 4.
As a tip, when you know exactly which features you want to test visually, you can use the command line order dslices to directly produce the scene of interest:
dslices pcnt:a:ite=[0,4] -projy 44 -ns
Recropping the particles
You can create a new table excluding particles that are too close to each other:
tc = dpktbl.exclusionPerVolume('pcnt:rt:ite=4',15,22);
where 15 is a distance in pixels. The third parameter denotes the column in the table that denotes particles in the same objects. In this case, we have marked each virus with a different integer in column 22 of the table, and we pass this information to dpktbl.exclusionPerVolume.