Particle extraction

From Dynamo

Particle extraction is the process of using a set of positions (and possibly orientations) defined on a set of tomograms to create a set of subtomogram files. This set is represented by a data folder containing one file for each cropped subtomogram, and possibly a table retaining the metadata. The data folder and the table can be used directly for subtomogram alignment.


Results of a particle extraction

Users might follow their own protocols (such as extracting particles separately in different experiments and merging the sets afterwards). Dynamo, however, uses a single convention for the results of an extraction:

Data folder

Particle extraction should produce a folder containing all the particles. For very large particle sets (more than roughly 20,000 particles), this folder can be formatted as a dBox folder. In general, the particles are named with tags starting at 1, but some integers might be missing: they correspond to particles that were not generated because they were too close to the boundaries of the tomogram, so that cropping them would imply reading outside the tomogram.
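The reason why some tags are missing can be illustrated with a small boundary check. This is a conceptual sketch, not Dynamo's cropping code: it assumes positions are given as 1-based integer indices of the particle's central voxel, and that a box of sidelength L places its central voxel at index L // 2 + 1 (an assumed centering convention). The function names are hypothetical.

```python
# Conceptual sketch only, not Dynamo's implementation.
# Assumptions (hypothetical, for illustration):
#   * particle positions are 1-based integer indices of the central voxel;
#   * a box of sidelength L has its central voxel at index L // 2 + 1,
#     so along each axis it spans [c - L // 2, c - L // 2 + L - 1].


def box_fits(center, dims, sidelength):
    """Return True if a cubic box around `center` stays inside the tomogram."""
    half = sidelength // 2
    for c, n in zip(center, dims):
        first = c - half
        last = first + sidelength - 1
        if first < 1 or last > n:
            return False  # cropping would read outside the tomogram
    return True


def extractable_tags(particles, dims, sidelength):
    """Keep the tags of particles whose box can be cropped completely."""
    return [tag for tag, center in particles if box_fits(center, dims, sidelength)]


if __name__ == "__main__":
    dims = (100, 100, 50)
    particles = [
        (1, (50, 50, 25)),
        (2, (10, 50, 25)),  # too close to the x = 1 face
        (3, (85, 50, 25)),
        (4, (86, 50, 25)),  # box would end at x = 101
    ]
    print(extractable_tags(particles, dims, 32))  # [1, 3]
```

Tags 2 and 4 are skipped, so the data folder would contain particles 1 and 3 only, which is the kind of gap in the numbering described above.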

The particles in this data folder are never rotated or shifted. Any rotation or shift is coded into the resulting table.

Table

A single table is produced to index all the particles in the data folder. This table refers only to the particles actually present in the data folder, i.e., the tags (column 1 of the table) include only those that correspond to a particle file actually available in the generated data folder. This table is stored inside the data folder with the name crop.tbl.
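As a minimal sketch of the relation between the table and the data folder, the code below parses a table held as plain text (assumed here to be one particle per row, with whitespace-separated numbers) and reports tags in column 1 that have no particle file. The set of available tags is passed in directly, since the exact file naming inside the data folder is not covered here; all function names are hypothetical.

```python
# Sketch under assumptions: the table is plain text, one particle per row,
# whitespace-separated numeric columns. Column numbers follow the
# 1-based numbering used in the documentation.


def read_table(text):
    """Parse whitespace-separated numeric rows into lists of floats."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(f) for f in fields])
    return rows


def column(rows, k):
    """Return column k (1-based, as in the documentation) of every row."""
    return [row[k - 1] for row in rows]


def tags_missing_from_folder(rows, available_tags):
    """Tags listed in column 1 that have no corresponding particle file."""
    return sorted(int(t) for t in column(rows, 1) if int(t) not in available_tags)
```

For a table produced by the extraction itself, `tags_missing_from_folder` should return an empty list, since crop.tbl only indexes particles that were actually written.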

Volume table index

The volume table index file is generated when several tomograms have been used. It is created inside the data folder with the name indices_column20.doc. It is a text file with two columns: each row contains an integer and a filename. The integers are the tomogram numbers used in column 20 of the produced table crop.tbl. This file, together with the table, makes it possible to track from which tomogram each particle has been cropped.
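Tracing a particle back to its tomogram amounts to a lookup from column 20 into this file. A minimal sketch, assuming each row of indices_column20.doc is an integer followed by whitespace and then the filename (the rest of the line is kept, so names containing spaces survive). Table rows are taken as already-parsed lists of numbers; the function names and filenames in the test are hypothetical.

```python
# Sketch: combine indices_column20.doc ("<integer> <filename>" per row)
# with columns 1 (tag) and 20 (tomogram number) of crop.tbl.


def read_index_doc(text):
    """Map the integer on each row to the filename that follows it."""
    index = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)  # keep the full filename, spaces included
        if len(parts) == 2:
            index[int(parts[0])] = parts[1].strip()
    return index


def source_tomograms(rows, index):
    """Return {tag: tomogram filename} using columns 1 and 20 (1-based)."""
    return {int(row[0]): index[int(row[19])] for row in rows}
```

The same lookup applies to indices_column21.doc and column 21 when models, rather than tomograms, need to be traced.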

Using the catalogue

Select the tomograms in which you have models containing particles and click on the menu tab Crop particles. A new GUI will pop up with a list of models. Select the models that already contain crop points, then click on Create list. This shows an automatically created volume list file; it is just for information, and you do not need to do anything with it. Then click on -> Crop particles to open a GUI where you can select the sidelength in pixels and start the actual cropping.

It will produce a data folder which will contain the cropped particles and some additional files:

  • a table file called crop.tbl.
    Column 20 assigns the particle to a particular tomogram in the volume list file.
    Column 21 assigns the particle to a particular model in the volume list file.
  • a file called indices_column20.doc with the full name of each tomogram coded by an integer in column 20 of crop.tbl
  • a file called indices_column21.doc with the full name of each model coded by an integer in column 21 of crop.tbl

Extract from different tomograms

You can use the coordinates picked in one set of tomograms to extract particles from a different set of tomograms. This can be done automatically using the catalogue to explicitly define pairs of tomograms, or manually using dtcrop:

 dtcrop <tomogram_table_map.doc> <tableForAllTomograms>  <outputfolder> <sidelength> 

For instance, you could write a tomogram-table map file listing the previous tomograms, and then edit the tomogram file names so that they point to the new tomograms. If you use this option, the number that identifies each tomogram in tomogram_table_map.doc should match column 20 of the table. If you have a separate table for each tomogram, you can use dtmerge to create a single one, for instance:

tableGlobal = dtmerge(cellOfTables,'linear_tags',1); 

Here, cellOfTables should be a cell array {} containing a table at each entry.
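The merging step can be sketched in Python to make the bookkeeping explicit. This is not dtmerge: it assumes that 'linear_tags' means renumbering column 1 as 1..N over the merged rows (check the help of dtmerge), and it additionally stamps each table's tomogram number into column 20, which is a hypothetical convenience step rather than documented dtmerge behaviour. A second helper reports column-20 values that have no entry in the map file.

```python
# Sketch, NOT dtmerge. Assumptions (hypothetical):
#   * 'linear_tags' renumbers column 1 as 1..N over the merged rows;
#   * each input table is stamped with its tomogram number in column 20.


def merge_tables(tables_by_index):
    """tables_by_index: {tomogram_index: list of rows}; rows are lists of numbers."""
    merged = []
    for tomo_index in sorted(tables_by_index):
        for row in tables_by_index[tomo_index]:
            new_row = list(row)        # copy, so the input tables stay unchanged
            new_row[19] = tomo_index   # column 20: tomogram identifier
            merged.append(new_row)
    for tag, row in enumerate(merged, start=1):
        row[0] = tag                   # column 1: consecutive tags
    return merged


def unmapped_indices(rows, map_indices):
    """Column-20 values with no entry in tomogram_table_map.doc."""
    return sorted({int(row[19]) for row in rows} - set(map_indices))
```

An empty result from `unmapped_indices` indicates that every particle in the merged table can be resolved to a tomogram listed in the map file.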

From the command line

To see directly how the cropping is performed, you can bypass the catalogue framework and just use the command-line function dtcrop (see the help on the command dtcrop).

For a single tomogram, all you need is a table. When cropping simultaneously from different tomograms into the same data folder, there are two options:

  1. Using a volume list file
    Here, the first argument of dtcrop is the name of a volume list file, and the second is the code word reorder.
    In this case, the volume list file contains all the cropping information in a text file: a list of tomogram files, and a list of models or tables to be used on each tomogram. For instance:
    dtcrop myfile.vll reorder myDataFolder 64
  2. A table and a volume table index file
    In this case, the table explicitly stores the coordinates of the particles (columns 24 to 26), and in column 20 it stores an integer. The volume table index file is a text file that contains on each row an index and a tomogram file name.
    dtcrop myMap.doc myTable.tbl myDataFolder 64
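The inputs of option 2 can be generated from per-tomogram coordinate lists. The sketch below builds the text of a table (tag in column 1, tomogram number in column 20, position in columns 24 to 26) and of the matching volume table index file. It assumes a plain-text table with whitespace-separated columns and a width of 35 columns (NCOLS is an assumption); all other columns are left at zero, which a real project may need to adjust. The function name and filenames are hypothetical.

```python
# Sketch under assumptions: plain-text table, NCOLS columns, unused columns
# left at zero. Produces (table_text, map_text) for option 2 above.

NCOLS = 35  # assumed table width


def build_inputs(coords_by_tomogram):
    """coords_by_tomogram: {filename: [(x, y, z), ...]} -> (table_text, map_text)."""
    rows, map_lines = [], []
    tag = 0
    for index, filename in enumerate(sorted(coords_by_tomogram), start=1):
        map_lines.append(f"{index} {filename}")  # "<index> <tomogram file>"
        for x, y, z in coords_by_tomogram[filename]:
            tag += 1
            row = [0.0] * NCOLS
            row[0] = tag             # column 1: particle tag
            row[19] = index          # column 20: tomogram number in the map file
            row[23:26] = [x, y, z]   # columns 24-26: particle position
            rows.append(row)
    table_text = "\n".join(" ".join(f"{v:g}" for v in row) for row in rows)
    return table_text + "\n", "\n".join(map_lines) + "\n"
```

Written to myTable.tbl and myMap.doc, these two texts would be the arguments of the dtcrop call shown above.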

On each model separately

When you are manually creating a model in one of the browsers, the usual workflow in a real project involving many tomograms and models is to just click the points needed to define the user input of the model, store it in the catalogue, and move on to the next model. Nevertheless, on the first model that you create with a given set of parameters, it is often useful to perform a particle extraction just to check that everything behaves as you expect.

If you are working in dtmview, dtmslice, Montage or any other tomogram browser connected with the model pool, you just need to select the model as the "actual" one in the model pool menu tab (or by right-clicking on any point in the model), click on the update table points option, and then on crop points on active model. This brings you to the dtcrop GUI, already pointing to the correct tomogram and to a temporary table containing the crop_points and crop_angles induced by your model.


Non-integer positions in tomogram

Tomogram positions in a table are given by the sum of columns 4 to 6 (shifts from the physical subtomogram center) and columns 24 to 26 (position of the physical center of the subtomogram). These numbers are defined with respect to an origin (0,0,0) located at the lower corner of the tomogram voxel indexed as [1,1,1]; in this coordinate system, that voxel is centered at (0.5,0.5,0.5).
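Written out, with $c_n$ denoting the value in column $n$ of a table row, the convention above reads:

```latex
\mathbf{r} =
\begin{pmatrix} c_{24} \\ c_{25} \\ c_{26} \end{pmatrix}
+
\begin{pmatrix} c_{4} \\ c_{5} \\ c_{6} \end{pmatrix},
\qquad
\text{center of voxel } [i,j,k] = \left(i - \tfrac{1}{2},\; j - \tfrac{1}{2},\; k - \tfrac{1}{2}\right).
```

For $[i,j,k] = [1,1,1]$ this gives the center $(0.5, 0.5, 0.5)$ stated above.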

This convention ensures that particles are cropped from the original tomogram without any interpolation: the intensities of the tomogram voxels are copied verbatim into the voxels of the cropped particle. If the center of the central voxel of a particle does not coincide with the center of a tomogram voxel, the general policy in Dynamo is to register the distance between the two centers and store it as a "shift" in columns 4 to 6 of the table.
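The policy of cropping at a voxel center and recording the remainder as a shift can be sketched per axis. This follows the convention as stated above (voxel i, 1-based, spans [i - 1, i) and is centered at i - 0.5); whether a given Dynamo version stores exactly these values in columns 24 to 26 should be checked against its own tables. The function names are hypothetical.

```python
import math

# Sketch following the stated convention: voxel i (1-based) covers [i - 1, i)
# and is centered at i - 0.5. A continuous position p is split into the
# voxel that contains it (the crop center) and a residual shift, so that
# crop center + shift == p and the crop itself needs no interpolation.


def split_position(p):
    """Split coordinate p into (voxel index, voxel-center coordinate, shift)."""
    i = math.floor(p) + 1          # voxel whose extent [i - 1, i) contains p
    center = i - 0.5               # its center in the tomogram coordinate system
    return i, center, p - center   # shift lies in [-0.5, 0.5)


def split_point(point):
    """Apply split_position to each axis: returns (indices, centers, shifts)."""
    parts = [split_position(p) for p in point]
    return tuple(tuple(part[k] for part in parts) for k in range(3))
```

In table terms, the centers would play the role of columns 24 to 26 and the shifts of columns 4 to 6, and their sum recovers the original position.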