Particle extraction

Particle extraction is the process of using a set of positions (and possibly orientations) defined on a set of tomograms to create a set of subtomogram files. This set is represented by a data folder containing one file for each cropped subtomogram, and possibly a table retaining the metadata. The data folder and table can be used directly for subtomogram alignment.


Results of a particle extraction

Users might use their own protocols (such as extracting particles separately in different experiments and merging the sets afterwards). Dynamo uses a single convention for the results of an extraction:

Data folder

Particle extraction should produce a folder with all the particles. For very large particle sets (roughly more than 20,000 particles), this folder can be formatted as a dBox folder. In general the particles are named with tags starting with 1, but some integers might be missing: they correspond to particles that were not generated because they were too close to the boundaries of the tomogram, so that cropping them would imply reading outside the tomogram.

The particles in this data folder are never rotated or shifted. Any rotation or shift is coded into the resulting table.

Table

A single table is produced to index all the particles in the data folder. This table refers only to the particles actually in the data folder, i.e., the tags (column 1 in the table) will only include those that correspond to a particle file actually available in the generated data folder. This table is stored inside the data folder with the name crop.tbl.
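
Because the table only lists particles that were actually written, gaps in column 1 reveal which particles were skipped during cropping. The following is a minimal Python sketch of that check, run outside Dynamo. It assumes that crop.tbl can be read as plain text with whitespace-separated numeric columns, one particle per row; the function names and the three-column excerpt are hypothetical.

```python
def tags_in_table(table_text):
    """Collect the particle tags (column 1) from the text of a table."""
    tags = []
    for line in table_text.splitlines():
        fields = line.split()
        if fields:
            # Tags may be written as floats (e.g. "3.0"), so convert via float.
            tags.append(int(float(fields[0])))
    return tags


def missing_tags(tags):
    """Return the integers in 1..max(tags) that do not appear in tags."""
    present = set(tags)
    if not present:
        return []
    return [t for t in range(1, max(present) + 1) if t not in present]


# Hypothetical excerpt: only the first columns are shown, real tables are wider.
example = """1 1 1
2 1 1
4 1 1
5 1 1
7 1 1
"""

print(missing_tags(tags_in_table(example)))  # → [3, 6]
```

Note that this only detects gaps below the largest tag present; particles skipped at the very end of the original numbering cannot be recovered from the table alone.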

Volume table index

The volume table index file is generated when several tomograms have been used. It is generated inside the data folder with the name indices_column20.doc, and it is a text file with two columns: each row contains an integer and a file name. The integers are the tomogram numbers used in column 20 of the produced table crop.tbl. This file, together with the table, makes it possible to track which tomogram a particle was cropped from.
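
The lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of Dynamo: it assumes that each row of the index file is an integer followed by a file name separated by whitespace, and that a table row is available as a list of numbers. The file names and function names are hypothetical.

```python
def parse_volume_index(text):
    """Map each integer index to its tomogram file name."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only once so that file names containing spaces survive.
        number, filename = line.split(None, 1)
        index[int(number)] = filename.strip()
    return index


def tomogram_of(row, index):
    """Return the tomogram file of one table row (column 20 is row[19])."""
    return index[int(row[19])]


# Hypothetical content of indices_column20.doc.
index_text = """1 /data/tomo_a.mrc
2 /data/tomo_b.mrc
"""

# Hypothetical table row with 26 columns: tag 5, cropped from tomogram 2.
row = [0.0] * 26
row[0] = 5
row[19] = 2

print(tomogram_of(row, parse_volume_index(index_text)))  # → /data/tomo_b.mrc
```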

Using the catalogue

Select the tomograms in which you have models containing particles and click on the menu tab Crop particles. A new GUI will pop up with a list of models, where you can select the models that already contain crop points. Then click on Create list. This will show you an automatically created volume list file. It is shown for information only; you do not need to edit it. Then click on -> Crop particles to open a GUI where you can select the sidelength in pixels and start the actual cropping.

It will produce a data folder which will contain the cropped particles and some additional files:

  • a table file called crop.tbl.
    Column 20 assigns the particle to a particular tomogram in the volume list file.
    Column 21 assigns the particle to a particular model in the volume list file.
  • a file called indices_column20.doc with the full name of each tomogram coded by an integer in column 20 of crop.tbl
  • a file called indices_column21.doc with the full name of each model coded by an integer in column 21 of crop.tbl

Extract from different tomograms

You can use the coordinates picked in one set of tomograms to extract particles from a different set of tomograms. This can be done automatically, using the catalogue to specify the pairs of tomograms explicitly, or manually using dtcrop:

 dtcrop <tomogram_table_map.doc> <tableForAllTomograms>  <outputfolder> <sidelength> 

You could create a table with the previous tomograms in the tomogram-table map file, and then edit the names of the tomogram files. If you use this option, the number that identifies each tomogram in tomogram_table_map.doc should match column 20 of the table. If you have one table for each tomogram, you can use dtmerge to create a single one, for instance:

tableGlobal = dtmerge(cellOfTables,'linear_tags',1); 

Here, cellOfTables should be a cell array {} containing a table at each entry.
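
The renaming step mentioned above (keeping the integer indices of the map file while pointing them to different tomogram files) can be sketched in Python as follows. This is only an illustration: it assumes the map file has one "index filename" pair per row, and the function name, the renaming dictionary and all file names are hypothetical.

```python
def rewrite_map(text, new_names):
    """Replace tomogram file names in a map file, keeping the indices.

    new_names maps an old file name to its replacement; file names that
    are not listed are kept unchanged.
    """
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue
        number, old_name = line.split(None, 1)
        old_name = old_name.strip()
        out.append(f"{int(number)} {new_names.get(old_name, old_name)}")
    return "\n".join(out) + "\n"


# Hypothetical map for the tomograms where the particles were picked.
picked_map = """1 /data/bin4/tomo_a.mrc
2 /data/bin4/tomo_b.mrc
"""

# Hypothetical replacement files to crop from instead.
renaming = {
    "/data/bin4/tomo_a.mrc": "/data/bin1/tomo_a.mrc",
    "/data/bin4/tomo_b.mrc": "/data/bin1/tomo_b.mrc",
}

print(rewrite_map(picked_map, renaming), end="")
```

Since the indices are untouched, column 20 of the table still identifies the right row of the new map file. The sketch only rewrites file names; whether the coordinates in the table also need to be adapted to the new tomograms is outside its scope.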

From the command line

To see directly how the cropping is performed, you can bypass the catalogue framework and use the command dtcrop from the command line. Help on command: dtcrop

For a single tomogram, all you need is a table. When cropping simultaneously from different tomograms into the same data folder, there are two options:

  1. Using a volume list file
    Here, the first argument of dtcrop is the name of a volume list file, the second is the code word reorder.
    In this case, the volume list file contains all the cropping information in a text file: a list of tomogram files, and a list of models or tables to be used on each tomogram. For instance:
    dtcrop myfile.vll reorder myDataFolder 64
  2. A table and a volume table index file
    In this case, the table stores the coordinates of the particles explicitly (columns 24 to 26), and in column 20 it stores an integer identifying the tomogram. The volume table index file is a text file that contains on each row an index and a tomogram file name.
    dtcrop myMap.doc myTable.tbl myDataFolder 64
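
For the second option, every integer used in column 20 of the table needs a matching row in the volume table index file. A quick consistency check before cropping could look like the Python sketch below. It is not a Dynamo function: the helper names and example values are hypothetical, and table rows are assumed to be available as lists of numbers.

```python
def make_row(tag, tomogram_index, x, y, z):
    """Build a hypothetical 26-column table row for the example."""
    row = [0.0] * 26
    row[0] = tag                # column 1: particle tag
    row[19] = tomogram_index    # column 20: tomogram index
    row[23:26] = [x, y, z]      # columns 24 to 26: position in the tomogram
    return row


def unmapped_indices(table_rows, map_indices):
    """Return the column-20 values that have no entry in the map file."""
    used = {int(row[19]) for row in table_rows}
    return sorted(used - set(map_indices))


rows = [
    make_row(1, 1, 120.0, 88.0, 40.0),
    make_row(2, 2, 64.5, 200.0, 35.0),
    make_row(3, 4, 310.0, 150.0, 52.0),
]

# Indices listed in a hypothetical myMap.doc.
print(unmapped_indices(rows, {1, 2, 3}))  # → [4]
```

An empty result means that each particle in the table can be traced to a tomogram file.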

On each model separately

When you are manually creating a model in one of the browsers, the usual workflow in a project involving many tomograms and models is to click only the points needed to define the user input of the model, store it in the catalogue, and move on to the next model. Nevertheless, for the first model that you create with a given set of parameters, it is often useful to perform a particle extraction just to check that everything is doing what you think it is doing.

If you are working in dtmview, dtmslice, Montage or any other tomogram browser connected with the model pool, you just need to select the model as the "actual" in the model pool menu tab (or by right clicking on any point in the model), click on the update table points option and then on crop points on active model. This will bring you to the dtcrop GUI, already pointing to the correct tomogram and to a temporary table containing the crop_points and crop_angles induced by your model.


Non-integer positions in tomogram

Tomogram positions that are not integers are rounded before cropping. Internally, this means that columns 24 to 26 in the table are rounded with the MATLAB command round. However, the created crop.tbl table will still contain the unrounded double-precision values (in case the user needs them later).

The reason for this policy is to ensure that subtomograms are extracted from the tomogram as cubes of voxels as originally defined in the tomogram, without any interpolation due to non-integer centering.

This has no effect when running alignment or classification projects. However, this "mean half-pixel offset" needs to be taken into account when creating graphical depictions that use the table to place the averages back into the tomograms.
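
To make the size of this offset concrete, the following Python sketch computes the rounded cropping center and the residual between the stored position and that center. It is an illustration only: the function names and the example position are hypothetical. MATLAB's round sends ties away from zero, whereas Python's built-in round sends ties to the nearest even integer, so the sketch reimplements the MATLAB behaviour.

```python
import math


def matlab_round(x):
    """Round to the nearest integer, with ties away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def crop_center_and_offset(position):
    """Return the integer cropping center and the residual per axis.

    position holds the values of columns 24 to 26 of one table row.
    The residual is (stored position) - (rounded center).
    """
    center = tuple(matlab_round(c) for c in position)
    offset = tuple(c - r for c, r in zip(position, center))
    return center, offset


center, offset = crop_center_and_offset((100.5, 57.25, 33.75))
print(center)  # → (101, 57, 34)
print(offset)  # → (-0.5, 0.25, -0.25)
```

Each residual lies within half a pixel of zero. How it enters the placement of an average depends on the conventions of the depiction tool, which this sketch does not address.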