Particle extraction

From Dynamo

Latest revision as of 15:46, 24 April 2018

Particle extraction is the process of using a set of positions (and possibly orientations) defined on a set of tomograms to create a set of subtomogram files. This set is represented by a data folder containing one file for each cropped subtomogram, and possibly a table retaining the metadata. The data folder and table can be used directly for subtomogram alignment.


Results of a particle extraction

Users might use their own protocols (such as extracting particles separately in different experiments and merging the sets afterwards). Dynamo uses a single convention for the results of an extraction:

Data folder

Particle extraction should produce a folder with all the particles. For very large particle sets (more than about 20,000 particles), this folder can be formatted as a dBox folder. In general the particles are named with tags starting with 1, but some integers might be missing: they correspond to particles that were not generated because they were too close to the boundaries of the tomogram, so that cropping them would imply reading outside the tomogram.

The particles in this data folder are never rotated or shifted. Any rotation or shift is coded into the resulting table.

Table

A single table is produced to index all the particles in the data folder. This table refers only to the particles actually in the data folder, i.e., the tags (column 1 of the table) include only those that correspond to a particle file actually available in the generated data folder. This table is stored inside the data folder with the name crop.tbl.
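
A quick way to see which particles were skipped is to compare the tags in column 1 against the full range of expected tags. The following is a minimal sketch in plain MATLAB; it uses a small made-up numeric matrix in place of the real contents of crop.tbl, and all variable names are illustrative only.

```matlab
% Hypothetical stand-in for the contents of crop.tbl:
% one row per particle, tag in column 1.
tbl = zeros(4, 26);
tbl(:, 1) = [1; 2; 4; 5];          % tag 3 was not cropped

tags        = tbl(:, 1);
expected    = (1:max(tags))';      % tags are assumed to start at 1
missingTags = setdiff(expected, tags);

fprintf('Particles not extracted: %s\n', mat2str(missingTags'));
```

Note that this only detects gaps below the largest tag present; particles skipped at the very end of the original list would need the original table for comparison.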

Volume table index

The volume table index file is generated when several tomograms have been used. It is created inside the data folder with the name indices_column20.doc, and it is a text file with two columns: each row contains an integer and a filename. The integers are the tomogram numbers used in column 20 of the produced table crop.tbl. This file, together with the table, makes it possible to track from which tomogram a particle has been cropped.
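
As an illustration of how this file and column 20 work together, the sketch below writes a small example index file, reads it back, and looks up the source tomogram of each particle. It uses only base MATLAB functions; the file names and the numeric matrix standing in for crop.tbl are made up, and the parsing assumes file names without spaces.

```matlab
% Write a small example index file (hypothetical tomogram names).
mapFile = fullfile(tempdir, 'indices_column20.doc');
fid = fopen(mapFile, 'w');
fprintf(fid, '1 /data/tomo_a.mrc\n');
fprintf(fid, '3 /data/tomo_c.mrc\n');
fclose(fid);

% Read it back: an integer and a file name on each row.
fid  = fopen(mapFile, 'r');
cols = textscan(fid, '%d %s');
fclose(fid);
tomoIndex = double(cols{1});
tomoFile  = cols{2};

% Hypothetical table: tag in column 1, tomogram number in column 20.
tbl = zeros(3, 26);
tbl(:, 1)  = [1; 2; 3];
tbl(:, 20) = [3; 1; 3];

% Look up the source tomogram of each particle.
[found, row] = ismember(tbl(:, 20), tomoIndex);
for k = 1:size(tbl, 1)
    if found(k)
        fprintf('particle %d <- %s\n', tbl(k, 1), tomoFile{row(k)});
    end
end
```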

Using the catalogue

Select the tomograms in which you have models containing particles and click on the menu tab Crop particles. A new GUI will pop up with a list of models, where you can select the models that already contain crop points. Then click on Create list. This shows an automatically created volume list file; it is for information only, and you do not need to edit it. Then click on -> Crop particles to open a GUI where you can select the sidelength in pixels and start the actual cropping.

The cropping produces a data folder containing the cropped particles and some additional files:

  • a table file called crop.tbl.
    Column 20 assigns the particle to a particular tomogram in the volume list file.
    Column 21 assigns the particle to a particular model in the volume list file.
  • a file called indices_column20.doc with the full name of each tomogram coded by an integer in column 20 of crop.tbl
  • a file called indices_column21.doc with the full name of each model coded by an integer in column 21 of crop.tbl

Extract from different tomograms

You can use the coordinates picked in one set of tomograms to extract particles from a different set of tomograms. This can be done automatically by using the catalogue to define explicit pairs of tomograms, or you can do it manually using dtcrop:

 dtcrop <tomogram_table_map.doc> <tableForAllTomograms>  <outputfolder> <sidelength> 

You could create a table with the previous tomograms in the tomogram-table map file, and then edit the names of the tomogram files. If you use this option, the number that identifies each tomogram in tomogram_table_map.doc should match column 20 of the table. If you have a table for each tomogram, you can use dtmerge to create a single one, for instance:

tableGlobal = dtmerge(cellOfTables,'linear_tags',1); 

Here, cellOfTables should be a cell array {} containing a table at each entry.
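
When preparing the inputs by hand, the index written on each row of the map file has to agree with column 20 of the matching table. The sketch below shows one way to keep the two in step using only base MATLAB. The tomogram names, the empty placeholder tables and the output location are all made up for illustration, and the dtmerge call from the text is left as a comment because it requires Dynamo on the path.

```matlab
% Hypothetical inputs: one tomogram file name and one table per tomogram.
tomoFiles    = {'/data/tomo_a.mrc', '/data/tomo_b.mrc'};
cellOfTables = {zeros(2, 26), zeros(3, 26)};

% Write the map file: "<index> <tomogram file>" on each row.
mapFile = fullfile(tempdir, 'tomogram_table_map.doc');
fid = fopen(mapFile, 'w');
for k = 1:numel(tomoFiles)
    fprintf(fid, '%d %s\n', k, tomoFiles{k});
    % Stamp the same index into column 20 of the matching table.
    cellOfTables{k}(:, 20) = k;
end
fclose(fid);

% With Dynamo on the path, the tables can then be merged as in the text:
% tableGlobal = dtmerge(cellOfTables,'linear_tags',1);
```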

From the command line

To get direct insight into how the cropping is performed, you can bypass the catalogue framework and use the dtcrop command from the command line. Help on command: dtcrop

For a single tomogram, all you need is a table. When cropping simultaneously from different tomograms into the same data folder, there are two options:

  1. Using a volume list file
    Here, the first argument of dtcrop is the name of a volume list file, the second is the code word reorder.
    In this case, the volume list file contains all the cropping information in a text file: a list of tomogram files, and a list of models or tables to be used on each tomogram. For instance:
    dtcrop myfile.vll reorder myDataFolder 64
  2. Using a table and a volume table index file
    In this case, the table explicitly stores the coordinates of the particles (columns 24 to 26), and column 20 stores an integer that identifies the tomogram. The volume table index file is a text file that contains on each row an index and a tomogram file name.
    dtcrop myMap.doc myTable.tbl myDataFolder 64

On each model separately

When you are manually creating models in one of the browsers, the real workflow in an actual project involving many tomograms and models would be to just click the points needed to define the user input of the model, store it in the catalogue, and move on to the next model. Nevertheless, on the first model that you create with a given set of parameters, it is frequently useful to perform a particle extraction just to check that everything is doing what you think it is doing.

If you are working in dtmview, dtmslice, Montage or any other tomogram browser connected with the model pool, you just need to select the model as the "actual" one in the model pool menu tab (or by right-clicking on any point in the model), click on the update table points option and then on crop points on active model. This will bring you to the dtcrop GUI, already pointing to the correct tomogram and to a temporary table containing the crop_points and crop_angles induced by your model.


Non-integer positions in tomogram

Tomogram positions in a table are given by the sum of columns 4 to 6 (shifts from the physical subtomogram center) and columns 24 to 26 (position of the physical center of the subtomogram). These numbers are defined with respect to an origin (0,0,0) located at the bottom corner of the tomogram voxel indexed as [1,1,1]; that voxel is therefore centered at coordinate (0.5,0.5,0.5) in this coordinate system.

This convention, explained in the Volume center article (section on particle cropping), ensures that particles are cropped without any interpolation from the original tomogram. The intensities of the tomogram voxels fill the voxels of the cropped particle verbatim. If the center of the central voxel of a particle does not coincide with the center of a voxel of the tomogram, the general policy in Dynamo is to record the distance between the two centers and store it as a "shift" in columns 4 to 6 of the table.
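
The arithmetic behind this coordinate system can be sketched in a few lines of base MATLAB. The example below adds columns 24 to 26 and columns 4 to 6 of a made-up table, then finds the tomogram voxel that contains each position and the offset from that voxel's center. It only illustrates the coordinate system described above; it is not meant to reproduce the exact rounding rule that dtcrop applies, and the table values are invented.

```matlab
% Hypothetical table rows; only the columns used here are filled in.
tbl = zeros(2, 26);
tbl(1, 24:26) = [100 200 50];   tbl(1, 4:6) = [0.3 -0.2 0.0];
tbl(2, 24:26) = [ 64  64 64];   tbl(2, 4:6) = [0.0  0.0 0.4];

% Position in the tomogram: columns 24 to 26 plus columns 4 to 6.
pos = tbl(:, 24:26) + tbl(:, 4:6);

% With the origin at the corner of voxel [1,1,1], voxel i spans
% [i-1, i] along each axis and is centered at i - 0.5.
voxelIndex  = floor(pos) + 1;        % voxel containing each position
voxelCenter = voxelIndex - 0.5;
offset      = pos - voxelCenter;     % always within [-0.5, 0.5)

disp(pos);
disp(voxelIndex);
disp(offset);
```

Because the voxel index is always an integer, copying voxels around it never requires interpolation; only the sub-voxel offset has to be carried along as metadata.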