Difference between revisions of "Data folder"
Latest revision as of 16:48, 30 September 2019
The data folder is the easiest way to store subtomograms. Other formats are dBoxes and Particle List Files.
A correctly formatted data folder is the minimal input that the dcp GUI needs in order to create an alignment project.
Basic Format
The most intuitive way of organizing your data is just to create one file for each particle inside the data folder. For instance, if you have 150 subtomograms, you can store them as
particle_00001.em
particle_00002.em
....
particle_00150.em
in a folder with an arbitrary name, for instance myData. The resulting folder myData can then be used as the data folder parameter when creating alignment and classification projects.
Particle tags do not need to be sequential, but they do need to keep the same zero padding. You can use the formats mrc, em or spi. Bear in mind that this approach can break down when you have tens of thousands of particles. In those cases, you'll need to use more advanced data containers.
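The naming rules above can be checked with a few lines of code before handing a folder to Dynamo. The following Python sketch is not part of Dynamo; the function name check_particle_names and the regular expression are assumptions made for illustration, built only from the particle_<tag>.<ext> pattern and the three formats mentioned in the text.

```python
import re
from collections import defaultdict

# Assumed pattern for this sketch: "particle_" + zero-padded tag + one of the
# three extensions named in the text (mrc, em, spi).
PARTICLE_RE = re.compile(r"^particle_(\d+)\.(mrc|em|spi)$")


def check_particle_names(filenames):
    """Return (sorted tags, list of problems) for file names in a data folder.

    Names that do not start with "particle_" are skipped, since a data folder
    may hold other files as well.
    """
    tags = []
    widths = defaultdict(list)  # padding width -> file names using it
    problems = []
    for name in filenames:
        if not name.startswith("particle_"):
            continue
        match = PARTICLE_RE.match(name)
        if match is None:
            problems.append(f"unexpected file name: {name}")
            continue
        digits = match.group(1)
        tags.append(int(digits))
        widths[len(digits)].append(name)
    # Tags may be non-sequential, but every file must use the same padding.
    if len(widths) > 1:
        problems.append(f"inconsistent zero padding: widths {sorted(widths)}")
    return sorted(tags), problems
```

Calling check_particle_names(os.listdir("myData")) would list the tags found and any naming problems; gaps in the tag sequence are deliberately not reported, since tags do not need to be sequential.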
Creating data folders
Data folders with the correct format are automatically created when cropping particles out of tomograms with Dynamo.
Tutorial data folder
An easy way to get started with the concept of the data folder is to create a synthetic one:
dtutorial ttest
will create the folder ttest/data, containing a set of 8 randomly oriented synthetic particles.
Inspecting data folders
Several tools are available, depending on how thorough a check is needed.
Quick check
ddinfo checks whether the contents of a folder are files with the correct naming convention, and reports their number and expected side length.
Deep check
ddcheck is slower but more exhaustive. It reads each particle in the data folder to check that the dimensions are consistent and to detect corrupted pixels (NaN values). It also computes the mean and standard deviation of each particle.
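To make the kind of test described here concrete, the following Python sketch performs similar checks on particles that are already loaded in memory. It does not reproduce ddcheck itself: the function name deep_check, the input layout (a dictionary mapping each tag to a shape and a flat list of voxel values) and the use of the population standard deviation are all assumptions of this example, and no file reading is attempted.

```python
import math
import statistics


def deep_check(particles):
    """Check in-memory particles for shape consistency and NaN voxels.

    particles: dict mapping tag -> (shape tuple, flat list of voxel values).
    This layout is an assumption of the sketch, not a Dynamo data structure.
    """
    shapes = {shape for shape, _ in particles.values()}
    per_particle = {}
    for tag, (_, values) in particles.items():
        nan_count = sum(1 for v in values if math.isnan(v))
        finite = [v for v in values if not math.isnan(v)]
        per_particle[tag] = {
            "nan_count": nan_count,
            # Statistics are computed on the non-NaN voxels only.
            "mean": statistics.fmean(finite) if finite else math.nan,
            "std": statistics.pstdev(finite) if finite else math.nan,
        }
    return {"consistent_shape": len(shapes) <= 1, "per_particle": per_particle}
```

A particle with a nonzero nan_count, or a folder where consistent_shape is False, would be the sort of problem a deep check is meant to surface before starting a project.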
Visualization of individual particles
Easy GUI: ddbrowse
ddbrowse is an easy-to-use browser. Many Dynamo GUIs link to it, and it can also be started from the command line. Given a data folder and a table, it shows projections of a selected set of particles and lets you check how they behave when they are aligned with the table. A secondary click on a particle allows visualizing it through dview, dmapview, or linked external tools like Chimera.
A more complex GUI: dgallery
dgallery keeps large sets of particles in memory. This allows quick visualization of the same area across sets of particles. dgallery is commonly used for manual alignment.
Typical operations on data folders
Dynamo provides tools for normalizing all the particles inside the data folder, merging different data folders (with their corresponding metadata), and converting data folders to other data container formats.
Normalization
Normalization is not strictly necessary for running an alignment, as Dynamo will normalize the particles internally. However, you might want to normalize your data for visualization purposes.
Averaging
Averaging all the particles in a data folder is frequently a useful sanity check to confirm that the contents of the multiple, noisy-looking files do contain signal. This is especially useful when some a priori knowledge on the particles is available. The basic syntax is:
daverage <mydata> -t <mytable> -ws o
which will save all the outputs of the averaging operation in the workspace variable o.
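The reason averaging works as a sanity check is that noise that differs from file to file tends to cancel, while signal common to all particles remains. The Python sketch below illustrates only this voxel-wise mean on in-memory data; it is not the daverage implementation, the function name average_particles is hypothetical, and it does not apply any alignment from a table.

```python
def average_particles(particles):
    """Return the voxel-wise mean of equally sized particles.

    particles: non-empty list of flat lists of voxel values (an assumed
    in-memory layout for this sketch; no files are read).
    """
    if not particles:
        raise ValueError("no particles to average")
    length = len(particles[0])
    if any(len(p) != length for p in particles):
        raise ValueError("particles must all have the same number of voxels")
    count = len(particles)
    # Sum each voxel position across particles, then divide by the count.
    return [sum(voxels) / count for voxels in zip(*particles)]
```

For example, average_particles([[1.0, 2.0], [3.0, 4.0]]) gives [2.0, 3.0]. With real data, each noisy particle would first have to be brought into a common orientation for the common signal to add up.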
Fourier Mask on particles
A data folder can also store an individual Fourier sampling type for each particle. You can store a set of files for the so-called pfmask (particle Fourier mask) of each particle in the same folder.
pfmask_00001.em
pfmask_00002.em
....
pfmask_00150.em
This option is used when your table marks a particle with Fourier type 5 (in column 13).
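A quick consistency check follows from this rule: every particle marked with Fourier type 5 should have a matching pfmask file. The Python sketch below is one way to express that check outside Dynamo. The function name missing_pfmasks is hypothetical, the table is assumed to be available as a list of numeric rows, and the particle tag is assumed to sit in column 1; only column 13 and the value 5 come from the text above.

```python
FOURIER_TYPE_COLUMN = 13  # 1-based column named in the text
PFMASK_TYPE = 5           # Fourier type value named in the text
TAG_COLUMN = 1            # assumption of this sketch: tag stored in column 1


def missing_pfmasks(table_rows, filenames, padding=5, extension="em"):
    """List pfmask files expected by the table but absent from the folder.

    table_rows: list of rows, each a sequence of numbers (assumed layout).
    filenames: names of the files present in the data folder.
    padding, extension: assumed to match the particle files in the folder.
    """
    present = set(filenames)
    missing = []
    for row in table_rows:
        # Only particles flagged with Fourier type 5 need a pfmask file.
        if int(row[FOURIER_TYPE_COLUMN - 1]) != PFMASK_TYPE:
            continue
        tag = int(row[TAG_COLUMN - 1])
        expected = f"pfmask_{tag:0{padding}d}.{extension}"
        if expected not in present:
            missing.append(expected)
    return missing
```

An empty result means that every particle flagged with type 5 has its mask file in the folder, under the padding and extension passed in.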