Difference between revisions of "Data folder"

From Dynamo
 
(5 intermediate revisions by the same user not shown)
Latest revision as of 16:48, 30 September 2019

The data folder is the easiest way to store subtomograms. Other formats are dBoxes and the Particle List File.

A correctly formatted data folder is the minimal input that the dcp GUI needs in order to create an alignment project.

Basic Format

The most intuitive way of organizing your data is just to create one file for each particle inside the data folder. For instance, if you have 150 subtomograms, you can store them as


particle_00001.em
particle_00002.em
     ....
particle_00150.em

in a folder with an arbitrary name, for instance myData. The name of the resulting folder, myData, can then be passed as the data folder parameter when creating alignment and classification projects.

Particle tags do not need to be sequential, but they do need to keep the same zero padding. You can use the formats mrc, em or spi. Bear in mind that this approach can break down when you have tens of thousands of particles. In those cases, you'll need to use more advanced data containers.
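
The naming convention described above can be sketched in a few lines of Python, outside of Dynamo. This is only an illustration: the helpers particle_name and consistent_padding are hypothetical names, and the five-digit padding simply mirrors the example listing.

```python
import re

def particle_name(tag, digits=5, ext="em"):
    """Build a file name such as particle_00042.em (hypothetical helper)."""
    return f"particle_{tag:0{digits}d}.{ext}"

# Matches names of the form particle_<digits>.<em|mrc|spi>.
NAME_PATTERN = re.compile(r"^particle_(\d+)\.(em|mrc|spi)$")

def consistent_padding(names):
    """True if every name matches the pattern and all tags have the same width."""
    widths = set()
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            return False
        widths.add(len(match.group(1)))
    return len(widths) <= 1

# Tags need not be sequential, only equally padded.
names = [particle_name(t) for t in (1, 2, 150)]
print(names)
print(consistent_padding(names))
print(consistent_padding(names + ["particle_7.em"]))
```

The last call returns False because particle_7.em breaks the common padding width.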

Creating data folders

Data folders with the correct format are automatically created when cropping particles out of tomograms with Dynamo.

Tutorial data folder

An easy way to get started with the concept of the data folder is to create a synthetic one:

dtutorial ttest 

will create the folder ttest/data, containing a set of 8 randomly oriented synthetic particles.

Inspecting data folders

Several tools are available for different check depths.

Quick check

ddinfo checks if the contents of a folder are files with the correct naming convention, and reports their number and expected sidelength.

Deep check

ddcheck is slower but more exhaustive. It reads each particle in the data folder to check that the dimensions are consistent and to detect corrupted pixels (NaN values). It also computes the mean and standard deviation of each particle.
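
To make the kind of test described for the deep check concrete, here is a minimal Python sketch. It is not the ddcheck implementation: the function check_particle is hypothetical, particles are represented as flat lists of floats, and the choices to ignore NaN voxels in the statistics and to use the population standard deviation are assumptions made for the example.

```python
import math
import statistics

def check_particle(voxels, shape, expected_shape):
    """Report dimension coherence, NaN presence, mean and std for one particle.

    voxels is a flat list of floats; shape is the (x, y, z) size it claims.
    NaN voxels are excluded from the statistics (an assumption of this sketch).
    """
    finite = [v for v in voxels if not math.isnan(v)]
    size_matches = len(voxels) == shape[0] * shape[1] * shape[2]
    return {
        "shape_ok": shape == expected_shape and size_matches,
        "has_nan": len(finite) != len(voxels),
        "mean": statistics.fmean(finite) if finite else math.nan,
        "std": statistics.pstdev(finite) if finite else math.nan,
    }

good = check_particle([0.0, 1.0, 2.0, 3.0, 1.0, 1.0, 2.0, 2.0], (2, 2, 2), (2, 2, 2))
bad = check_particle([0.0, math.nan, 2.0, 3.0], (2, 2, 1), (2, 2, 2))
print(good)
print(bad)
```

The second particle is flagged twice: its size differs from the expected one and it contains a NaN voxel.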

Visualization of individual particles

Easy GUI: ddbrowse

ddbrowse is an easy-to-use browser. Many Dynamo GUIs link to it, and it can also be launched from the command line. Given a data folder and a table, it shows projections of a selected set of particles, so that you can check how they behave when they are aligned with the table. A secondary click on a particle lets you visualize it through dview, dmapview or linked external tools like Chimera.

A more complex GUI: dgallery

dgallery keeps large sets of particles in memory. This allows quick visualization of the same area across a set of particles. dgallery is commonly used for manual alignment.

Typical operations on data folders

Dynamo provides tools for normalizing all the particles inside a data folder, merging different data folders (with their corresponding metadata), and converting data folders to other data container formats.

Normalization

Normalization is not strictly necessary for running an alignment, as Dynamo will normalize the particles internally. However, you might want to normalize your data for visualization purposes.

Averaging

Averaging all the particles in a data folder is frequently a useful sanity check to confirm that the multiple, noisy-looking files do contain signal. This is especially useful when some a priori knowledge about the particles is available. The basic syntax is:

 daverage <mydata> -t <mytable> -ws o 

which saves all the outputs of the averaging run in the workspace variable o.
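
Why averaging works as a sanity check can be illustrated with a toy Python sketch, independent of Dynamo. It assumes that the particles are already aligned and leaves aside whatever use daverage makes of the table; the signal, the noise level and the helper names are all invented for the example.

```python
import random
import statistics

def average_particles(particles):
    """Voxel-wise mean of equally sized particles, each given as a flat list."""
    return [statistics.fmean(voxels) for voxels in zip(*particles)]

def rms_error(volume, reference):
    """Root-mean-square difference between a volume and a reference."""
    return statistics.fmean((a - b) ** 2 for a, b in zip(volume, reference)) ** 0.5

random.seed(0)
signal = [0.0, 1.0, 0.0, -1.0] * 16  # toy "structure" with 64 voxels
# Each particle is the same signal buried in Gaussian noise (sigma = 2).
particles = [[s + random.gauss(0.0, 2.0) for s in signal] for _ in range(100)]

average = average_particles(particles)
print(rms_error(particles[0], signal))  # error of a single noisy particle
print(rms_error(average, signal))       # error of the average of 100 particles
```

The average lies much closer to the underlying signal than any single particle does, which is the effect the sanity check relies on.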

Fourier Mask on particles

A data folder can also store an individual Fourier sampling type for each particle. To do so, store a file with the so-called pfmask (particle Fourier mask) of each particle in the same folder.


pfmask_00001.em
pfmask_00002.em
     ....
pfmask_00150.em

This option is used when your table marks a particle with Fourier type 5 (in column 13).
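
As a rough illustration of how the table and the pfmask files relate, the following Python sketch lists the pfmask files that would be expected but are absent. It is not part of Dynamo: the helper names are hypothetical, the table is modelled as plain lists, and the sketch assumes that the particle tag sits in the first column and that the pfmask files use the same five-digit padding and extension as in the listing above. The value 0 used for the other particle is just an arbitrary type different from 5.

```python
def missing_pfmasks(table_rows, files_in_folder, digits=5, ext="em"):
    """List the pfmask files expected but absent from the folder.

    Assumption: each row is a list whose first entry is the particle tag;
    the Fourier type is read from column 13 (index 12 when counting from 0).
    """
    present = set(files_in_folder)
    missing = []
    for row in table_rows:
        tag, fourier_type = int(row[0]), int(row[12])
        if fourier_type == 5:
            name = f"pfmask_{tag:0{digits}d}.{ext}"
            if name not in present:
                missing.append(name)
    return missing

def toy_row(tag, fourier_type):
    """Build a 13-column toy table row with only tag and Fourier type set."""
    row = [0] * 13
    row[0], row[12] = tag, fourier_type
    return row

rows = [toy_row(1, 5), toy_row(2, 0), toy_row(150, 5)]
files = ["particle_00001.em", "particle_00002.em",
         "particle_00150.em", "pfmask_00001.em"]
print(missing_pfmasks(rows, files))  # prints ['pfmask_00150.em']
```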